Cluster Networking: TCP/IP Over Ethernet

RFCs and the Internet Protocol

Unlike Ethernet's "open" but "not public and not free" specifications, the specification documents of the Internet Protocol are truly open, truly public, and truly free. They were developed by a loose consortium of super-geeks working in academia, industry, and in government labs, funded by the Defense Advanced Research Projects Agency (DARPA) and openly published as RFCs, and are one of the most marvelous of human works of all time. DARPA has long since relinquished primary control of the Internet to the Internet Engineering Task Force (IETF) but the Internet remains as a shining proof that defense research can produce peaceful dividends. The RFC process itself has proved to be a tremendous contribution in its own right. It is a nearly perfect realization of a genetic optimization process that allows for evolutionary growth, and is the direct parent for hundreds of mailing lists and development groups that even today directly drive the technical development of the Internet and Open Source software such as Linux and FreeBSD.

At this point a rather large fraction of the world's economy is derived from the network DARPA conceived and funded. Not even the much-touted space program has paid off its investment so overwhelmingly. Note that I'm using my bully pulpit to draw a harsh contrast between two competing "standards" paradigms -- IEEE's semi-closed process that ultimately yields intellectual property belonging to, blessed by and resold by the IEEE versus the fully open RFC process that leads to an openly and freely published standard specification. It is pretty clear which one I think is superior.

The particular RFC that originally specified IP itself is RFC 791 although there are others that govern (for example) the particular encapsulation of IP within Ethernet packets we are about to discuss and various extensions or modifications. All RFCs are readily available on the Internet where you can read them for free. The Resource sidebar has links to information should you wish to browse.

An IP packet is called a datagram, emphasizing the metaphor that it is like a piece of mail or a telegram -- it has an "envelope" (the header) that tells where and how to send its "contents", the actual message.

The IP specification actually goes beyond just providing hierarchical, routable, maskable addresses. It also provides for the rudiments of reliable data transmission. As we examine an IP header below, we'll note that it has a lot more fields, fields that deal with fragmentation (data streams that are too large to fit into a single packet's MTU), lost packets, and more. IP alone still isn't very reliable over wide area networks, but it is more reliable than Ethernet. As you doubtless recall from my last column, it is TCP that adds true reliability to the data transmission stream, where IP mostly adds routability.

Figure Two shows the IP header as it appears in RFC 791.

Figure 2: IP Header from RFC 791

    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in Figure Two are as follows: four bits for the IP version (version 4 for the headers we are describing, hence IPv4), four bits for the Internet Header Length (the length of the header, and hence the offset to the beginning of the data, in 32-bit words, with a minimum value of 5), 8 bits for Type of Service (really quality of service), 16 bits for the length of the datagram including header(s) (which consequently must be less than or equal to 65,535 bytes), and several fields associated with fragmentation.
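To make the field layout concrete, here is a minimal Python sketch that unpacks the fixed 20-byte part of the header shown in Figure Two. The function name `parse_ipv4_header` and the sample bytes are invented for illustration (the sample checksum is left at zero, so it is not a valid header on the wire); only the field widths come from RFC 791.

```python
import struct
from ipaddress import IPv4Address

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (RFC 791 layout)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # "!" selects network (big-endian) byte order.
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    ihl = ver_ihl & 0x0F                      # low four bits of the first byte
    return {
        "version": ver_ihl >> 4,              # high four bits of the first byte
        "ihl_words": ihl,                     # header length in 32-bit words
        "header_bytes": ihl * 4,              # offset at which the data begins
        "tos": tos,
        "total_length": total_length,         # header plus data, in bytes
        "identification": ident,
        "flags": flags_frag >> 13,            # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": str(IPv4Address(src)),
        "destination": str(IPv4Address(dst)),
    }

# Hypothetical sample: version 4, IHL 5, total length 21, TTL 64, protocol 1.
sample = bytes.fromhex("45000015" "00000000" "40010000" "c0a80001" "c0a80002")
fields = parse_ipv4_header(sample)
print(fields)
```

Note that the version and IHL share a single byte, and the flags and fragment offset share a 16-bit word, which is why the sketch uses shifts and masks rather than one `struct` code per field.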

Next is the "Time to Live" (TTL) field, which is very important. When a new packet is created this field is filled with a value determined by your IP implementation (typically 32 or 64, and necessarily less than 256 because the IPv4 field is only eight bits wide) and then decremented by every IP (router) hop the packet traverses moving toward its destination. If the count ever reaches 0, the packet is killed. This prevents packets from being perpetuated by networking loops and clogging up the Internet the way Yorkshire pudding clogs up your arteries. Of course, it also means that if it is started at too low a value or if the network route is too long, packets may not reach their destination even when the network is traversable.

This condition can happen, especially if your TTL is set to 32 (a number that used to be generous compared to the "hop radius" of the Internet). The /usr/sbin/traceroute command is a good way to determine the number of routing hops between locations, although it can also fail if one of the intervening hops blocks Internet Control Message Protocol (ICMP) packets (a component of IP described by RFC 792).

You can see your (Linux) system's default TTL by entering:

cat /proc/sys/net/ipv4/ip_default_ttl

You can alter it with the /sbin/sysctl command, although this is likely neither necessary nor advisable.
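The TTL behavior described above can be illustrated with a deliberately simplified Python model. The function `survives_route` is hypothetical, and the model assumes each router decrements the field by exactly one and discards the packet when the result is zero; real routers have additional cases this sketch ignores.

```python
def survives_route(initial_ttl: int, router_hops: int) -> bool:
    """Return True if a packet starting with initial_ttl crosses router_hops routers.

    Simplified model: every router decrements TTL by one and drops the
    packet if the result is zero.
    """
    if not 0 < initial_ttl < 256:
        raise ValueError("TTL is an 8-bit field, so it must be between 1 and 255")
    ttl = initial_ttl
    for _ in range(router_hops):
        ttl -= 1            # each router hop decrements the field
        if ttl == 0:        # the count reached zero: the packet is killed
            return False
    return True

# A TTL of 32 is enough for a 31-router path but not for a 32-router one.
print(survives_route(32, 31))
print(survives_route(32, 32))
print(survives_route(64, 40))
```

Under this model a starting TTL of 32 fails on any route with 32 or more routers, which is the "network is traversable but the packet still dies" case.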

The Protocol field specifies the next layer of encapsulation used by the packet. The header checksum makes corruption of the header itself (only) detectable. The rest of the header is much like the Ethernet header but with a different order: source address first followed by destination address, each four bytes long (for IPv4). The options and padding are themselves optional in that they may not appear in all packets. Sometimes data will start right after the destination address, sometimes not. The minimum (and typical) length of a primary IP header is thus twenty bytes (five 32-bit words). The maximum length is specified in the RFC as 60 bytes, depending on the options used and padding.
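For readers curious how the header checksum works, RFC 791 defines it as the 16-bit one's complement of the one's complement sum of all 16-bit words in the header, computed with the checksum field itself set to zero. The sketch below follows that definition; the function name `ipv4_header_checksum` and the sample header bytes are illustrative assumptions, not taken from any particular implementation.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the header's 16-bit words."""
    if len(header) % 4:
        raise ValueError("IP headers are a whole number of 32-bit words")
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    # Fold any carries out of the low 16 bits back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte header with the checksum field (bytes 10-11) zeroed.
zeroed = bytes.fromhex("45000073" "00004000" "40110000" "c0a80001" "c0a800c7")
checksum = ipv4_header_checksum(zeroed)
print(hex(checksum))

# A receiver runs the same sum over the header as received, checksum included.
# An undamaged header yields zero.
filled = zeroed[:10] + struct.pack("!H", checksum) + zeroed[12:]
print(ipv4_header_checksum(filled))
```

Because only the header is covered, corruption of the data that follows goes unnoticed at this layer. Since the TTL changes at every hop, each router also has to recompute the checksum before forwarding.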

Following the header is the data in the datagram. The shortest message that can be sent is one byte of data accompanied by 20 bytes of header, or 21 bytes total (with no fragmentation possible).

The entire IP datagram must itself be encapsulated as the "data" part of an Ethernet packet. This encapsulation is not arbitrary. As you might expect, there is an RFC (894) that describes how it is to be done. Fortunately this is a very simple RFC and the encapsulation is done in pretty much the obvious way. The IP datagram becomes the data part of the Ethernet packet (basically wrapping it inside an Ethernet prologue/header and epilogue/footer). Small datagrams are padded with zeros as needed to reach the minimum Ethernet data size, but the padding is not included in the datagram length, so the zeros are ignored by the receiving system that unwraps the packets to get at their contents.
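The wrapping and padding just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a real network stack: the helper names `encapsulate` and `decapsulate` and the MAC addresses are made up, the preamble and trailing frame check sequence are omitted (the network hardware normally supplies them), and the constants are the IPv4 type value 0x0800 from RFC 894 and the standard 46-byte minimum Ethernet data size.

```python
import struct

ETHERTYPE_IPV4 = 0x0800      # type field value for IP datagrams (RFC 894)
MIN_ETHERNET_DATA = 46       # minimum Ethernet data field size, in bytes
ETHERNET_HEADER_LEN = 14     # 6 destination + 6 source + 2 type

def encapsulate(datagram: bytes, dst_mac: bytes, src_mac: bytes) -> bytes:
    """Wrap an IP datagram in an Ethernet header, zero-padding short datagrams."""
    padding = b"\x00" * max(0, MIN_ETHERNET_DATA - len(datagram))
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    return header + datagram + padding

def decapsulate(frame: bytes) -> bytes:
    """Recover the datagram, using the IP Total Length field to drop padding."""
    data = frame[ETHERNET_HEADER_LEN:]
    (total_length,) = struct.unpack("!H", data[2:4])  # bytes 2-3 of the IP header
    return data[:total_length]

# Hypothetical smallest datagram: 20-byte header (checksum left at zero for
# brevity) plus one byte of data, so Total Length is 21.
ip_header = bytes.fromhex("45000015" "00000000" "40010000" "c0a80001" "c0a80002")
datagram = ip_header + b"\x2a"

dst = bytes.fromhex("020000000002")   # made-up locally administered addresses
src = bytes.fromhex("020000000001")
frame = encapsulate(datagram, dst, src)

print(len(datagram), len(frame))       # datagram size versus framed size
print(decapsulate(frame) == datagram)  # the zero padding is stripped again
```

The 21-byte datagram picks up 25 bytes of zero padding plus 14 bytes of Ethernet header, so very small messages spend most of each packet on overhead.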

Let's quit here for now, and come back to this next time, when we will learn that IP over Ethernet has a dark side (we promised you dark secrets, if you recall). As it stands, it is connectionless (you throw a packet out there hoping it will find a home). It isn't reliable (it may or may not ever find a home, but the protocol has no way of determining whether it does or doesn't). It is quite costly for certain patterns of messages. Some of these issues will be addressed by adding on the TCP layer, and some won't. Hopefully we'll see you there.

Sidebar: Networking Resources
Charles Spurgeon's Ethernet Web Site: This is a truly excellent resource and has been converted into an O'Reilly book.

Javin's Protocol Dictionary: This site has a nice review of the 802.3 specifications and the structure of packets, in particular the changes associated with gigabit Ethernet.

Charles L. Hedrick's Introduction to the Internet Protocols document is the document I originally used to learn about TCP/IP networking.

The original (and still operant) RFCs: These define, e.g., TCP and IP. There are many more, however, including RFCs that deal specifically with sending TCP/IP over Ethernet.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Robert Brown, Ph.D., has written extensively about Linux clusters. You can find his work and much more on his home page.


    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.