Introduction
The SC conference is always a lot of fun because there are so many
cool new things at the show, you get to see people that you have only
emailed, you get to see old friends, and you get to "geek out"
without too much grief from your family. This year's show was no
exception. It was the largest SC conference ever and had lots of new
announcements and even included a large presence from Microsoft. Of course, Cluster Monkey has already commented about
this turn of events.
This year's SC was a good show. I didn't get to any of the
presentations, but I did try to see as much of the show floor as I
could. So I want to share with you what I saw and what I learned in
talking with the various vendors. However, the show floor was huge, so
I may have to rely on press releases to get me over the hump.
In addition to this summary, which is by no means a complete synopsis of the show, check out Joe Landman's Blog as well.
SC05 - Location, Location, Location
This year's SC Conference was at the Washington State Convention
and Trade Center in Seattle, Washington. It was a great location for
the conference although the exhibition hall had to be split into two
parts to accommodate all of the vendors. However, I may be a little
prejudiced since I used to live in Seattle.
The Convention Center is located just a few blocks up from Pike
Place Market and it is literally surrounded by coffee places,
particularly Starbucks. I swear that you can't walk 20 feet without
running into a coffee place. On the other hand, I don't like
Starbucks because it tastes like they've burned their beans. Anyway,
there are a number of hotels in the area, plenty of places to eat,
and, more importantly, lots of watering holes that serve the best of
the local micro-breweries (never mind Dungeness Crab).
Monkey Get Together
The First Monkey Get Together was a huge success. A number of
people showed up and all of the hats were given out (although we had
to save one for rgb, so if you see someone with a yellow hat with a
monkey on it on the Duke campus, introduce yourself! rgb is a great
guy). I got to see some old friends like Roger Smith, Joey, and Trey
from the ERC at Mississippi State, Dan Stanzione of Cluster
Monkey-famedom, Glen Otero - International Man of Mystery and Super
Cluster Monkey (ladies, he's still single and still a body builder),
and others. I also got to meet some new friends like Josip Loncaric.
Josip was an early major contributor to clusters. He made a small
change to the TCP stack in the 2.2 kernel, greatly improving the TCP
performance. He now works at Los Alamos on aspects of clusters and
high performance computing. It was a real honor to meet him and to
talk to him (a little hero worship going on there).
I also spent some time talking to Dimitri Mavriplis, who is a
professor at the University of Wyoming. He is
one of the best CFD (Computational Fluid Dynamics) researchers in the
world. It was great fun to talk about CFD with him since that's one
of my interests, as well as clusters (he uses clusters in his
research). If you are looking for CFD codes for your clusters, Dr.
Mavriplis is the man to talk to.
I think the Monkey Get Together was a big success. It was very
nice to see such a ground swell of support for clusters, particularly
beowulfs, from such a cross-section of the community. There were
various people there from competing companies, yet they could discuss
the state of clusters and the future of clusters in a constructive
and passionate way.
Linux Networx Announcements
I'd like to talk about the Linux
Networx announcements at the conference, but I need to disclose
that I work for Linux Networx so you can view this as a shameless
plug if you like. However, even if I didn't work for Linux Networx, I
would still write about their announcements and you'll see why in the
following paragraphs.
Linux Networx introduced two new clusters: the LS-1 and the LS/X.
These two systems represent a new approach to clusters - bringing
them to the systems level. Doug has mentioned in one of his recent
writings that clusters were a disruptive influence on HPC at many
levels, one of them being "disruptive support." Doug went on to say
that, "... There are integrated clusters from larger vendors
that reduce the number of user options in order to increase the level
of performance, integration, and support. ..." This is precisely
what Linux Networx has done. The key concept is to take a systems
approach to clusters and make them easier to use, easier to manage,
easier to support, and easier to upgrade. Both LS-1 and LS/X embody
this philosophy.
 Full-Height LS-1 and Half-Height LS-1. Courtesy of Linux Networx
LS-1
The LS-1
has been designed based on the years of experience Linux Networx has
with clusters using the "best of breed" components and
processes. The LS-1 is designed for the small to medium range market
with up to 128 nodes. The current LS-1 system is Opteron only with
dual socket nodes that are dual-core capable. You can also choose to
have a GigE network, Myrinet 2G network, or an Infiniband network
(Infinipath is coming around 1Q of 2006). There are also a number of
storage options that range from simple NFS boxes to parallel file
systems with great IO performance. At SC05 there was also a
technology demo of a parallel visualization capability for the LS-1.
Linux Networx is working very hard on visualization. To give you a
little insider information, I think the resulting visualization
product will be really neat and cost much less than the equivalent
SGI visualization equipment (Not that I'm biased or anything).
LS/X
The LS/X is
designed for the upper range of supercomputer performance. It uses a
mid-plane architecture where the boards slide into an 8U sub-rack (I
guess you can call them blades). Linux Networx is currently shipping
a 4-socket Opteron node (dual-core capable) with two built-in
Infinipath
NICs, two GigE NICs, and up to 64 GB of memory. For each 4-socket
node there are also two bays at the rear of the rack that allow
either two SATA drives or two PCI-Express cards to be connected to
the node. Linux Networx is also doing some 8-socket boards for
special situations, but they may or may not be generally available.
However, at SC05 Linux Networx was showing an 8-socket Opteron node
(dual-core capable) with 4 Infinipath NICs, 4 GigE NICs, up to 128 GB
of memory, and up to four SATA or
four PCI-Express cards per node. Up to 6 of the 4-socket nodes can be
put into an 8U sub-rack and up to 4 sub-racks in a normal rack, for a
total of up to 96 sockets in a single rack.
 Three racks of LS/X. Courtesy of Linux Networx
The LS/X nodes slide into a mid-plane to get their power (from a
DC PDU in the bottom of the rack), communication, and expandability.
The sub-racks have built-in Tier-1 switching for the Infinipath and
GigE networks. The racks can also have Tier-2 switching in the bottom
of the rack. These built-in switches greatly reduce the number of
required cables. For a full rack you only need 17 cables!! A very
high percentage of the parts of the nodes are field replaceable (you
just pull them out and put in a new one). The racks are also designed
to sit over vented tiles in a raised floor area to pull air up into
the rack. This eliminates hot air recirculation. The performance of
the LS/X is setting records on benchmarks which should be posted on
the website soon. It is very competitive with the IBM Blue Gene, Power
5, Cray X1, and Cray XD1 on the HPC Challenge Benchmark. In some cases it
has the best performance of any of these systems.
The Intel booth was right next to the Linux Networx booth so I did
want to mention that an Intel person, who watched the unveiling of
the LS-1 and the LS/X on Monday night, commented that they thought
the systems were the "...sexiest machines on the floor..."
despite not having Intel chips in them.
Pathscale and Infinipath
I spent some time talking to the Pathscale
folks. They are great people to talk to since they know so much and
they are so enthusiastic about clusters. Greg Lindahl took some time
to demonstrate how to use their compilers to search for the best set
of compile flags for performance for a given code. Very cool feature.
However, what was even more interesting was that they like to hear
what compiler flags people end up using for what codes. Greg said
this helps them understand how to improve their compiler. Part of the
improvement comes from knowing how to better optimize code and part
comes from knowing what options are routinely used and how to improve
them. He had some very interesting comments about what compile
options work well for certain codes.
Even more exciting than their compilers is their Infinipath
interconnect. They announced this new interconnect a while ago, but
it is now shipping in quantity. Let me tell you, this interconnect is
really hot stuff. Pathscale has taken a great deal of care to
understand how various parameters affect code performance. While
things such as zero-byte packet latency and peak bandwidth are
important in some respects, Pathscale has realized that things such
as N/2 and message rate are perhaps more important. N/2 is the packet
size at which the interconnect reaches half of its peak bandwidth
(basically the bandwidth in one direction). You want the smallest N/2
possible for the best code performance and Infinipath has it. In
addition, you want the fastest message rate possible out of the NIC
for the best performance (seems obvious, but I never thought about it
before). Pathscale took this into account when designing their NIC.
They have the best message rate of any interconnect that I know of.
In addition, the performance of the NIC gets better as you add cores.
Imagine that!
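To make the N/2 idea a bit more concrete, here is a minimal Python
sketch. It assumes a simple linear cost model (time = latency +
size / peak bandwidth), which is a common textbook approximation and
not something taken from Pathscale. The two interconnects and their
numbers are hypothetical and are not Infinipath measurements.

```python
# Sketch of N/2 under an ASSUMED linear cost model:
#   time(n) = latency + n / peak_bandwidth
# All figures below are hypothetical, chosen only for illustration.

def transfer_time(n_bytes, latency_s, peak_bw):
    """Seconds to send one n_bytes message under the linear model."""
    return latency_s + n_bytes / peak_bw

def effective_bandwidth(n_bytes, latency_s, peak_bw):
    """Bytes per second actually achieved for messages of n_bytes."""
    return n_bytes / transfer_time(n_bytes, latency_s, peak_bw)

def n_half(latency_s, peak_bw):
    """Message size where effective bandwidth is half of peak.

    Solving n / (latency + n / peak_bw) = peak_bw / 2 gives
    n = latency * peak_bw.
    """
    return latency_s * peak_bw

PEAK_BW = 1.0e9  # bytes/s, same hypothetical peak for both interconnects

# Hypothetical latencies in seconds.
interconnects = {"low-latency": 1.5e-6, "high-latency": 5.0e-6}

for name, latency in interconnects.items():
    size = n_half(latency, PEAK_BW)
    achieved = effective_bandwidth(size, latency, PEAK_BW)
    print(f"{name}: N/2 = {size:.0f} bytes, "
          f"bandwidth at N/2 = {achieved / PEAK_BW:.2f} of peak")
```

In this model, two interconnects with the same peak bandwidth can have
very different N/2 values, and the one with the smaller N/2 gets closer
to peak on the small and medium messages that many codes actually send.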
Pathscale has a number of papers on their website that discuss
Infinipath and the influence of network performance on code
performance and scalability. You can download the papers from their
website. They
are very useful and informative.
Since I work for Linux Networx and we are using the Infinipath
ASIC in our new LS/X system, I can safely say that the benchmarks
I've seen using the Infinipath NIC are amazing. We should be posting
benchmarks in the near future, but I can safely say that the results
will stun people. Very, very fast.
10 GigE
I've been watching 10 Gigabit Ethernet (GigE) for over a year, ever
since companies started to talk about 10 GigE NICs (Network Interface
Cards). At last year's SC Conference Neterion (formerly S2IO) and
Chelsio were showing 10 GigE NICs, primarily using fiber optic
connections.
They were expensive, but so was GigE a few years ago. However, the
really large problem was the cost of 10 GigE switches. So I walked
around the floor at SC05 talking to various Ethernet switch companies
as well as the 10 GigE NIC vendors.
The general consensus was that the prices for 10 GigE NICs are
coming down quickly and will continue to do so. Plus copper 10 GigE
NICs are common now. But, perhaps more importantly, 10 GigE switch
prices are coming down. Prices are dropping at the traditional HPC
Ethernet companies such as Foundry, Force10, and Extreme Networks.
However, the biggest price drops are coming from, perhaps
unexpectedly, companies that haven't traditionally played in the HPC
space, are new, or are new to Ethernet.
Chelsio
Chelsio was showing their 10 GigE NICs at SC05. They have the
lowest list priced 10 GigE NICs I've seen. Their T210-CX 10 GigE NIC
has a copper connection while the T210 NIC has a fiber connection.
Both are PCI-X NICs (maybe if we ask hard enough they will do a
PCI-Express version). They both have RDMA support as well as TOE (TCP
Off-Load Engine). Chelsio also has a "dumb" NIC that does
not have RDMA or TOE support and uses fiber connectors (N210).
Chelsio is also using their 10 GigE technology for the rapidly
expanding iSCSI market. At SC05 they announced a PCI-Express based 10
GigE NIC with 4 ports, TOE and iSCSI hardware acceleration.
10 GigE Switches
I didn't get to talk to the primary 10 GigE switch companies -
Foundry, Force10 or Extreme, so I'm going to have to rely on their
websites and press releases. Foundry
currently has a range of switches that can accommodate 10 GigE line
cards. Their high end switch, the BigIron RX-16, can accommodate up
to 64 10 GigE ports in a single chassis. At the lower end, their
SuperX series of switches can accommodate up to 16 ports of 10 GigE.
Force10 has the largest 10 GigE port count in a single chassis that I
know of. On Oct. 31 they announced that they had new line cards for
their Terascale E-Series switches
that would allow them to go to 224 ports of 10 GigE in a single
switch (14 line cards with 16, 10 GigE ports per line card). At that
size they also said the price per port would be about $3,600. By the
way, in the same switch chassis you can also put 1,260 GigE ports.
Extreme Networks was also at SC05. They have a large switch, the
BlackDiamond 10808, that allows up to 48 ports of 10 GigE. They are
also working with Myricom to use their 10 GigE switches with
Myricom's new 10G interconnect NICs.
While not necessarily new, there were some companies showing small
port count 10 GigE switches with the lowest per port cost available.
Fujitsu was proudly displaying
their 12 port, 10 GigE switch. It is one of the fastest 10 GigE
switches available with a very low per port cost of approximately
$1,200.
Already companies are taking advantage of the Fujitsu 12 port 10
GigE ASIC. One of the traditional HPC interconnect companies,
Quadrics, is branching out into the 10 GigE market. At SC05, they
were showing a new 10 GigE switch that uses the Fujitsu ASIC. The
switch is an 8U
chassis that has 12 slots for 10 GigE line cards. Each line card has
eight 10 GigE ports that connect using CX4 connectors (they look like
the new "thin" Infiniband cables). This means that the
switch can have up to a total of 96 ports of 10 GigE. The remaining
four ports on the line card are used internally to connect the line
cards in a fat tree configuration. This means that the network is 2:1
oversubscribed but looks to have very good performance. This will be
one of the largest single 10 GigE switches on the market when it comes out
in Q1 2006 (that I know of). No prices have been announced, but I've
heard rumors that the price should be below $2,00 a port. Quadrics
also stated in their press release that they will have follow-on
products that increase the port count to 160 and then 1,600.
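The port count and the 2:1 figure for the Quadrics switch follow from
simple arithmetic, sketched below in Python. The sketch assumes each
line card is built around one 12-port ASIC, with the ports that are not
external used as uplinks. That is my reading of the description above,
not a published Quadrics design detail, and the function name is made
up for illustration.

```python
# Arithmetic behind the 96-port, 2:1 oversubscribed switch.
# ASSUMPTION: one 12-port ASIC per line card; ports not exposed
# externally are the internal uplinks into the fat tree.

ASIC_PORTS = 12         # ports on the Fujitsu ASIC
EXTERNAL_PER_CARD = 8   # CX4 ports on each line card
SLOTS = 12              # line card slots in the 8U chassis

def line_card_split(asic_ports, external_ports):
    """Return (internal uplinks, oversubscription ratio) for one card."""
    internal_ports = asic_ports - external_ports
    ratio = external_ports / internal_ports
    return internal_ports, ratio

internal, ratio = line_card_split(ASIC_PORTS, EXTERNAL_PER_CARD)
total_ports = SLOTS * EXTERNAL_PER_CARD

print(f"internal uplinks per card: {internal}")
print(f"oversubscription: {ratio:.0f}:1")
print(f"external 10 GigE ports per chassis: {total_ports}")
```

Eight external ports sharing four uplinks is where the 2:1 comes from:
if every external port on a card sends off-card at once, each gets at
most half of its line rate.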
I also spoke with a new company, Fulcrum Microsystems, about a new
10 GigE switch ASIC they are developing. It has great performance
(about 200 nanosecond latency)
with up to 24 ports and uses cut-through rather than
store-and-forward to help performance. The ASIC will be available in
Jan. 2006 for about $20/port. A number of vendors are looking at them
for making HPC-centric 10 GigE switches. They have a nice
paper that talks about how to take the 24-port 10 GigE switches,
built using their ASICs of course, and construct a 288-port fat-tree
topology with full bandwidth to each port. The fat-tree would only
have a latency of about 400 nanoseconds (two tiers of switches).
Maybe the ASICs from Fulcrum Microsystems will get 10 GigE over the
price hump and get it on par with other high speed interconnects.
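The 288-port figure can be checked with a short Python sketch. It
assumes the usual two-tier (leaf and spine) full-bandwidth fat tree in
which every switch has the same port count. The paper from Fulcrum may
arrange things differently, and the function and key names here are
made up for illustration.

```python
# Sizing a full-bandwidth two-tier fat tree from identical switches.
# ASSUMPTION: standard leaf/spine layout; each leaf uses half of its
# ports for hosts and half for uplinks, one uplink to each spine.

def two_tier_fat_tree(radix):
    """Return switch and port counts for a given switch port count."""
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    down_ports = radix // 2       # host-facing ports per leaf switch
    up_ports = radix - down_ports # uplinks per leaf switch
    spine_switches = up_ports     # one uplink from each leaf to each spine
    leaf_switches = radix         # each spine port connects to one leaf
    return {
        "leaf_switches": leaf_switches,
        "spine_switches": spine_switches,
        "total_switches": leaf_switches + spine_switches,
        "host_ports": leaf_switches * down_ports,
    }

layout = two_tier_fat_tree(24)
for key, value in layout.items():
    print(f"{key}: {value}")
```

With 24-port switches this gives 24 leaf switches and 12 spine
switches, 36 switches in all, for 288 host-facing ports with as much
uplink capacity as host capacity on every leaf.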