Nothing stops a cluster geek, not even surgery
As some of you might know I ended up spending a great deal of SC08 either in the hospital or in my hotel room recovering from emergency surgery. Not the best way to spend SC since it only comes once a year, but my body just didn't allow it. However, I did get out a little on the last day to run around the show floor like a mad man.
SC is always a very interesting conference for me for many reasons. I get to see some cool new toys, see old friends, make new ones, and totally geek-out for a week without my family rolling their eyes at me in total embarrassment. So, without further ado, here are my (limited) impressions of SC08.
Austin's Really Great
I get to spend a great deal of time in Austin because of my day job, but not downtown. So it was very interesting for me to be downtown in Austin, especially near the night-life around 6th Street. It's a much, much better destination than Reno for SC07 or Tampa for SC06. There are tons of places to eat, including some places with good steaks and BBQ (I'm a huge BBQ nut). The prices can be a little steep, but at least we could find places to eat (unlike Tampa) and we didn't have to cut through the smoke to get anywhere (like Reno). So, hats off to the SC committee - Austin was a pretty good pick. BTW - the hospital near downtown, Brackenridge, is a top-notch hospital and a major trauma center. The people there were spectacular, to say the least. Then again, I'm not going to judge the location for SC based on the quality of the hospitals. But given my rapidly advancing age, it may become one of my key criteria for future SC conferences.
I think several other people (e.g. Doug and Joe Landman), have mentioned that the show floor felt less full than usual. I don't know what the final attendance was, but I do know that a number of people who were supposed to come, canceled at the last minute. I guess the reality of operating expenses has hit just about everyone. But walking around the floor, I got the distinct impression that the attendance was down.
Another impression I got was that the number of "customer" booths was way up. Remember that SC is a unique conference in that the vendors and their customers all share the same exhibit floor. To me, it seemed as though the number of customer booths, primarily universities and national labs, was up considerably. I didn't stop to talk to many of them, but my usual favorites, TACC and aggregate.org, were there and in rare form. I did see more universities from Asia, which I think is a good sign. At the same time, the national lab booths just seem to get bigger and more elaborate every year. I'm waiting for the day when the largest and loudest booth with the best swag is not a vendor's but a national lab's. When that happens I think it will speak volumes about the HPC industry and its funding. But I digress.
One other impression I have is that the general buzz was different as well. It didn't seem as "fun" as past SC shows. There seemed to be a "bite" in every conversation. My favorite conversations were academics talking to vendors, or sometimes at other non-vendor booths. These conversations got very heated, with some academics raising their voices, telling the vendors that they were dead wrong, that they were hurting the industry, and that if the vendors would only listen, the academics had a solution to whatever the problem was. In 5+ years of SCs I have never heard conversations get this heated. Things were usually very pleasant and at the very least technically fun. But when you get people who are absolutely convinced they are absolutely right, and whose sensitivity, for whatever reason, is heightened, it makes for a really argumentative environment. And I didn't get this impression from one or two discussions but from many of them. Sigh... I hope it was just because some people were grumpy, but if the general attitude holds then I don't think it's a good sign for the community (I won't even talk about the beowulf list, which has almost become useless, but that's another story... :) ).
Cool Stuff for HPC
Since I didn't walk around the show floor too much, I will have to rely on press releases and website information to help. I always look for a "theme" or two at the show, and this year I think I can definitely find one theme and perhaps a second and third. The main theme of this year's show, at least to me, was GPUs.
Everyone was talking about or demoing GPUs for HPC. I've been following GPUs for a number of years and I was glad to see them come to the forefront this year. A number of vendors were demoing systems with GPUs, such as Cray, Bull, Dell, NEC, HP, BOXX, Mathematica, Lenovo, and others. Plus there seemed to be lots of discussion about tools for GPUs, with many people expressing hope that OpenCL would be the savior of GPU coding.
Nvidia had a press release about what other companies are doing to incorporate Nvidia GPUs into Personal Supercomputers. In general, the plan is to use Nvidia's C1060 card in a workstation or rack mount system. They even have a website that discusses personal supercomputers using Tesla cards.
You can go to the Home of CUDA, Nvidia's freely available tool for building GPU codes. There are a number of examples of speedups obtained from running on GPUs. However, getting your application to run on GPUs is not as simple as typing "make" or adding a new option to a command line (e.g. "-gpu"). You still have to rethink your algorithms to take advantage of the GPU. While this sounds easy, it's not. You have to retrain the way you think to take advantage of the GPU. But if you can coerce your code into running on GPUs, the potential for order-of-magnitude increases in performance is there. Keep in mind that not all codes or algorithms may be able to take advantage of GPUs.
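To make the "rethink your algorithms" point a little more concrete, here is a minimal sketch (in plain Python, purely illustrative, since CUDA itself is C-based and the function names here are made up) of the mental shift: recasting a serial loop as a per-element kernel, which is the data-parallel shape a GPU wants.

```python
# Serial form: one loop walking the arrays in order.
def saxpy_serial(a, x, y):
    result = []
    for i in range(len(x)):
        result.append(a * x[i] + y[i])
    return result

# GPU-style form: express the SAME computation as a per-element kernel.
# On a GPU, the runtime would launch one thread per index i; here we
# just map the kernel over the indices to show the restructuring.
def saxpy_kernel(i, a, x, y):
    return a * x[i] + y[i]

def saxpy_parallel(a, x, y):
    # stand-in for a kernel launch over len(x) threads
    return [saxpy_kernel(i, a, x, y) for i in range(len(x))]
```

The point is that each index is computed independently, so the work can be spread over thousands of GPU threads. A loop where iteration i depends on iteration i-1 does not decompose this cleanly, and that's exactly where the real rethinking starts.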
While Nvidia was the main talk in regard to GPUs, Aprius was also there showing off an interesting box called the CA8000 Computational Acceleration System. It's a 4U box that contains up to eight (8) PCIe boards - most likely computational acceleration cards (e.g. GPUs). Each card can be a double-wide PCIe x16 Gen 2 card that draws up to 300W. Ideally, you populate the CA8000 with a few cards such as GPUs, and then use the Aprius PCIe optical adapters in the box to connect it to a single node or multiple nodes. You can use up to four (4) of these adapters alongside four (4) cards. This is perfect for situations where the compute nodes cannot host a GPU directly (either they don't have the right kind of slot or they don't have enough power). Using the adapters you can get a 2:1 or a 4:1 accelerators-to-node ratio with this box.
Since AMD doesn't have an external GPU box as Nvidia does, the CA8000 is perfect for AMD GPU solutions. It also matches Nvidia's recommended ratio of no more than two GPUs per CPU. However, the CA8000 does not offer the density that the Nvidia S1070 1U box offers. Nonetheless, I think this box is very interesting for a variety of reasons: it allows nodes that can't host a GPU to connect to GPUs, and it gives AMD an external solution that comes close to Nvidia's.

Nvidia was also on the floor in full force. Their booth is always good and they have some real technical experts floating around (unlike some companies who stuff their booths with eye-candy and don't send anyone with technical skills to back them up - but that's another story). Due to my horribly limited time I didn't get to chat with Nvidia. I'm sorry I missed that, since it's always a highlight of the show.
One of the coolest announcements, and one I was really looking forward to digging into, was that the Portland Group announced the new version (8.0) of their compiler suite. While the compilers are always good and PGI continues to make them better, I think this suite could represent the beginning of a huge trend for GPUs: integrating GPU code generation into standard compilers.
The idea is that standard compilers gain the ability to generate code for GPUs. Of course you have to write code that the compilers recognize, or even better, the compiler could have a compile option such as "-gpu" that would look at the code and generate GPU code where appropriate. I know this is wishful thinking, but the compiler writers at PGI are exceptional. The advantage of this approach is that it allows people to use standard compilers, which they may already be using, to build applications that will run on GPUs. This approach is even more important for Fortran, since there are no really good ways to easily port Fortran code to run on GPUs.
Keep watching CM for a follow-up article I hope to do on the new PGI compilers.
Solid State Storage
Another theme that I think is close to the GPU theme in magnitude is solid-state storage. For some time we've all seen articles coming out about SSDs. They are still expensive and have limitations that many people are aware of (e.g. they can actually lose data), but there were two companies that I would like to highlight - Texas Memory and Solid Access.
One of the reasons that SSDs and the like have become so popular is that people are looking for increased performance and possibly lower power consumption for applications (even if it is a "perceived" need for increased performance). More generally, people are starting to examine creating "tiers" for the storage behind a file system with HSM (Hierarchical Storage Management). Figure One below illustrates the concept.
The width of the triangle indicates capacity and the height indicates performance (however you want to measure performance - throughput, IOPS, etc.). The general premise of the illustration is that as you move up the triangle, costs increase as well. So faster storage costs more (makes sense). Therefore, to save money, don't put all of your data on the fastest, most expensive storage. It's better to put only the data that needs extremely fast storage on something like SSDs or RAM disks, and then move the rest to something slower, such as SATA drives with limited bandwidth to the file system. This is the HSM concept (move the data up and down the tiers as needed). So people are looking at SSDs and RAM disks to get the best performance possible, but they want to combine them with existing storage to be more cost effective.
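The tiering idea above can be sketched in a few lines. This is a toy policy, not any vendor's HSM implementation - the tier names and thresholds are entirely made up for illustration: hot data earns a spot on fast storage, and data is demoted as it cools.

```python
# Toy HSM placement policy: tiers ordered from the top of the
# triangle (fast, small, expensive) to the bottom (slow, big, cheap).
# All names and thresholds here are hypothetical.
TIERS = ["ramdisk", "ssd", "sata"]

def place(hours_since_last_access):
    """Pick a tier based on how recently the data was touched."""
    if hours_since_last_access < 1:
        return "ramdisk"   # very hot data: fastest tier
    elif hours_since_last_access < 24:
        return "ssd"       # warm data: solid-state tier
    return "sata"          # cold data: cheap bulk tier
```

A real HSM also migrates data back up when it gets hot again and weighs capacity pressure, but the core loop is just this kind of policy applied over and over.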
Texas Memory has been around for a number of years, but I think their importance in the HPCC storage market is about to take a quantum leap because of the tiering approach. They have a variety of products that use either flash SSDs or RAM disks as the storage medium. For example, they have a unit called the RamSan-500 that consists of 1TB to 2TB of flash RAID along with 16GB to 64GB of cache. It can be connected via 4X FC links (2 to 8 of them). This box alone can do 2GB/s of throughput and 100,000 IOPS from the flash storage (as a comparison, a single hard drive can do maybe 50MB/s and around 100 IOPS). Their RamSan-440 is a RAM-based storage unit with 256GB to 512GB of storage. It can do up to 4.5GB/s of throughput and 600,000 IOPS.
Texas Memory has a range of storage options including a 42U rack with flash-based storage and memory cache. The RamSan-5000 has up to 10-20TB of flash-based storage and 160GB to 640GB of cache. In aggregate, it can do 20GB/s to the flash storage and achieve over 1,000,000 IOPS. Keep an eye on Texas Memory - they are going to start shaking up the HPCC storage market.
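To put those IOPS figures in perspective, here's a back-of-the-envelope comparison (using the rough numbers above, with a single hard drive pegged at about 100 IOPS) of how many spinning disks it would take to match one of these units on IOPS alone:

```python
# Rough IOPS equivalence: how many ~100-IOPS hard drives would it take
# to match one solid-state unit? (Figures are the approximations from
# the text, not vendor benchmarks.)
DISK_IOPS = 100

def drives_equivalent(unit_iops, disk_iops=DISK_IOPS):
    return unit_iops // disk_iops

ramsan500 = drives_equivalent(100_000)   # RamSan-500 flash unit
ramsan440 = drives_equivalent(600_000)   # RamSan-440 RAM unit
print(ramsan500, ramsan440)              # 1000 6000
```

In other words, on small random I/O a single 2U or 4U box stands in for a room full of disks - which is exactly why the top of the tiering triangle is so attractive despite the price per gigabyte.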
The other company that has an SSD solution as a stand-alone unit is Solid Access. They have several products that offer various approaches to adding solid-state storage. The base product, the USSD 200, is a 2U box that has a maximum capacity of 128GB but with a throughput of 3.6GB/s when you use multiple FC links. It can be connected in a variety of ways including 320 MB/s SCSI-3 Ultra-wide LVD, 3 Gb/s SAS, and 4 Gb/s FC.
During SC08, they also announced a new 1U box (the USSD 300 series) that has up to 256GB of flash storage. It can do 100,000 IOPS per FC port, and 4GB/s with aggregated links. They also announced the USSD 320, a 2U unit with up to 256GB of storage.
TACC and Visualization
While I don't think it was a "theme" of the show, TACC announced their new visualization center, which includes a new viz wall called Stallion. This project is very noteworthy because it's built totally from commodity parts and uses Kubuntu Linux. It has 24 Dell XPS 690 workstations (one of them is a head node). Each of the 23 compute nodes has two Nvidia graphics cards (each with 1GB of video memory), 4.5GB of system memory, and a single Intel quad-core CPU. These are connected to a total of 75 Dell 30" monitors (a bit over three monitors per workstation). The monitors are capable of 2560 x 1600 resolution and are arranged in 15 columns of 5 monitors each. That's a total of 307 million pixels.
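The pixel count follows directly from the grid dimensions; a quick sanity check of the arithmetic:

```python
# Sanity check of the Stallion wall's pixel count:
# a 15-column by 5-row grid of 2560 x 1600 panels.
columns, rows = 15, 5
width, height = 2560, 1600

monitors = columns * rows                  # 75 panels
total_pixels = monitors * width * height   # 307,200,000 pixels

print(monitors, total_pixels)              # 75 307200000
```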
Stallion is now the largest tiled display in the world, passing the one at the San Diego Supercomputer Center, which is amazing, but I think the coolest aspect of the whole project is that it uses standard workstations, standard displays, standard video cards, and standard networking, along with Linux and some open-source viz software. It's not a specialized system, custom built and custom integrated as in the good old SGI days. It follows the same tenets as beowulf clusters, but for viz clusters. Not a bad concept IMHO.
I hate to say it, but given my extremely limited time on the show floor and to talk to vendors and others, those are the highlights for me. I think Doug has additional comments that he will be posting. Next SC I will do my level best not to end up in the hospital, so I can at least give a reasonable overview of the show.
Dr. Jeff Layton hopes to someday have a 20 TB file system in his home computer. He lives in the Atlanta area and can sometimes be found lounging at the nearby Fry's, dreaming of hardware and drinking coffee (but never during working hours).