
Why Is Cluster HPC So Hard?

The Hard Stuff

So now we come to the hard stuff and attempt to answer the question, "Why can't I sell clusters by the boatload?" As any good marketer will tell you, segmenting the customer base is important. So, let's narrow the focus a bit. Clearly, a word processor is not going to need 200 GFLOPS. On the other hand, the engineers, biologists, chemists, physicists, and assorted domain experts can use HPC. Indeed, the 2004 HPC Users Survey found that HPC was considered essential for business survival and competitiveness. So the recurring question is, "The hardware is cheap, much of the software is freely available, industry needs HPC, so what is the problem?" I'm glad you asked. Here is my take on this issue.

Clusters are a paradigm shift.

The way I like to explain the shift is that instead of building (modifying) the problem to fit the supercomputer (vectorizing your code), we now can modify the supercomputer to fit the problem. While codes must be parallelized to run on clusters, the optimization does not stop there. The classic cluster issue is the nodes vs. network problem. Given a fixed budget, where do I put my money? Well, as with all things cluster, it depends. If your problem set runs well on gigabit Ethernet, you can buy more processors. If you need a very fast (i.e., expensive) interconnect, you must cut down on the number of processors to stay within your budget. There are other issues as well, including I/O, memory size, dual core, compilers, and filesystems. The list gets rather large.
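To put rough numbers on the nodes vs. network trade-off, here is a minimal sketch. Every price in it is a made-up assumption chosen for illustration, and the names (`BUDGET`, `NODE_COST`, `INTERCONNECT_COST`, `nodes_for_budget`, `node_counts`) are hypothetical; plug in real quotes before drawing any conclusions.

```python
# All prices below are hypothetical placeholders, not real quotes.
BUDGET = 100_000      # total dollars available
NODE_COST = 2_500     # dollars per compute node, excluding the network

# Assumed per-node network cost (adapter plus its share of the switch).
INTERCONNECT_COST = {
    "gigabit_ethernet": 100,
    "fast_interconnect": 1_500,
}


def nodes_for_budget(budget, node_cost, network_cost_per_node):
    """Return how many whole nodes fit within the budget."""
    return budget // (node_cost + network_cost_per_node)


def node_counts(budget=BUDGET, node_cost=NODE_COST, networks=INTERCONNECT_COST):
    """Map each interconnect choice to the number of nodes it leaves room for."""
    return {
        name: nodes_for_budget(budget, node_cost, cost)
        for name, cost in networks.items()
    }


if __name__ == "__main__":
    for name, count in node_counts().items():
        print(f"{name}: {count} nodes")
```

With these assumed prices, the cheap network buys 38 nodes and the fast one buys 25. Whether 25 well-connected nodes beat 38 loosely connected ones depends entirely on how your code communicates, which is the point: the arithmetic is easy, the answer is not.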

Instead of our domain expert running a code on a supercomputer with a highly defined software and hardware environment, he or she has to understand a whole load of details previously handled by the nice people at your local supercomputing company. Indeed, clusters are cheaper because the cost of engineering an integrated solution has shifted from the provider to the user. Read that again. And, if you are starting to think that the traditional supercomputer was an expensive HPC appliance, you are quite right.

For the more scientifically inclined, there is a kind of conservation of cost when it comes to HPC. Cost in this sense is both time and money because the time to solve an implementation problem often cannot be reduced with money. The low price of clusters did eliminate some costs, but shifted many of the non-reducible costs to the end user.

Clusters and multi-cores are hard to program.

Parallel programming is a tough nut to crack. The parallel programming problem has been around for quite a while and there are no signs it will be resolved any time soon. It was an issue before clusters came on the scene and will be a huge issue for multi-core systems.

Why is parallel programming so hard? Ever try to get a group of people to work together at the same time? Now, think about what it would be like if you removed their brains and had to tell each person exactly what to do. That is explicit parallel programming. It basically sucks. It is like programming in machine code, but is actually worse. There is no guarantee that a parallel code you write will be both portable and efficient on all architectures.

There are software tools. For clusters we have things like MPI (Message Passing Interface). Multi-core systems have threads, OpenMP, and even MPI. All of these provide a framework to manage your workers, but before you code up your real problem, you have to manage the workers. The programming cost always falls on the user or ISV. In the past, however, programmers got a free lunch from the hardware side: each new generation of faster processors sped up existing code with no rewriting. Now that performance gains come from adding processors and cores, the cost (time and money) of continued performance gains just got much bigger for everyone.
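To make "you have to manage the workers" concrete, here is a minimal sketch in Python. It uses the standard `threading` module as a stand-in for MPI ranks or OpenMP threads, and the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) are hypothetical. It is meant to show the bookkeeping, not to be fast: in CPython, threads will not speed up this kind of arithmetic.

```python
import threading


def sum_of_squares(values):
    """The 'real problem': one line of actual work."""
    return sum(v * v for v in values)


def split(values, parts):
    """Divide the data into roughly equal contiguous chunks, one per worker."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(values, workers=4):
    """Everything here except the call to sum_of_squares is worker management."""
    chunks = split(values, workers)
    partial = [0] * workers  # one result slot per worker, so no slot is shared

    def work(rank):
        # Each worker is told exactly which chunk is its own.
        partial[rank] = sum_of_squares(chunks[rank])

    threads = [threading.Thread(target=work, args=(rank,)) for rank in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before combining results
    return sum(partial)


if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum_of_squares(data) == sum_of_squares(data))
```

The serial version is a single line. The parallel version needs data decomposition, worker start-up, private result slots, synchronization, and a final reduction, and this is about the friendliest parallel problem there is.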

There is a lack of people to help with the hard stuff.

The situation is quite understandable. Clusters came on very quickly, disrupted everything, and now a new paradigm has emerged. There will be a lag while the rest of the market catches up. There are two areas that have been affected. The first is on the administrative side of things. The second is with the domain experts (end users). Not only is there a pile of new information to be learned, but the role each plays in the cluster environment has changed. Administrators need a better understanding of the domain issues and the end users need a better understanding of the plumbing issues.

Making Things Less Hard

Until these hard issues get addressed, uptake of HPC clusters will be slow. And, as multi-core processors spread, things are going to get hard for many more people. Parallel computing is coming to the masses and no one knows how to make it easy. Think about that for a moment.

Getting back to our marketing plan, we see that success in this market is not going to be about how many processors you can shove in a rack, how easy it is to administer 1024 nodes, or what operating system runs on your cluster. These issues (and others) are important and solvable. They also present a great way for vendors to differentiate themselves in a commodity market. The hard issues will, however, stifle the growth we all know is possible. Sorry for the bad news.

All is not lost, however. To prove that I can do more than throw cold water on the HPC cluster market, I'll have some suggestions in a future installment and, more importantly, some ideas on how you can help.

Douglas Eadline is the swinging Head Monkey at ClusterMonkey.net.


    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.