
From the things to consider while on vacation department

Last fall when my daughter started school, she came home and said the teacher recommended students get a graphing calculator. Mind you, it was not the hundred bucks for the calculator that prompted me to grab a pencil and paper and say "Back in my day, this was our graphing calculator. No batteries needed, you can even keep the stylus on your ear, and as a bonus feature it has an undo tip as well", but rather the idea that pencil and paper was becoming a lost art. I mean, if you are on a desert island and need to plot a parabola, where are you going to find a graphing calculator? (You might be wondering what this has to do with clusters. Read on.)

At that point, my wife joined the conversation and mentioned that the teachers probably used graphing calculators in high school and college as well. Sigh. Nothing like feeling old. It seems when I was in my college years, there were classes devoted to learning how to plot equations on these large, toggle-switch-laden things called minicomputers. We have improved a bit since then, and graphing calculators are one example. In my daughter's school, graphing calculators are used to teach subjects; they are not a subject per se. If I put my analogy hat on for a moment and think back to the day, minicomputers, with all their consternations, let me do better math and science. Graphing calculators can now handle many of those chores and presumably will help my daughter do better math and science. Thinking about clusters, with all their modern-day consternations, I have to ask what is stopping clusters from being a useful tool in the same way. Instead of plotting equations, however, we could calculate and plot entire solution spaces right in the classroom.

Sidebar One: Big Word Alert. Consternation: a sudden, alarming amazement or dread that results in utter confusion; dismay. That is pretty much how I sum up the reaction of those who know nothing about clusters. Indeed, racks of blinking lights, wires, and fans are still an alarming amazement to me.

### Education is Obvious

One obvious solution is enhanced education. Ahh, well, there is the catch. Check your favorite search engine for "Beowulf Cluster Courses" or "Linux Cluster Courses" and some of the top responses are people asking where to find such a course. There were some hits, however, and I'll mention those later.

So why no cluster courses? I propose three reasons. First, up until about three years ago, clusters were a fast-moving target. They are still moving, but not quite as fast. There are now some "reproducible" methods that include prebuilt distributions and toolkits. Beyond node provisioning, things are a bit more settled, but compilation and invocation issues can still vary widely. For instance, multiple MPI versions can often cause confusion among users. Unless some type of environment management is used (e.g., the Modules package), users usually have to make sure they know where the various MPI libraries and binaries live on the cluster. (Nothing like trying to start a LAM/MPI job with an MPICH version of mpirun.)
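
To make the mpirun mix-up a little more concrete, here is a minimal Python sketch that lists every `mpirun` on your search path in the order the shell would find it. The function name `find_launchers` is my own invention for illustration, and the script only looks at `PATH`; it does not know anything about LAM/MPI, MPICH, or the Modules package itself.

```python
import os


def find_launchers(name="mpirun", path=None):
    """Return every executable called `name` on the search path, in lookup order.

    `path` defaults to the PATH environment variable; passing it explicitly
    makes the function easy to try out on a made-up directory layout.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    launchers = find_launchers()
    if not launchers:
        print("No mpirun found on PATH")
    for position, launcher in enumerate(launchers):
        # The first match is the one a bare "mpirun" command would start.
        note = "used by default" if position == 0 else "shadowed"
        print(f"{launcher}  ({note})")
```

If more than one line comes back, the first entry is the launcher you get by typing `mpirun`, and it may not belong to the MPI library your program was compiled against.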

A second issue is finding people to teach about clusters. Most of the rugged individualists and cluster pioneers are too busy to teach or write about clusters. There are exceptions of course, but in general, with clusters it seems you are either plowing the fields or building the plow. Taking time to explain your craft just does not seem to fit in the workday.

The final issue is the scope of what can be taught. To date, most classes or tutorials that I have seen have been aimed at setting up and administering a basic cluster. Interestingly, there was no "How to Build a Cluster" tutorial at SC2006 this past year (there have been in years past). When one talks about clusters, there are really two sub-groups that need to be addressed: administrators and users. The administrators are interested in provisioning the cluster to meet the users' needs and minimizing the work needed to maintain and upgrade the cluster. The users, on the other hand, want to run codes and use the cluster effectively. There is some overlap, but in general, these are two very different agendas. Obviously, the majority of cluster courses have been on provisioning a cluster, which needs to happen before you can address the "domain expert" (i.e., user) issues. Fortunately, I believe the impact of these issues is lessening. Cluster recipes have settled down a bit and more people know how to "do clusters." The Beowulf mailing list and of course Cluster Monkey are good resources as well. In my opinion, the real challenge is going to be packaging and bringing this information to the domain experts. Economic issues aside, clusters should be as easy to use as graphing calculators.

### Apprenticeships Do Not Scale Well

As I mentioned above, finding people to teach about clusters is difficult. This situation raises the question: in the absence of courses, how does one learn about clusters? There seem to be two ways, neither of which scales very well. First, you have the old-fashioned apprenticeship. Working with someone who has clusters (and knows what they are doing) is a great way to learn. The other is to set off by yourself and, with the help of a mailing list or two, build your own cluster. Both approaches take time. The do-it-yourself approach will allow you to make mistakes, which of course is how you learn about most things.

It should be noted, however, that most cluster people, be they users or administrators, usually don't go into such projects blindly. They bring with them some "carry over" from other areas of computing. Indeed, a large portion of "cluster know-how" comes from other established areas of computing. Those areas all have some level of educational infrastructure (manuals, mailing lists, freely available software, and even courses) that can be leveraged by those wanting to learn about clusters. Consider the following topic list.

• Message Passing Interface (MPI)- Because MPI was around before clusters hit the big time, there are numerous books and classes that facilitate learning MPI. Also, PVM (Parallel Virtual Machine) was a great way to connect workstations and learn about parallel computing.
• Compilers- Most cluster experts have a good understanding of compilers and building code. Understanding that a long stream of error messages can be due to a missing library (and easily fixed) prevents the sense of being overwhelmed that comes with trying to build that new software package in your environment.
• Operating System Administration- Opportunities to learn about operating systems are plentiful. Three-inch-thick books are in good supply, as are certification classes and training.
• Commodity Hardware- Most clusters use off-the-shelf hardware. Resources for understanding commodity hardware are also plentiful, although nothing works like having a motherboard or two with which to test ideas.
• Schedulers- Resource scheduling has been around ever since people started sharing computers. There are resources to help learn about schedulers and, like most things, a little hands-on time does wonders.
• Networking - Networking is perhaps the toughest area in which to find good information, even in cluster courses. In many other settings, non-optimal network performance works quite well, for example when just browsing the web or transferring a file. Although much of Linux networking is plug-and-play, there is room for optimization when it comes to clusters. High-end interconnect networks have in the past been even more obscure. Fortunately, the market seems to be focusing on either 10 GigE or InfiniBand solutions, and many of the high-end network companies are moving in this direction as well.

In my opinion, a good cluster course puts all this together and provides insights on how to weave the essential parts of these components into a well-oiled cluster machine. Of course, there are some exclusively cluster issues that deal with parallel computing, but a good grasp of the above topics creates a solid foundation.

### Real Cluster Courses

To fill in the gaps and get a chance to ask questions, there are real cluster courses and places to go to learn about clusters. The most exciting area is the National Center of Excellence for HPC Technology (NCEHPCT, www.highperformancecomputing.org; the link seems to be down). The NCEHPCT is a consortium of four community colleges that develops educational programs in high performance computing technology. These colleges are Maui Community College (Maui, HI), Contra Costa College (San Pablo, CA), Pellissippi State Technical Community College (Knoxville, TN), and Wake Technical Community College (Raleigh, NC). For example, it is now possible to get an Associate's degree in High Performance Computing (HPC) from Wake Technical Community College in Raleigh, NC.

If you are interested in tutorials, you may want to investigate the Linux Cluster Institute, which includes educational sessions as well as technical papers. There is also the annual IEEE Cluster meeting. This meeting tends to be a bit more research oriented, but does have some tutorials. Of course, there is always the annual Supercomputing show. There are some other short courses as well. Recently the Advanced Research Computing (ARC) group at Georgetown presented "An Introduction to Beowulf Design, Planning, Building and Administering". In addition, the ARC recently held courses in the following subject areas: "Advanced Sun Grid Engine" and "Intermediate Beowulf Administration and Optimization." Some Googling may find other islands of cluster education in your area.

The final step is figuring out the various levels of HPC cluster certification so that when the boss says "Go get me one of the cluster things and some people to run it," you can find real people with cluster skill sets.

### Cluster Books

Since I am highlighting some of the resources for learning about clusters, I thought I would give a brief survey of the currently available books. These books (with short summaries and links) are listed here on Cluster Monkey as well.

There are now eleven cluster books of which I am aware. A group of four books is based on the efforts of Thomas Sterling. The first book, "How to Build a Beowulf", Sterling, Salmon, Becker, Savarese, (1999, MIT Press, ISBN 0-262-69218-X), is now a bit outdated. It does have some relevant parts, but most of the software it discusses is now considered old. The follow-on book by Sterling, "Beowulf Cluster Computing with Linux", (2002, MIT Press, ISBN 0-262-69274-0), is a collection of topics edited by Thomas Sterling. The book contains a large amount of useful information from prominent community members. It should be noted that he has also edited a book entitled "Beowulf Cluster Computing with Windows" (ISBN 0-262-69275-9), which shares some of its content with the Linux book. There is now an updated edition of the Linux version, edited by William Gropp and Ewing Lusk (in addition to Sterling). This version provides a very good, but high-level, view of Linux HPC clustering. It includes ROCKS and OSCAR coverage plus other important issues (ISBN 0-262-69292-9, 504 pages).

Robert Brown has a freely available book entitled "Engineering a Beowulf-Style Compute Cluster" in which the design and construction of Beowulf-style clusters is presented.

A good cluster background book is "In Search of Clusters" by Gregory Pfister (1997, ISBN 0138997098, 608 pages). The book was written in the pre-Beowulf era but has some very good (and detailed) technical analysis in it.

Several new books have appeared in the last year. A book called "Building Clustered Linux Systems" by Robert W. Lucke provides a very good overview of cluster computing methods and hardware. The book provides rather wide coverage of options, but does not dive too deep into any one approach. It is somewhat Hewlett Packard focused, as the author works for HP (ISBN: 0-13-144853-6, 648 pages). Another book is called "The Linux Enterprise Cluster" by Karl Kopper. This book focuses on the enterprise cluster (not HPC) and covers failover, heartbeat, load balancing, reliable printing/web serving, and how to build a job scheduling system. It has good coverage and examples (ISBN: 0-13-144853-65, 464 pages). Yet another recent book is called "High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI" by Joseph D. Sloan. It is O'Reilly's second attempt at a Linux cluster book. Many feel this second attempt has missed the mark again (ISBN: 0-596-00570-9, 367 pages).

There are two books which I consider rather dated. The first is called "Linux Cluster Architecture", Alex Vrenios, (2002, Sams, ISBN 0-672-32368-0). This book describes how to build a small cluster based on Linux; however, it misses a large part of the software that is used on HPC clusters today. The second is called "Linux Clustering" by Charles Bookman (New Riders, ISBN 1-57870-274-7). This book covers a wide range of Linux cluster systems and dedicates only several pages to the HPC area.

### Getting There

Learning about clusters is still not easy, but it is not as hard as it used to be. With any luck, my daughter will be needing a cluster for college. Then I can pull out some old hardware and show her how it was done back in the day. Of course, she will open her sixty-four-core laptop, connect to a computational grid, design a new protein with a few keystrokes, and convince me she doesn't need another one of dad's good ideas.

This article was originally published in Linux Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Douglas Eadline is proud of the fact that he has more cores running in his basement than anyone else in the neighborhood.