
What you may not know can cost you

The commodity cluster has changed the High Performance Computing (HPC) landscape in a significant way. Indeed, clusters have had a disruptive influence on many sectors of the IT market in addition to HPC. As with most disruptive technologies, clusters hit the market with the promise of "faster, better, cheaper" computing. Marketing numbers from IDC seem to support the perception that clusters are delivering on their promise. A deeper look, however, reveals that some of the "cheaper" promise is due to shifting certain costs from the traditional HPC vendor to the customer.

Purchasing an HPC cluster can be likened to buying low cost self-assembled furniture. The pile of flat-packed boxes that you take home is often a far cry from the professionally assembled model in the showroom. You have saved money, but you also will be spending time deciphering instructions, hunting down tools, and undoing missteps before you can enjoy your new furniture. Indeed, should you not have the mechanical inclination to put bolt 32 into offset hole 19, your new furniture may never live up to your expectations.

The furniture analogy actually breaks down in a perverse way because that new rack of servers does not come with an HPC assembly manual. Indeed, a typical HPC cluster procurement has many hidden costs that are not included in the raw hardware purchase and can push the actual cost much higher. Because there is a huge difference between a stack of hardware and a functioning HPC cluster, understanding the hidden cost variables involved in an HPC procurement can save time, money, and most of all aggravation.

Before the Commodity Cluster

Prior to the emergence of the commodity based cluster, HPC systems (or Supercomputers as they were called) were delivered in a much more turn-key fashion. A supercomputer company would deliver a system that was fully integrated from the ground up and provided a central point of contact for problems.

The most famous of these companies was, of course, Cray Computer. When you purchased a Cray, the system was delivered ready to run. An end user could sit down and begin compiling and running code right away. Should the user need assistance (e.g., with optimization or debugging), an end-user manual was never far away and training was always available. If there was a problem with a compiler switch or a performance issue, Cray had the ability to examine the issue "end-to-end" because they integrated the hardware and software into a functioning system.

This level of integration, as well as the specialized hardware, came at a justified premium cost. Users focused on programming solutions, system administrators supported the users, and supercomputer companies focused on running the programs in the least amount of time. This traditional relationship worked well for the HPC market.

Defining A Cluster

As the supercomputing market progressed in the mid-1990s, the use of commodity clusters began to grow. Built from essentially the same hardware as the ubiquitous desktop PC, the commodity cluster offered a low cost method to achieve HPC performance levels.

While clustering systems together for greater performance was not a new concept, the use of commodity hardware was somewhat novel. The price for such hardware is low (due to it being sold in large volume) and the performance has been steadily increasing (due to competition within the desktop and server sector). In addition, Ethernet and other high end networking products are available for connecting individual cluster servers (or nodes).

Historically, HPC has used the UNIX operating system to drive high end hardware. As it has grown, the Linux® operating system (and its subsequent distributions) has emerged as an open (freely available) and virtually plug-and-play replacement for these UNIX environments. In addition, the openness of Linux has fostered an ecosystem in which other HPC software can be easily ported or written.

Exploding Costs

Initial results for commodity clusters often showed an improvement in price-to-performance by a factor of ten or more over traditional systems. As a result, clusters began to spring up in many of the HPC hot spots around the world, particularly in the US government labs.

For those wanting to use clusters, however, it remains difficult to purchase a fully integrated cluster because the components come from a variety of manufacturers and integrators are reluctant to take responsibility for the whole system. The operating system often comes from a Linux vendor (or project), the middle-ware (MPI libraries) comes from one of several possible sources, and an optimizing Fortran or C/C++ compiler, not part of the standard OS bundle, comes from still another source. Storage, interconnects, switches, debuggers, parallel file systems, and many other options also add to the list of possible sources.

Someone Has to Pay

Cluster purchases are often optimized by price (most raw hardware for the lowest cost). On paper, such procurements seem impressive because performance is rated in terms of raw hardware cost. In practice, however, integration and associated infrastructure costs often escape the performance accounting. These costs can increase the total cost of ownership beyond user expectations and budgets.

Another misconception, one that extends far beyond HPC clusters, is the notion that openly available software is free and therefore adds no cost to a cluster. While the initial cost of open software may be non-existent, there is a substantial cost associated with software support and integration. In the case of HPC clusters, these costs can be quite substantial and are, in essence, now the responsibility of the customer.

Support and infrastructure costs can range from small to substantial depending on the users' goals. In general, the more people that use the cluster, the greater the amount of work the end users must shoulder. Hidden costs for a cluster can be broken down into five categories: Integration, Validation, Maintenance, Upgrading, and Infrastructure. These topics are discussed separately below.

Integration Costs

Because a cluster is built from multi-sourced components, the user is responsible for integration costs. These costs can be substantial and can create a high maintenance cost if care is not taken when components are integrated. For instance, clusters can be quickly configured to run MPI programs across large numbers of nodes by simply installing the OS and MPI libraries on each node. Success, and a press release, are rapid but fleeting. Adding functionality for production work requires cluster-wide decisions to be made. The wrong decisions can significantly impact future maintenance costs (see below). Installing additional MPI libraries or compilers (as determined by users' needs), if not done carefully, can require custom scripts and settings that are not portable and are often lost in upgrade procedures. In addition, integrating other tools such as schedulers, profilers, and debuggers with existing libraries and compilers can be tedious and error-prone.

Validation/Optimization Costs

One cost that is particularly hidden from the customer is the cost to validate and optimize the hardware and software. Since there is no single point of contact for the entire cluster, the user must make sure everything works as expected. In some cases, integrators will run system wide tests, but the wide array of hardware and software choices pushes the ultimate responsibility for correct operation onto the customer. This process takes time and should be performed each time a significant change is made to the system (e.g., a new compiler, kernel, or MPI version). In the worst case, a change or upgrade may actually produce wrong answers or fail entirely because the solution was not validated before implementation. An often forgotten integration issue is I/O. Most clusters must be integrated into an existing storage hierarchy. In addition, assessing the actual storage needs for a cluster, before storage decisions are made, is critically important. A poor storage design can result in poor application performance. In many cases, the first hardware added to a new cluster is there to resolve a storage based issue.

Maintenance Costs

Keeping a cluster running can be a time sink. Some false comfort can be gained by purchasing a hardware maintenance agreement from a vendor. While they will repair obvious problems, they often must defer to another vendor or software project for any non-obvious failures (e.g., disk drive and power supply failures are obvious, while poor interconnect performance may be due to several sources). The user is then required to invest the time to identify the problem and assign responsibility to a specific vendor and, in some cases, play negotiator between two vendors.

Another issue with the classic cluster design (OS image installed on each node) is that of version skew or "node personalities." Initially, keeping nodes in sync seems trivial: just install the same thing on all the nodes. This approach breaks down as the cluster ages because replacement nodes must be re-imaged to match all other nodes. To accomplish this, changes must be tracked and a current "snapshot" created. Changes also include OS tuning parameters and tweaks that must be applied to nodes so that certain software applications will run correctly. This "change/snapshot/re-image" cycle is expensive and can incur significant down time for the simplest of maintenance issues.
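Tracking version skew comes down to comparing what each node reports against what most of the cluster reports. The following is a minimal sketch of that comparison in Python, assuming the per-node package versions have already been collected by some other means. The function name, node names, and version strings are all hypothetical and only serve the illustration.

```python
from collections import Counter


def find_version_skew(inventory):
    """Report nodes whose package versions differ from the cluster majority.

    inventory maps node name -> {package name: version string}.
    Returns {package: {node: version}} listing only the outlier nodes.
    A package missing from a node is reported with version None.
    """
    packages = set()
    for versions in inventory.values():
        packages.update(versions)

    skew = {}
    for package in sorted(packages):
        seen = {node: versions.get(package)
                for node, versions in inventory.items()}
        # The most common version is treated as the cluster "snapshot".
        majority, _count = Counter(seen.values()).most_common(1)[0]
        outliers = {node: v for node, v in seen.items() if v != majority}
        if outliers:
            skew[package] = outliers
    return skew


if __name__ == "__main__":
    # Hypothetical inventory: node03 was replaced and imaged differently.
    inventory = {
        "node01": {"kernel": "2.6.18", "mpi": "1.2"},
        "node02": {"kernel": "2.6.18", "mpi": "1.2"},
        "node03": {"kernel": "2.6.9", "mpi": "1.2"},
    }
    print(find_version_skew(inventory))
```

Treating the majority version as the reference is a design shortcut. A production setup would more likely compare against an explicitly recorded snapshot, and would also need to cover tuning parameters, which this sketch ignores.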

There are several more advanced cluster provisioning methodologies, such as NFS-root or RAM-disk, that help solve some of these issues. These approaches must be evaluated carefully, as changing your provisioning scheme after the cluster is operational can be difficult and cause disruptions.

Upgrade Costs

Of course, software changes and upgrades provide better security, more features, and hopefully better performance. In many clusters, the "software stack" comes from many sources and there is often an unknown dependency tree living in the installed software. For instance, upgrading to a new distribution of Linux (Red Hat, Suse, etc.) may require rebuilding MPI libraries and other middle-ware. User applications may also need to be rebuilt with a third party optimizing compiler that does not yet support the new distribution. Administrators and users are then required to find workarounds or fixes that allow the users to run the new software. Other packages may suffer a similar fate, resulting in extra time and frustration.

Infrastructure Costs

In addition to the hidden support costs, commodity clusters also place a burden on infrastructure. The power and cooling costs for a typical (x86 based) cluster are often not factored into the price to performance numbers. The average dual socket cluster node currently requires around 300 watts of power. Cooling and power delivery inefficiencies can double this node power requirement to 600 watts. Therefore, on an annual basis, a single cluster node can require 5256 kilowatt hours (0.6 kilowatts for 8760 hours). At a nominal cost of $0.10 per kilowatt hour, the annual power and cooling cost for a single cluster node is approximately $526.

These numbers are more striking when the cost of the entire cluster is taken into account. Consider a typical cluster purchase in today's market, where a node can cost $3500 (including racks, switches, etc.). Using standard dual core technology, a dual socket node provides two processors and four cores. A typical 128 node cluster will then provide 256 processors and 512 cores and cost $448,000. Based on the above assumptions, the annual power and cooling budget is then $67,300. Over a three year period, this amounts to $202,000, or 45% of the system cost.

While costs may vary due to market conditions and location, the above analysis illustrates that for a typical commodity cluster the three year power cost can easily reach 40-50% of the hardware purchase price.
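The arithmetic above is easy to rerun with local numbers. The following is a minimal Python sketch, using only the figures quoted in this article as defaults (300 watts per node, a doubling for cooling and power delivery losses, $0.10 per kilowatt hour, three years). The function names are illustrative, and the model assumes nodes draw full power around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_power_cost(node_watts, overhead_factor, dollars_per_kwh):
    """Annual power-plus-cooling cost in dollars for one node."""
    effective_kw = node_watts * overhead_factor / 1000.0
    return effective_kw * HOURS_PER_YEAR * dollars_per_kwh


def power_share_of_purchase(nodes, node_price, node_watts=300,
                            overhead_factor=2.0, dollars_per_kwh=0.10,
                            years=3):
    """Return (hardware cost, multi-year power cost, power/hardware ratio)."""
    hardware = nodes * node_price
    power = nodes * years * annual_power_cost(
        node_watts, overhead_factor, dollars_per_kwh)
    return hardware, power, power / hardware


if __name__ == "__main__":
    # The 128 node, $3500 per node example from the text.
    hardware, power, ratio = power_share_of_purchase(nodes=128, node_price=3500)
    print(f"hardware ${hardware:,.0f}, "
          f"3-year power ${power:,.0f}, ratio {ratio:.0%}")
```

Changing `dollars_per_kwh` or `overhead_factor` to match a particular site shows quickly how sensitive the 40-50% figure is to location and data center efficiency.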

Other infrastructure issues can affect cost as well. A typical industrial rack mount chassis can hold 42 cluster nodes. An average cluster node weighs around 45 pounds. Thus, each fully populated rack requires a floor capable of supporting roughly 2000 pounds in the space of a single rack mount enclosure. In a typical data center, rack mount hardware is a mix of storage and servers with many underpopulated racks. HPC clusters, on the other hand, represent the densest and therefore heaviest load in a data center. In our 128 node example, the cluster will require support for roughly 6000 pounds in a 4x8 foot area.
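The floor loading estimate can be checked the same way. This short Python sketch uses the article's figures (42 nodes per rack, 45 pounds per node, a 4x8 foot footprint) as defaults. It counts node weight only; the weight of the rack enclosures, switches, and cabling is left out, which is an assumption of the sketch and means real loads will be somewhat higher.

```python
import math


def floor_load(nodes, nodes_per_rack=42, node_weight_lb=45.0,
               footprint_sqft=32.0):
    """Estimate rack count and floor loading from node weight alone."""
    racks = math.ceil(nodes / nodes_per_rack)
    total_lb = nodes * node_weight_lb
    return {
        "racks": racks,
        "full_rack_lb": nodes_per_rack * node_weight_lb,
        "total_lb": total_lb,
        "lb_per_sqft": total_lb / footprint_sqft,
    }


if __name__ == "__main__":
    # The 128 node example from the text in a 4x8 foot (32 sq ft) area.
    for name, value in floor_load(128).items():
        print(f"{name}: {value}")
```

The per-square-foot figure is the one to compare against the rated capacity of a raised floor, which varies by facility.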

Time Is Money

The above issues need to be resolved before any real production computing can begin. Instead of a domain expert running a code on a supercomputer with a highly defined software and hardware environment, he or she has to understand details previously handled by the people at a traditional supercomputing company. The initial cost of a cluster is lower because the cost of engineering and integration has shifted from the vendor to the user.

For the more scientifically inclined, there is a kind of conservation of cost when it comes to HPC. Cost in this sense is both time and money because the time to solve an implementation problem often cannot be reduced with money. The low price of clusters did eliminate some costs, but it shifted many of the non-reducible costs to the end user, which ultimately impacts how much computing per dollar the cluster user can achieve. These costs, coupled with infrastructure costs, often push the total cost of ownership much higher than originally anticipated.

Success Metrics

The lack of software procurement costs (i.e. the use of freely available software) has led many organizations to focus solely on the nodes per dollar cost of an HPC solution. A more correct and measurable number should be based on sustainable solutions per day per dollar, where the dollar estimate includes all the above hidden costs (software integration and infrastructure). This measure reflects the real cost of HPC and will provide a sound basis on which to determine the ultimate cost effectiveness of an HPC investment.
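One way to make the "solutions per day per dollar" measure concrete is sketched below in Python. The five cost categories are the ones named in this article, but the function names are illustrative and the throughput value in the usage example is a placeholder rather than a measurement. The only real figures are the $448,000 hardware price and the roughly $202,000 three year power cost from the infrastructure example.

```python
def total_cost(hardware, integration=0.0, validation=0.0, maintenance=0.0,
               upgrades=0.0, infrastructure=0.0):
    """Hardware price plus the five hidden cost categories, in dollars."""
    return (hardware + integration + validation + maintenance
            + upgrades + infrastructure)


def solutions_per_day_per_dollar(solutions_per_day, cost):
    """Sustained throughput divided by the full cost of ownership."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return solutions_per_day / cost


if __name__ == "__main__":
    throughput = 100.0  # placeholder: sustained solutions per day

    hardware_only = total_cost(448_000)
    with_power = total_cost(448_000, infrastructure=202_000)

    naive = solutions_per_day_per_dollar(throughput, hardware_only)
    fuller = solutions_per_day_per_dollar(throughput, with_power)
    print(f"metric retains {fuller / naive:.0%} of its hardware-only value")
```

Even with only the power and cooling cost added, the metric drops noticeably, and the integration, validation, maintenance, and upgrade terms (left at zero here because they depend on in-house expertise) would lower it further.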

Factoring the hidden costs into such a number can be very difficult. The amount of time and money required depends on your level of in-house expertise. Attempting to build and maintain a production HPC cluster requires a skill set that is currently in short supply and thus expensive. If your organization does not have the technical depth, then purchasing hardware in a very real sense is putting the cart before the horse. Infrastructure costs, on the other hand, are more easily estimated and therefore should be an integral part in all success metrics.

A Strategy For Success

Commodity clusters have shown a tremendous price to performance advantage over the traditional approach to supercomputing. Those considering an HPC solution should be aware that the cost to fully implement a solution goes well beyond the hardware price and includes both the software integration/maintenance and infrastructure costs.

If you are planning to purchase an HPC cluster, consider the additional work required to achieve a functioning system. Failure to account for the hidden time and money will result in lost up-time, higher costs, and poor performance. As part of your cluster plan, determine whether you have the in-house expertise to accomplish these tasks in a cost effective manner. If you need help, look to a vendor that has an intimate understanding of your needs and experience with HPC systems. In reality, most large vendors will stop slightly beyond a "standard install" performed by a professional services organization (either internal or external), at which point you are on your own. There are a number of smaller vendors that can help minimize the hidden costs and provide real long term support for your HPC needs. Finally, there are a small number of consultants that specialize in cluster integration, testing, and support.

Cluster HPC is a powerful and effective computing platform. Understanding the real cost structure will help set expectations and assist in planning and implementing your HPC resource.

Author's Note: Portions of this article are taken from a white paper I wrote for SiCortex in January of 2007. I also want to point readers to an excellent article by fellow cluster monkey Jeff Layton, entitled "How to Write a Technical Cluster RFP."