The True Cost of HPC Cluster Ownership

What you may not know can cost you

The commodity cluster has changed the High Performance Computing (HPC) landscape in a significant way. Indeed, clusters have had a disruptive influence on many sectors of the IT market in addition to HPC. As with most disruptive technologies, clusters hit the market with the promise of "faster, better, cheaper" computing. Marketing numbers from IDC seem to support the perception that clusters are delivering on that promise. A deeper look, however, reveals that some of the "cheaper" is actually the result of shifting certain costs from the traditional HPC vendor to the customer.

Purchasing an HPC cluster can be likened to buying low cost self-assembled furniture. The pile of flat-packed boxes that you take home is often a far cry from the professionally assembled model in the showroom. You have saved money, but you also will be spending time deciphering instructions, hunting down tools, and undoing missteps before you can enjoy your new furniture. Indeed, should you not have the mechanical inclination to put bolt 32 into offset hole 19, your new furniture may never live up to your expectations.

The furniture analogy actually breaks down in a perverse way, because that new rack of servers does not come with an HPC assembly manual. Indeed, a typical HPC cluster procurement has many hidden costs that are not included in the raw hardware purchase and can push the actual cost much higher. Because there is a huge difference between a stack of hardware and a functioning HPC cluster, understanding the hidden cost variables in an HPC procurement can save time, money, and, most of all, aggravation.

Before the Commodity Cluster

Prior to the emergence of the commodity-based cluster, HPC systems (or supercomputers, as they were called) were delivered in a much more turn-key fashion. A supercomputer company would deliver a system that was fully integrated from the ground up and provided a central point of contact for problems.

The most famous of these companies was, of course, Cray Research. When you purchased a Cray, the system was delivered ready to run. An end user could sit down and begin compiling and running code right away. Should the user need assistance (e.g., with optimization or debugging), an end-user manual was never far away and training was always available. If there was a problem with a compiler switch or a performance issue, Cray could examine it "end-to-end," because they had integrated the hardware and software into a functioning system.

This level of integration, as well as the specialized hardware, came at a justified premium cost. Users focused on programming solutions, system administrators supported the users, and supercomputer companies focused on running the programs in the least amount of time. This traditional relationship worked well for the HPC market.

Defining A Cluster

As the supercomputing market progressed in the mid-1990s, the use of commodity clusters began to grow. Built from essentially the same hardware as the ubiquitous desktop PC, the commodity cluster offered a low-cost way to achieve HPC performance levels.

While clustering systems together for greater performance was not a new concept, the use of commodity hardware was somewhat novel. The price of such hardware is low (because it is sold in large volume) and its performance has been steadily increasing (due to competition within the desktop and server sectors). In addition, Ethernet and other high-end networking products are available for connecting the individual cluster servers (or nodes).

Historically, HPC has used the UNIX operating system to drive high-end hardware. The Linux® operating system (and its many distributions) has emerged as an open (freely available) and virtually plug-and-play replacement for these UNIX environments. In addition, the openness of Linux has fostered an ecosystem in which other HPC software can be easily ported or written.

Exploding Costs

Initial results for commodity clusters often showed price-to-performance improvements of a factor of ten or more over traditional systems. As a result, clusters began to spring up in many of the HPC hot spots around the world, particularly in the US government labs.

For those wanting to use clusters, however, it remains difficult to purchase a fully integrated system, because the components come from a variety of manufacturers and integrators are reluctant to take responsibility for the whole system. The operating system often comes from a Linux vendor (or project), the middleware (MPI libraries) comes from one of several possible sources, and an optimizing Fortran or C/C++ compiler, which is not part of the standard OS bundle, comes from still another source. Storage, interconnects, switches, debuggers, parallel file systems, and many other options add still more sources to the list.

Someone Has to Pay

Cluster purchases are often optimized for price (the most raw hardware for the lowest cost). On paper such procurements often look impressive, because performance is usually quoted relative to the raw hardware cost. In practice, however, integration and associated infrastructure costs often escape the performance accounting. These costs can push the total cost of ownership beyond user expectations and budgets.

Another misconception, one that extends far beyond HPC clusters, is the notion that openly available software is free and therefore adds no cost to a cluster. While the initial cost of open software may be non-existent, there is a substantial cost associated with software support and integration. In the case of HPC clusters, these costs can be quite substantial and have, in essence, become the responsibility of the customer.

Support and infrastructure costs can range from small to substantial depending on the user's goals. In general, the more people who use the cluster, the more work the end users must shoulder. Hidden costs for a cluster can be broken down into five categories: Integration, Validation, Maintenance, Upgrading, and Infrastructure. Each is discussed separately below.
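To make the accounting gap concrete, here is a minimal Python sketch of a cost model built on the five hidden-cost categories. Every dollar figure and the peak GFLOPS rating are hypothetical, chosen only to show how adding the hidden categories changes the "on paper" price/performance; they are not measurements from any real procurement.

```python
from dataclasses import dataclass


@dataclass
class ClusterCosts:
    """Hypothetical cost model; all figures are illustrative, in dollars."""
    hardware: float
    integration: float
    validation: float
    maintenance: float
    upgrading: float
    infrastructure: float

    def hidden(self) -> float:
        # Sum of the five hidden-cost categories (everything but hardware).
        return (self.integration + self.validation + self.maintenance
                + self.upgrading + self.infrastructure)

    def total(self) -> float:
        # Total cost of ownership: raw hardware plus hidden costs.
        return self.hardware + self.hidden()


def dollars_per_gflops(cost: float, peak_gflops: float) -> float:
    # Price/performance expressed as dollars per peak GFLOPS.
    return cost / peak_gflops


# Illustrative numbers only (assumed, not from the article).
costs = ClusterCosts(hardware=600_000, integration=100_000, validation=40_000,
                     maintenance=120_000, upgrading=60_000, infrastructure=80_000)
peak = 2_000.0  # assumed peak GFLOPS rating

paper = dollars_per_gflops(costs.hardware, peak)        # hardware only
effective = dollars_per_gflops(costs.total(), peak)     # hardware + hidden costs
print(f"paper: ${paper:.0f}/GFLOPS, effective: ${effective:.0f}/GFLOPS")
print(f"hidden share of total: {costs.hidden() / costs.total():.0%}")
```

With these assumed inputs the sketch prints a paper figure of $300/GFLOPS against an effective $500/GFLOPS, with hidden costs making up 40% of the total. The point is the structure of the calculation rather than the specific numbers: a procurement scored only on the `hardware` field never sees the other five.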

Integration Costs

Because a cluster is built from multi-sourced components, the user is responsible for integration costs. These costs can be substantial and can lead to high maintenance costs if care is not taken when components are integrated. For instance, a cluster can be quickly configured to run MPI programs across large numbers of nodes simply by installing the OS and MPI libraries on each node. Success, and a press release, are rapid but fleeting. Adding functionality for production work requires cluster-wide decisions, and the wrong decisions can significantly increase future maintenance costs (see below). Installing additional MPI libraries or compilers (as users' needs dictate), if not done carefully, can require custom scripts and settings that are not portable and are often lost during upgrades. In addition, integrating other tools such as schedulers, profilers, and debuggers with the existing libraries and compilers can be tedious and error-prone.
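One symptom of ad hoc integration is configuration drift: nodes that slowly end up with different MPI or compiler versions. The following is a minimal Python sketch of a drift check, assuming a per-node software inventory has already been collected somehow (for example from each node's package manager). The inventory format, the `find_drift` helper, and the package versions shown are all hypothetical, used only for illustration.

```python
from typing import Dict, List

# Hypothetical inventory format: node name -> {package name: version}.
Inventory = Dict[str, Dict[str, str]]


def find_drift(inventory: Inventory, reference: str) -> List[str]:
    """List every package whose version on a node differs from the reference node."""
    ref = inventory[reference]
    problems: List[str] = []
    for node, pkgs in sorted(inventory.items()):
        if node == reference:
            continue
        # Check packages present on either side, so missing installs are caught too.
        for name in sorted(set(ref) | set(pkgs)):
            want = ref.get(name, "<missing>")
            have = pkgs.get(name, "<missing>")
            if want != have:
                problems.append(f"{node}: {name} is {have}, {reference} has {want}")
    return problems


# Hard-coded, made-up inventory for illustration.
inventory: Inventory = {
    "node01": {"mpich": "1.2.7", "gcc": "4.1.2", "openmpi": "1.2.4"},
    "node02": {"mpich": "1.2.7", "gcc": "4.1.2", "openmpi": "1.2.4"},
    "node03": {"mpich": "1.2.6", "gcc": "4.1.2"},
}

for line in find_drift(inventory, reference="node01"):
    print(line)
```

Here `node03` is reported twice: once for an older `mpich` and once for a missing `openmpi`. Comparing against a single reference node keeps the check simple; a production site would more likely compare against a declared baseline kept under version control.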

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.