The CDR can be used either to find a single design for an application or to explore the design space. To find a single design, the user enters the relevant constraints and selects a performance model, and the CDR generates the design that satisfies the constraints while maximizing performance on the chosen benchmark. For example, if the application's execution time is dominated by solving a dense linear system of 150,000 equations in 150,000 unknowns using LU decomposition, the High Performance Linpack (HPL) benchmark is likely to predict performance well. Assuming the system needs one matrix of 150,000 by 150,000 double-precision floating-point values, the cluster will need 150,000 × 150,000 × 8 bytes/double = 180 GB of memory. The memory size constraint can be set by selecting 128 GB to 256 GB in the memory section of the CDR web form, shown in Figure One.
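The memory arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper names and the list of form ranges are assumptions made for illustration (the text names only the 128 GB to 256 GB choice), and gigabytes are taken as decimal (10^9 bytes) to match the 180 GB figure.

```python
GB = 10**9  # decimal gigabytes, matching the 180 GB figure in the text

def dense_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to hold one n-by-n matrix of double-precision values."""
    return n * n * bytes_per_element

def pick_memory_range(required_bytes, ranges_gb):
    """Return the first (low, high) range, in GB, that contains the requirement."""
    for low, high in ranges_gb:
        if low * GB <= required_bytes < high * GB:
            return (low, high)
    return None

# Hypothetical form choices; only the 128-256 GB option appears in the text.
FORM_RANGES_GB = [(64, 128), (128, 256), (256, 512)]

required = dense_matrix_bytes(150_000)
selected = pick_memory_range(required, FORM_RANGES_GB)
print(f"need {required / GB:.0f} GB, select range {selected}")
```

The same function makes it easy to see how quickly the requirement grows: doubling the number of unknowns quadruples the memory needed.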
The HPL executable is approximately 574 KB, so the code size is set to "1 MB or less." Though HPL is sensitive to memory bandwidth, we do not need to specify a memory bandwidth constraint in the memory section, because the HPL performance model automatically adjusts predicted performance based on memory bandwidth. If paying to improve memory bandwidth helps performance more than paying to improve some other aspect of the system, the CDR will select designs with higher memory bandwidth. Otherwise, it will spend the money on whichever aspect of the system has a larger impact on performance. Likewise, no network constraints need be specified, because network performance is included in the HPL performance model. We assume the budget is $100,000, there are no requirements for disk within the cluster, and there are no constraints on space, power, cooling, or operating costs.
For this input specification, the CDR fully evaluates just over 9,000 designs after eliminating millions of designs with a partial evaluation. The best design is summarized by the CDR in Figure Two. The design has 200 nodes with 13 cold spares and costs just under $100,000. With approximately 400 GB of total DRAM, or 2 GB per node, it more than meets our constraints of at least 160 GB per cluster for data and 1 MB per node for code. Running at full power, the cluster will need 30.6 kW and 8.8 tons of air conditioning. The design uses a Flat Neighborhood Network (FNN) [DiMa00, HMLDH00] of trees, shown here as Figure Three, to achieve connectivity between all 200 nodes.
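The idea of discarding most candidates with a cheap partial evaluation before fully evaluating the rest can be sketched as follows. This is not the CDR's actual algorithm or performance model: the component names, prices, performance figures, and the toy "efficiency" factor are all invented for illustration, and only the $100,000 budget and the 180 GB memory requirement come from the example above.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Node:            # hypothetical compute-node option
    name: str
    price: int         # dollars per node
    gflops: float      # made-up per-node performance
    mem_gb: int        # memory per node

@dataclass(frozen=True)
class Network:         # hypothetical network option
    name: str
    price_per_port: int
    efficiency: float  # toy stand-in for a real performance model

def search(nodes, networks, node_counts, budget, min_mem_gb):
    """Return (best_design, pruned_count, fully_evaluated_count)."""
    best, pruned, evaluated = None, 0, 0
    for node, count in product(nodes, node_counts):
        # Partial evaluation: node cost and memory alone can reject every
        # design built on this (node, count) pair, whatever the network.
        if node.price * count > budget or node.mem_gb * count < min_mem_gb:
            pruned += len(networks)
            continue
        for net in networks:
            # Full evaluation: total cost plus modeled performance.
            evaluated += 1
            cost = count * (node.price + net.price_per_port)
            if cost > budget:
                continue
            perf = count * node.gflops * net.efficiency
            if best is None or perf > best["perf"]:
                best = {"node": node.name, "network": net.name,
                        "count": count, "cost": cost, "perf": perf}
    return best, pruned, evaluated

NODES = [Node("A", 400, 4.0, 2), Node("B", 700, 6.0, 4)]
NETWORKS = [Network("cheap", 50, 0.6), Network("fast", 400, 0.85)]

best, pruned, evaluated = search(NODES, NETWORKS, [50, 100, 200, 300],
                                 budget=100_000, min_mem_gb=180)
print(best, pruned, evaluated)
```

Even in this toy space, half of the candidate designs are rejected without computing a full cost or performance estimate; in a realistic component database the pruned fraction is what makes the search tractable.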
The components used in this design are shown in Figure Four. Component prices come from the CDR components database. Prices of non-commodity hardware (such as InfiniBand, Myrinet, and Quadrics network interfaces) are entered into the database by hand, and thus do not change frequently. Commodity hardware prices are automatically updated daily using Amazon.com's web API [Levi05]. As the prices of processors, memory, and networking hardware rise and fall, the best design for a particular application will change.
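A small sketch shows why a price update alone can change the best design. The two processor options, their prices, and their performance numbers are hypothetical, and the "model" simply buys as many nodes as the budget allows; it stands in for the CDR's real performance models only to illustrate the effect.

```python
def best_option(options, budget):
    """Pick the option that gives the most total performance within the budget.

    options maps a name to (price_per_node, performance_per_node).
    """
    best_name, best_total = None, -1
    for name, (price, perf) in options.items():
        node_count = budget // price   # whole nodes only
        total = node_count * perf
        if total > best_total:
            best_name, best_total = name, total
    return best_name, best_total

BUDGET = 100_000

# Hypothetical prices before and after a daily update.
before = {"fast_cpu": (900, 10), "slow_cpu": (500, 6)}
after = {"fast_cpu": (700, 10), "slow_cpu": (500, 6)}

print(best_option(before, BUDGET))
print(best_option(after, BUDGET))
```

With the original prices the slower, cheaper processor wins because more nodes fit in the budget; after the faster processor's price drops, it becomes the better choice even though nothing about the application or constraints has changed.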
While it is a bit disconcerting to know that the CDR can produce different designs given the same input, following price fluctuations is an important feature of the CDR. It is difficult for a human designer to follow the prices of all of the hundreds of components in the CDR's database. It is even more difficult for the designer to know how those price changes affect trade-offs in price and performance. A designer can try to recompute the trade-offs by hand, but by the time the calculation is done, prices may have changed.