Design Examples
The CDR can be used to find a single design for an application or
for design space exploration. To find a single design, the user simply
inputs the relevant constraints, selects the performance model, and
the CDR generates the design that satisfies them while maximizing
performance on the desired benchmark. For example, if the application's
execution time is dominated by solving a linear system of 150,000 dense
equations with 150,000 unknowns using LU decomposition, the High Performance
Linpack (HPL) benchmark is likely to predict performance well. Assuming
the system will need one matrix of 150,000 by 150,000 double precision
floating point values, the cluster will need 150,000 × 150,000 × 8
bytes/double = 180 GB of memory. The memory size constraint can
be set by selecting 128 GB to 256 GB in the memory section
of the CDR web form, shown in Figure One.
 Figure One: Memory entry section of the CDR web form
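As a quick check of the arithmetic, here is a small Python sketch (our own illustration, not part of the CDR) that computes the storage for one dense N × N matrix of double-precision values. It uses decimal gigabytes (10^9 bytes), which is how the 180 GB figure above is obtained; in binary units the same matrix is about 167.6 GiB. The function names are hypothetical, and only the matrix itself is counted.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 double precision

def matrix_bytes(n: int, bytes_per_element: int = BYTES_PER_DOUBLE) -> int:
    """Storage in bytes for a dense n x n matrix."""
    return n * n * bytes_per_element

def to_gb(num_bytes: int) -> float:
    """Bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9

def to_gib(num_bytes: int) -> float:
    """Bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30

n = 150_000
need = matrix_bytes(n)
print(f"{need:,} bytes = {to_gb(need):.0f} GB ({to_gib(need):.1f} GiB)")
# prints: 180,000,000,000 bytes = 180 GB (167.6 GiB)
```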
The HPL executable is approximately 574 KB, so the code size is
set to "1 MB or less." Though HPL is sensitive to memory bandwidth,
we do not need to specify a memory bandwidth constraint in the memory
section, because the HPL performance model already accounts for memory
bandwidth. If spending money on memory bandwidth improves performance
more than spending the same money on some other aspect of the system,
the CDR will select designs with higher memory bandwidth. Otherwise,
it will spend the money on whichever aspect of the system has a larger
impact on performance. Likewise, no network constraints need be
specified, because network performance is also included in the HPL
performance model.
We assume the budget is $100,000, there are no requirements for disk
within the cluster, and there are no constraints on space, power,
cooling, or operating costs.
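The input specification can be pictured as a small record of constraints in which anything the user leaves unconstrained (here, power) is simply unset. The sketch below is a hypothetical Python model of that idea: the field names, the `satisfies` check, and the assumption that each node's DRAM holds one copy of the code plus a share of the data are ours, not the CDR's internal representation. It uses the 180 GB computed above as the data requirement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Spec:
    """Hypothetical constraint record mirroring choices on the CDR web form."""
    budget_usd: float
    data_gb: float                        # memory for data, whole cluster
    code_mb_per_node: float               # e.g. "1 MB or less"
    max_power_kw: Optional[float] = None  # None means "no constraint"

@dataclass(frozen=True)
class Design:
    """Hypothetical summary of one candidate cluster."""
    nodes: int
    dram_gb_per_node: float
    cost_usd: float
    power_kw: float

def satisfies(d: Design, s: Spec) -> bool:
    """True if the design meets every constraint the user actually set."""
    if d.cost_usd > s.budget_usd:
        return False
    # Assumption: every node holds a copy of the code; the rest holds data.
    data_capacity_gb = d.nodes * (d.dram_gb_per_node - s.code_mb_per_node / 1000)
    if data_capacity_gb < s.data_gb:
        return False
    # Unset (None) constraints are skipped entirely.
    if s.max_power_kw is not None and d.power_kw > s.max_power_kw:
        return False
    return True

spec = Spec(budget_usd=100_000, data_gb=180, code_mb_per_node=1)
# Node count, DRAM, and power follow the design reported below; the cost is illustrative.
print(satisfies(Design(nodes=200, dram_gb_per_node=2, cost_usd=99_000, power_kw=30.6), spec))
# prints: True
```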
For this input specification, the CDR fully evaluates just over 9,000
designs after eliminating millions of designs with a partial
evaluation. The best design is summarized by the CDR in
Figure Two. The design has 200 nodes with 13 cold spares and costs
just under $100,000. With approximately 400 GB of total DRAM (2 GB per
node), it more than meets our constraints of at least 160 GB per cluster
for data and 1 MB per node for code. Running at full power, the cluster
will need 30.6 kW and 8.8 tons of air conditioning. The design uses a
Flat Neighborhood Network (FNN) [DiMa00, HMLDH00] of trees, shown here
as Figure Three, to connect all 200 nodes.
 Figure Two: Summary of HPL-optimized cluster
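The two-phase search described above can be sketched as follows. This is a toy model under explicit assumptions, not the CDR's actual algorithm or HPL model: the component catalog, the prices, and the `full_eval` performance formula are all hypothetical. A cheap partial evaluation (cost and memory capacity only) discards infeasible combinations before the comparatively expensive performance model runs on the survivors. Because every surviving design is scored by the same model, money flows to whichever upgrade (a faster CPU or a faster network in this toy) buys more performance, which is the behavior described earlier for memory bandwidth.

```python
import itertools
from typing import NamedTuple, Optional, Tuple

class Option(NamedTuple):
    name: str
    price: float  # hypothetical US dollars per node
    value: float  # GB for DRAM; relative speed for CPUs and networks

# Hypothetical per-node catalog; a real components database is far larger.
CPUS = [Option("cpu-slow", 150.0, 1.0), Option("cpu-fast", 300.0, 1.6)]
DRAMS = [Option("1GB", 60.0, 1.0), Option("2GB", 110.0, 2.0)]
NICS = [Option("gige", 20.0, 1.0), Option("fast-net", 250.0, 3.0)]
BASE_NODE_PRICE = 200.0  # case, board, power supply (hypothetical)

def partial_eval(nodes: int, cpu: Option, dram: Option, nic: Option,
                 budget: float, data_gb: float) -> Optional[float]:
    """Cheap first pass: return the cost, or None if the design exceeds the
    budget or cannot hold the data. No performance model is run here."""
    cost = nodes * (BASE_NODE_PRICE + cpu.price + dram.price + nic.price)
    if cost > budget or nodes * dram.value < data_gb:
        return None
    return cost

def full_eval(nodes: int, cpu: Option, dram: Option, nic: Option) -> float:
    """Toy stand-in for a benchmark model (NOT the CDR's HPL model).
    Compute grows with node count and CPU speed; a communication penalty
    grows with node count and shrinks with network speed. DRAM size only
    acts as a constraint in this toy, so `dram` is unused here."""
    compute = nodes * cpu.value
    comm_penalty = 1.0 + nodes / (400.0 * nic.value)
    return compute / comm_penalty

def best_design(budget: float, data_gb: float,
                node_counts) -> Tuple[Optional[tuple], int, int]:
    """Return (best, pruned, evaluated); best is (perf, nodes, cpu, dram, nic, cost)."""
    best, pruned, evaluated = None, 0, 0
    for nodes, cpu, dram, nic in itertools.product(node_counts, CPUS, DRAMS, NICS):
        cost = partial_eval(nodes, cpu, dram, nic, budget, data_gb)
        if cost is None:
            pruned += 1  # eliminated without running the performance model
            continue
        evaluated += 1
        perf = full_eval(nodes, cpu, dram, nic)
        if best is None or perf > best[0]:
            best = (perf, nodes, cpu.name, dram.name, nic.name, cost)
    return best, pruned, evaluated

if __name__ == "__main__":
    best, pruned, evaluated = best_design(100_000, 180, range(16, 257, 8))
    print(f"pruned {pruned}, fully evaluated {evaluated}, best: {best}")
```

This toy space has only 248 combinations. The same structure pays off in a real design space, where the text reports millions of designs eliminated by partial evaluation.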
The components used in this design are shown in
Figure Four. Component prices come from
the CDR components database. Prices of non-commodity hardware (such as
InfiniBand, Myrinet, and Quadrics network interfaces) are entered
into the database by hand and are therefore updated infrequently.
Commodity hardware prices are updated automatically every day
using Amazon.com's web API [Levi05]. As processors, memory, and
networking hardware become less (or more) expensive, the best design
for a particular application will change.
 Figure Four: Components used in the HPL-optimized cluster design
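A toy example shows why a price change alone can change the answer. In this hypothetical Python sketch, two candidate designs have fixed modeled performance, and only the price of one network part differs between two daily price snapshots. The design names, parts, prices, and performance numbers are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidates: (name, parts list per design, modeled performance).
# Modeled performance depends only on the hardware, not on what it costs.
CANDIDATES: List[Tuple[str, Dict[str, int], float]] = [
    ("many-slow-nodes", {"node_slow": 200, "gige_port": 200}, 1.00),
    ("fewer-fast-nodes", {"node_fast": 120, "fast_nic": 120}, 1.15),
]

def cost(parts: Dict[str, int], prices: Dict[str, float]) -> float:
    """Total price of a parts list under a given price table."""
    return sum(qty * prices[part] for part, qty in parts.items())

def best_within_budget(prices: Dict[str, float], budget: float) -> Optional[str]:
    """Name of the highest-performing candidate that fits the budget."""
    feasible = [(perf, name) for name, parts, perf in CANDIDATES
                if cost(parts, prices) <= budget]
    return max(feasible)[1] if feasible else None

# Two hypothetical daily price snapshots; only the fast NIC price drops.
day1 = {"node_slow": 450.0, "gige_port": 30.0, "node_fast": 600.0, "fast_nic": 300.0}
day2 = dict(day1, fast_nic=200.0)

print(best_within_budget(day1, 100_000))  # prints: many-slow-nodes
print(best_within_budget(day2, 100_000))  # prints: fewer-fast-nodes
```

On day one the faster design costs $108,000 and is over budget; after the price drop it costs $96,000, fits, and wins on modeled performance.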
While it is a bit disconcerting to know that the CDR can produce different
designs given the same input, following price fluctuations is an important
feature of the CDR. It is difficult for a human designer to track
the prices of the hundreds of components in the CDR's database.
It is even more difficult for the designer to know how those price
changes affect trade-offs in price and performance. A designer can
try to recompute the trade-offs by hand, but by the time the calculation
is done, prices may have changed.