Support
Support is one of those things that depends upon your company policies
and/or your personal tendencies. If your company requires a certain
level of support, then this is something you need to specify. However,
if you have some flexibility in choosing a support model, then you
have many options. I recommend asking for pricing on at least two support
models.
The first support model is the traditional enterprise model of 4-hour
on-site service, repair, or replacement. This model can be quite expensive,
but it is what traditional IT managers have come to expect. I think
that IT managers don't truly understand the cluster concept. Why worry
about one node out of many going down? You only lose a small percentage
of the compute power of the cluster, and you can still run jobs on the
cluster. So why spend huge amounts of money to get a node back into
production quickly when it has only a small impact on production?
However, one good thing about this support model is that it tells you the upper
end of support costs.
The second support model is at the lower end of the support spectrum.
Here you mail the node back to the vendor, where it is diagnosed, repaired,
and mailed back to you. Email support for normal "problems" within the
cluster is also included. To me this support model makes much more sense
because it gets the nodes repaired and/or replaced, plus it covers
software problems.
A third alternative is one I would actually consider. It
is really a mix of the two previous models, and it focuses on the
critical pieces of the cluster. It provides critical response for
the cluster interconnect and perhaps the master node and storage
(you can decide what "critical response" means to you) and uses a
lower level of support for things such as the compute nodes and software.
However, I would perhaps put a clause into the support contract to
cover systematic software problems and systematic hardware problems.
I consider this to be a "balanced" support model.
Warranty
A warranty is a bit different from support, but you can use a warranty
to effectively replace support for hardware. I know of one major site
that does this very effectively. The concept is that you just need
hardware fixed and/or replaced in a timely manner. By asking for a
warranty that covers the hardware for the life of the machine, you
can eliminate hardware support costs. Of course, this doesn't cover software
support, but if you have a good staff, then software support is
effectively covered, except for upgrades, which can be handled directly
by the software vendor.
Another option you might want to think about is having on-site spares.
Choose some percentage of the number of nodes (perhaps 2%) and require
that many extra nodes to be on-site. However, rather than have them sit on
a shelf somewhere, put them into production. You get the performance
from them while all nodes are operating. Plus, you can lose up to
that many nodes before you go below the required number. For example,
if you have 4 on-site spare nodes, you can lose up to 4 nodes and still
have the required number; only a fifth failure takes you below it.
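The arithmetic behind the spares rule fits in a few lines of Python. This is a minimal sketch, assuming the spare count is a whole-number percentage of the required node count, rounded up; the 2% default comes from the text, while the function names and the 128-node example are hypothetical.

```python
# Hypothetical helper names; the 2% default comes from the text.

def spare_nodes(required_nodes: int, spare_percent: int = 2) -> int:
    """On-site spares as a percentage of the required nodes, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (required_nodes * spare_percent + 99) // 100

def meets_requirement(required_nodes: int, spares: int, failed: int) -> bool:
    """True while the nodes still running are at least the required number."""
    return required_nodes + spares - failed >= required_nodes

if __name__ == "__main__":
    required = 128                  # e.g. 256 processors in dual-processor nodes
    spares = spare_nodes(required)  # 2% of 128 is 2.56, rounded up to 3
    for failed in range(spares + 2):
        ok = meets_requirement(required, spares, failed)
        print(f"{failed} failed nodes: {'OK' if ok else 'below requirement'}")
```

Applied to the 4-spare example in the text, `meets_requirement(required, 4, 4)` is true and `meets_requirement(required, 4, 5)` is false.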
Step 4 - Let's Write the Technical RFP
Now that you have a
good set of requirements in hand, your homework is done and you are
ready for your final exam: writing the technical RFP. There are a
few general guidelines I recommend for the actual writing.
First, always be as specific as possible to reduce miscommunication
with the vendor. However, always be ready for miscommunication, since
it will happen. Also, be as flexible as possible to allow the vendor
to innovate and, hopefully, demonstrate their ability to provide good
value. If you can, provide figures; they help reduce
miscommunication. And finally, be as concise and clear as possible
(sounds like an English exam, doesn't it?).
What you put into the
technical cluster RFP is really up to you and your specific case. At
the bare minimum you need to specify what hardware you want (or don't
want) and possibly what kind of codes you want to run. However, there
are some things I strongly recommend putting into the RFP. Let's
break these things up into three categories: miscellaneous, company,
and benchmarking.
Miscellaneous
The miscellaneous
category is one that many people forget. It deals with
the environmental and practical aspects of the cluster.
Environmental aspects include what kind of power and connections you
need for the cluster, the footprint of the cluster, weight, height,
required cooling, noise, etc. These are very important things to
consider when planning your cluster. You can either request this
information from the various vendors or make these items
requirements in your technical cluster RFP. One thing you need to
think about is getting the cluster from its delivery point to its
final destination. Make sure the path can accommodate the weight and
the size, particularly the height, of the cluster (you would be
surprised how many people forget to check all of the
doorways from the delivery point to the server room!).
Miscellaneous items
also include things like asking for the OS to be included in the
response to the RFP (unless you specify otherwise) and a
description of the cluster management system (CMS), including
instructions on rebuilding the cluster from bare metal, restoring
a node, bringing the cluster down, bringing the cluster up, and
monitoring the cluster. The RFP should also request details
about the warranty period and exactly what is and is not
covered by the warranty. Moreover, you should request details
about tech support. The vendor should provide a single phone number
and point of contact (POC) for your cluster. They should also provide
details on what is supported (including possible OS problems) and
how quickly they will respond to a problem, for both
hardware and software support. Be sure to ask about installing
commercial applications after the cluster is installed. Some vendors
will use this as an excuse for not supporting their clusters. Ask if
the vendor installs and supports MPI. Also, inquire about security
patches and the procedure for installing them. You can also ask the
vendor what patches they apply to the kernel (if they patch
the kernel at all).
Company
The next category, "company," allows you to get some information about the company
itself. You can ask for information such as total
cluster revenue, but many companies don't like to volunteer that
information, and as far as I know, they are under no obligation to do
so. But some good things to ask about the company are:
- What have they given to the Beowulf community?
- Do they support open-source projects? If so, which ones, and how deep is their involvement?
- Do they support open-source cluster projects? If so, which ones, and how deep is their involvement?
- Do they provide on-site administrator training? (I suggest requiring this.)
- Does the vendor stock replacement parts for your cluster?
- Ask the company to describe a support call that went well.
- Ask the company to describe a support call that did not go well. What did they do to recover from it?
- Can they describe their experience with clusters? With Linux clusters?
- What do they do to tune their clusters for performance?
- Can they provide references from customers?
I'm sure you can think
of more good questions to ask the vendors. A good vendor should have
no problem answering these questions.
Benchmarks
In the final category,
"benchmarks," I recommend asking the vendors to run benchmarks on
the proposed hardware. Running the benchmarks serves
several purposes: it forces the vendor to actually test the proposed
hardware; it shows which vendors can run and complete
all of the benchmarks; it allows a direct comparison between vendors
on the benchmarks; it allows the vendors to show off their tuning
capabilities; and it gives the vendors some flexibility so they can
show their knowledge of clusters.
Ideally, you should have the vendors run your benchmarks. This will give
you the best information on the performance of the cluster running your
applications with your data sets. However, I know this isn't always
possible. The next best thing is to run synthetic or open-source
benchmarks.
The benchmarks I
recommend asking the vendors to run are all open-source, so there are
no issues with obtaining them. However, be warned that these are
"synthetic" benchmarks in that they are not your codes. So, I
wouldn't recommend betting the speed of your codes on the results of
the benchmarks. I recommend asking for four varieties of benchmarks:
nodal benchmarks, which are sometimes called micro-benchmarks; network
benchmarks; message-passing benchmarks; and file system benchmarks. I
would have the vendor run various codes in each category several
times and report the average, standard deviation, geometric mean, and
raw scores. This allows you to see the spread of the scores.
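To make the reporting requirement concrete, here is a minimal Python sketch (3.8 or later, for `statistics.geometric_mean`) of the summary you would ask for on each benchmark. It assumes each run yields a single positive score, such as a bandwidth in MB/s; the function name is hypothetical, and the standard deviation shown is the sample standard deviation.

```python
import statistics
from typing import Dict, Sequence

def summarize_runs(scores: Sequence[float]) -> Dict[str, object]:
    """Summary requested for repeated runs of one benchmark."""
    if len(scores) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    if any(s <= 0 for s in scores):
        raise ValueError("the geometric mean needs positive scores")
    return {
        "runs": len(scores),
        "average": statistics.mean(scores),
        "std_dev": statistics.stdev(scores),  # sample standard deviation
        "geometric_mean": statistics.geometric_mean(scores),
        "raw": list(scores),                  # keep every run, not just the summary
    }
```

Keeping the raw scores next to the summary makes a single anomalous run visible instead of letting it disappear into the average.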
Nodal benchmarks, such as lmbench, stream, and cpu_rate, are very
useful in measuring various aspects of the performance of the
nodes. They allow you to differentiate between the various hardware
offerings from the vendors.
Network benchmarks can
spot problems in network configurations and also allow you to compare
network performance between vendors. The best program for doing this
is probably NetPIPE. I would require that NetPIPE be run in several
ways, including via MPI over the proposed network. Also, I would
require the vendor to run MPI Link Checker from Microway on the test
cluster. As an added precaution, I would require it to also be run on
the delivered cluster, with some guarantee about performance (latency
and bandwidth) across all connections.
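Once you have a performance guarantee, checking it is mechanical. The Python sketch below does not parse MPI Link Checker's actual output; it assumes the per-pair measurements have already been extracted into simple records. The `LinkResult` type, the function names, and the threshold values you pass in are all hypothetical placeholders for whatever your contract specifies.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

@dataclass
class LinkResult:
    node_a: str
    node_b: str
    latency_us: float       # measured latency, microseconds
    bandwidth_mb_s: float   # measured bandwidth, MB/s

def failing_links(results: Sequence[LinkResult], max_latency_us: float,
                  min_bandwidth_mb_s: float) -> List[Tuple[str, str]]:
    """Node pairs that miss either the latency or the bandwidth guarantee."""
    return [(r.node_a, r.node_b) for r in results
            if r.latency_us > max_latency_us or r.bandwidth_mb_s < min_bandwidth_mb_s]

def missing_pairs(nodes: Sequence[str],
                  results: Sequence[LinkResult]) -> List[Tuple[str, str]]:
    """Node pairs with no measurement at all (A-B and B-A count as one link)."""
    measured = {frozenset((r.node_a, r.node_b)) for r in results}
    return [pair for pair in combinations(nodes, 2) if frozenset(pair) not in measured]
```

Checking for missing pairs matters as much as checking thresholds: a guarantee "between all connections" is only verified if every one of the n(n-1)/2 pairs was actually measured.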
The next category of
benchmarks, message-passing benchmarks, is useful for comparing
vendors and their ability to tune the cluster for performance. The
current best set of message-passing benchmarks is the
NAS Parallel Benchmarks.
The final category,
file system benchmarks, will give you performance numbers for local
disks (if you have disks in the nodes) and for file system
performance over the network. Benchmarks such as IOzone, Bonnie++,
and Postmark are useful for performance testing. I would have the
vendors run the benchmarks using the proposed file system (you can
either pick the file system or let the vendor choose) on the proposed
cluster nodes. I would also have them run the exact same benchmarks
on any NFS-mounted file systems you are using in the cluster. Be sure
to ask the vendor to provide the mount and export options for all
NFS-exported file systems.
Step 5 - Selecting Prospective Vendors
Once you get back the
benchmark results and the cost of the proposed solutions, it's time
to either select the winner from a technical point of view or to down-select
to the finalists. However, before you do this, I would suggest
developing a scoring scheme for the various aspects of the cluster.
You could assign certain scores for completing each benchmark and
another score based on performance on the benchmark. You can also
assign scores based on other factors such as weight, power, cooling,
etc. Then, when you receive the responses from the vendors, you can
give each of them an overall technical score.
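One possible way to implement "points for completing a benchmark plus points for performance" is sketched below in Python. The relative-to-best scaling and the point values are my assumptions, not a standard scheme: each completed benchmark earns fixed completion points plus performance points scaled by how close the vendor came to the best submitted result. Reported values are assumed to be positive.

```python
from typing import Dict, Optional

# Hypothetical point values; fix your own before the RFP goes out.
COMPLETION_POINTS = 10.0
PERFORMANCE_POINTS = 20.0

def technical_scores(results: Dict[str, Dict[str, Optional[float]]],
                     higher_is_better: Dict[str, bool]) -> Dict[str, float]:
    """results[vendor][benchmark] is the reported figure (positive), or None
    if the vendor did not complete that benchmark."""
    scores = {vendor: 0.0 for vendor in results}
    for bench, higher in higher_is_better.items():
        reported = [r[bench] for r in results.values() if r.get(bench) is not None]
        if not reported:
            continue
        best = max(reported) if higher else min(reported)
        for vendor, r in results.items():
            value = r.get(bench)
            if value is None:
                continue  # an incomplete benchmark earns nothing
            # 1.0 for the best vendor, smaller for everyone else
            ratio = value / best if higher else best / value
            scores[vendor] += COMPLETION_POINTS + PERFORMANCE_POINTS * ratio
    return scores
```

Factors such as weight, power, or cooling can be folded in the same way by treating each as another entry in `higher_is_better` (for example, power draw with `False`).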
Depending upon the
procurement procedures in your company, the technical score may only
be a certain percentage of the overall score. For example, the
technical score could be 70% of the total, and the remaining 30% could
be the score for the vendor itself, including cost. The procurement
policies of your business, lab, or university will determine the
final breakdown between the technical score and the other scores.
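Combining the two parts is a weighted sum, but only after both are on the same scale. This Python sketch assumes each part has already been scored so that higher is better (cost, for instance, converted into points) and rescales each so the top vendor gets 100; the 70/30 default is the example split above, and the function names are hypothetical.

```python
from typing import Dict

def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale so the top vendor gets 100."""
    top = max(scores.values())
    if top <= 0:
        raise ValueError("need at least one positive score")
    return {vendor: 100.0 * s / top for vendor, s in scores.items()}

def overall_scores(technical: Dict[str, float], other: Dict[str, float],
                   technical_weight: float = 0.70) -> Dict[str, float]:
    """Weighted total of normalized technical and other (vendor, cost) scores."""
    if not 0.0 <= technical_weight <= 1.0:
        raise ValueError("technical_weight must be between 0 and 1")
    tech, rest = normalize(technical), normalize(other)
    return {v: technical_weight * tech[v] + (1.0 - technical_weight) * rest[v]
            for v in technical}
```

Normalizing first keeps a part with a larger raw point range from silently outweighing the split that procurement agreed to.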
A few quick comments
about the process. Once you have the RFP developed, send it out!
Don't waste time. Also, be sure to give the vendors time to
perform the benchmarks. If you don't hear from the vendors, be sure
to check with them to make sure they understand everything and are
making progress. Also, be flexible, because there will be questions
and concerns from the vendors. If there are changes to be made, be
sure all vendors know about them. Also, don't share pricing
information or benchmark results between the vendors. And finally,
beware of companies that submit unrealistically low bids. They are trying to buy your current
business but may end up costing you down the road.
Step 6 - How to Down Select
Now that you have sent
out the RFP (or rather the procurement people have sent it out),
gotten the responses back from the vendors, and scored
all of the vendors, how do you down-select or select the winner?
Well, that's a very good question, and one that is difficult to
answer because it depends upon your procurement policies. However, I
can offer a few words of advice.
Be sure to define the
scoring before you send out the RFP, but don't tell the
vendors the scoring. Then, when you get the responses back, put together
a review team whose members have a vested interest in the cluster. Have the
team members score the vendors and, on the basis of the scores, rank
the vendors. Then have the team discuss the ranking of the vendors
and perhaps make adjustments to the rankings. Then you can select the
winner or the vendors to be considered for the final competition.
Depending upon the policies where you work, your winner(s) will
have to be filtered through procurement and the central IT
management. Be ready to explain the scoring system, the actual
scores, the rankings based on the scores, and any adjustments made to
the rankings.
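The team ranking step can be kept simple and auditable. The Python sketch below assumes each reviewer gives every vendor a single numeric score; averaging and reporting the spread (maximum minus minimum across reviewers) is one possible aggregation, not a prescribed method, and the function name is hypothetical. A large spread points to a vendor the team should discuss before adjusting the ranking.

```python
import statistics
from typing import Dict, List, Tuple

def rank_vendors(reviewer_scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float, float]]:
    """reviewer_scores[reviewer][vendor] -> score.
    Returns (vendor, mean score, spread) rows, best mean first."""
    vendors = {v for scores in reviewer_scores.values() for v in scores}
    table = []
    for vendor in vendors:
        marks = [s[vendor] for s in reviewer_scores.values() if vendor in s]
        table.append((vendor, statistics.mean(marks), max(marks) - min(marks)))
    # Sort by mean (descending), then by name so ties are reproducible.
    return sorted(table, key=lambda row: (-row[1], row[0]))
```

Keeping this table alongside any manual adjustments gives you exactly what procurement and central IT will ask to see.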
Then, if you can,
provide feedback to the companies that were not selected. This
feedback will help them improve their product offerings for the next
competition you have.
Example Scenario
In Sidebar 2, a sample
scenario is listed. This scenario is totally fictitious and does not
represent any real competition that I'm aware of. Also, the numbers
used in the sample technical cluster RFP are totally fictitious as
well. However, the overall structure is one that I recommend for a
technical cluster RFP. Here is the scenario:
A group of researchers
is interested in a cluster to support an MPI application. In this
case, based on the requirements of the users and the speed of certain
processors that came from testing their application, the group knows
how many nodes they need. Various interconnect technologies have been
studied to understand the impact of increased network bandwidth
and decreased latency on the performance of the codes. The number of
processors is fixed at 256, and dual-processor nodes are allowed.
There is a master node that serves out the compilers and
queuing/scheduling software to the rest of the cluster. There is also
a dedicated file server that must provide 4 TB (terabytes) of
space to the cluster only (not to the "outside" network). The
goal is to meet all of these requirements for the lowest cost from a
company that provides good value. Please read the sidebar for an
example technical RFP that I have put together.
From this basic
framework you can add or subtract things to fit your specific
needs. You can also turn some of the requirements into requests for
information, or just as easily turn the requests for information into
requirements. You can also use the concepts to create a technical
cluster RFP for the most computational power for a fixed price. There
are many variations that can be built from the pieces the example
provides.
Final Comments
I hope this article has
proved useful to you. It's a bit long, but I wanted to make sure that
most people could take away at least one useful thing from the article.
Writing a technical RFP can be a very long and grueling process with
the potential for many disagreements. However, if you do your
homework, then writing one is not difficult. In the end, doing your
homework and following some of these guidelines can help you save
time and money.
Sidebar Three: RFP Outline
Overview
The cluster shall have 256 processors in the compute nodes, one master node
with up to two processors, and one file server node with up to two processors,
all on a private network, with the master node also having an additional
network connection to an outside network.
Compute Node Requirements:
- Nodes can have dual processors
- Nodes can have either:
- Intel P4 Xeon processors running at least 3.0 GHz
- AMD Opteron processors - at least Opteron 248
- (Note: Opterons give a 35% improvement in speed compared to Xeons)
- Each node should have two built-in Gigabit (GigE) NICs
- At least one PCI-Express slot shall be available per node
- One riser card that supports PCI-Express cards per node is required
- Each node should have at least 2 Gigabytes (GB) of DDR ECC memory per processor
- Each node should have at least one hard drive - ATA or SATA drives are acceptable
- Each node should have built-in graphics of some type that is supported by Linux
- The nodes can be rack mountable or blades. A 2U rack mount case is the largest case acceptable. Smaller cases are preferred.
Master Node Requirements:
- Node can have dual processors
- Node can have either:
- Intel P4 Xeon processors running at least 3.0 GHz
- AMD Opteron processors - at least Opteron 248
- (Note: Processors should match the compute nodes)
- Master node should have two built-in Gigabit (GigE) NICs
- At least one PCI-Express slot shall be available per node
- One riser card that supports PCI-Express cards per node is required
- Master node should have at least 2 Gigabytes (GB) of DDR ECC memory per processor
- Master node should have at least two hard drives in a RAID-1 configuration. ATA or SATA disks are acceptable.
- Master node should have a 3D graphics card with at least 256 MB of dedicated video memory that is supported by Linux
- Master node should have hot swappable power supplies
- Master node can be rack mountable or blade. A 2U rack mount case is the largest case acceptable. Smaller cases are preferred.
File Serving (FS) Node Requirements:
- Node can have dual processors
- Node can have either:
- Intel P4 Xeon processors running at least 3.0 GHz
- AMD Opteron processors - at least Opteron 248
- (Note: Processors should match the compute nodes)
- FS node should have two built-in Gigabit (GigE) NICs
- At least one PCI-Express slot shall be available per node
- One riser card that supports PCI-Express cards per node is required
- FS node should have at least 6 Gigabytes (GB) of DDR ECC memory (3 GB per processor)
- FS node should have at least two hard drives in a RAID-1 configuration for the OS. ATA or SATA disks are acceptable.
- FS node should have at least 4 Terabytes (TB) of usable space in a RAID-5 configuration. Disks should be hot-swappable. More space is preferred. ATA or SATA disks are acceptable if hot-swappable.
- FS node should have a graphics card of some type supported by Linux
- FS node should have hot swappable power supplies
- FS node can be rack mountable or blade. A 5U rack mount case is the largest case acceptable. Smaller cases are preferred.
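The 4 TB usable-space requirement has to absorb RAID-5 parity overhead, which is worth checking in each proposal. A minimal Python sketch, assuming a single RAID-5 set in which one disk's worth of capacity holds parity, working in decimal gigabytes (4 TB = 4000 GB) and ignoring file system overhead; the function names and any disk sizes you try are hypothetical.

```python
def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable space of one RAID-5 set: one disk's worth of capacity holds parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return (disks - 1) * disk_gb

def raid5_disks_needed(usable_gb: int, disk_gb: int, hot_spares: int = 0) -> int:
    """Disks for a single RAID-5 set that meets the usable-space target, plus spares."""
    data_disks = -(-usable_gb // disk_gb)        # ceiling division
    return max(data_disks, 2) + 1 + hot_spares   # +1 disk of parity capacity
```

For example, `raid5_disks_needed(4000, 400)` gives 11 disks: ten disks' worth of data capacity plus one for parity.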
Networking Requirements:
There are two private networks connecting the compute nodes,
the master node, and the file serving node.
- A GigE network for NFS and management traffic connecting
the compute nodes, the master node, and the file server node
- A separate computational network for message-passing
traffic that connects the compute nodes, the master node,
and the file server node.
The switch for the gigabit network is vendor selected.
A single switch to connect all nodes is required. Please
provide the following information:
- Switch manufacturer and model number
- Backplane bandwidth
The computational network is vendor selected from the
list below. The following performance numbers may be used to
select the network (Note: higher performance is preferred):
- Gigabit Ethernet: Baseline
- Myrinet - 10% faster than baseline
- Quadrics - 15% faster than baseline
- Dolphin - 12% faster than baseline
- InfiniBand (IB) - 30% faster than baseline
Please provide the following information:
- Network chosen including model numbers
- Detailed layout of the two networks. Please include a diagram.
- Tested or estimated cross-sectional bandwidth (specify whether tested or estimated)
- Tested or estimated worst-case latency (specify whether tested or estimated)
Physical and Environmental Requirements:
- Cluster should fit through an 84" high doorway without disassembly
- Cluster racks should have wheels for movement
- Cluster racks should have lockable front and rear doors
- Cluster racks should have mesh front and back doors, except where a specific cooling system requires otherwise
- Please provide the following information:
- Weight of each rack with all nodes and cabling installed in pounds
- Dimensions of each rack with all nodes and cabling installed
- Power requirements for each rack, specifying the number and type of outlets
- Cooling requirements for the entire cluster in tons of air conditioning required. Also please give the ambient temperature.
- Estimated total noise level of the cluster
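Because the outline asks for per-rack power and for total cooling in tons, you can cross-check one against the other. This Python sketch assumes essentially all electrical power drawn by the racks ends up as heat in the room and uses the standard conversions 1 W = about 3.412 BTU/hr and 1 ton of cooling = 12,000 BTU/hr; the function name and the wattages you feed it are placeholders.

```python
from typing import Iterable

WATTS_TO_BTU_PER_HR = 3.412    # 1 W of heat is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12_000    # 1 ton of cooling removes 12,000 BTU/hr

def cooling_tons(rack_watts: Iterable[float]) -> float:
    """Tons of air conditioning for the whole cluster, assuming every watt
    drawn by the racks ends up as heat in the room."""
    total_watts = sum(rack_watts)
    return total_watts * WATTS_TO_BTU_PER_HR / BTU_PER_HR_PER_TON
```

If a vendor's quoted cooling figure is well below what its own power numbers imply, that is a question to raise before award rather than after delivery.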
Software Requirements:
- A cluster management system (CMS) should be installed in the cluster.
The CMS will have the ability to perform the following functions:
- Ability to install the OS on each node (if the CMS does this)
- Ability to rebuild a node that has been down
- Ability to execute commands across cluster (parallel commands)
- Ability to rebuild cluster from bare metal
- Ability to shut down cluster
- Ability to restart cluster without overloading circuits
- Extensive monitoring capability (Please provide details of what is monitored)
- Insert Operating System of Choice
- Describe OS update process including security updates
- Vendor will install and test GNU and commercial compilers. The
commercial compilers will be provided to the vendor 15 days prior to the
installation date.
- Vendor will install and test a commercial MPI implementation
(provided by the customer). It will be provided 15 days prior to the installation
date. The open-source MPICH and LAM implementations will also be installed and tested.
- Vendor will install and test commercial queuing/scheduling software
(provided by the customer). It will be provided 15 days prior to the installation
date.
Benchmarking Requirements:
- The vendor will be responsible for benchmarking proposed hardware.
The following benchmarks will be run:
- Stream benchmark:
- Run a single copy of stream seven times on the proposed compute nodes
and compute the average and geometric mean
- Run two copies of stream at the same time (one on each processor)
seven times on the proposed compute nodes and compute the average and
geometric mean. Perform this benchmark only if proposing dual CPU nodes.
- Report all stream results (all cases)
- Report compiler, version, and compile flags used
- Report version of stream benchmark used
- No code modifications to the benchmark are allowed
- lmbench benchmark:
- Run a single copy of lmbench seven times on the proposed compute
nodes and compute the average and geometric mean
- Run two copies of lmbench at the same time (one on each processor)
seven times on the proposed compute nodes and compute the average and
geometric mean. Perform this benchmark only if proposing dual CPU nodes.
- Report all lmbench results (all cases)
- Report compiler, version, and compile flags used
- Report version of lmbench benchmark used
- No code modifications to the benchmark are allowed
- cpu_rate benchmark:
- Run a single copy of cpu_rate seven times on the proposed compute
nodes and compute the average and geometric mean
- Run two copies of cpu_rate at the same time (one on each processor)
seven times on the proposed compute nodes and compute the average and
geometric mean. Perform this benchmark only if proposing dual CPU nodes.
- Report all cpu_rate results (all cases)
- Report compiler, version, and compile flags used
- Report version of cpu_rate benchmark used
- No code modifications to the benchmark are allowed
- NetPIPE benchmark:
- Run NetPIPE between two compute nodes through the proposed gigabit
switch and the proposed computational network.
- Run NetPIPE seven times and compute the average peak
bandwidth in MB/sec (megabytes per second), average latency in
microseconds, geometric mean of peak bandwidth in MB/sec, and
geometric mean of latency in microseconds.
- Do this for the gigabit network
- Do this for the computational network
- Provide graphs of all seven runs for each network
- Report compiler, version, and compile options used
- Report network options used (particularly MTU)
- Report OS details including kernel and network drivers. Also report
network driver parameters used. Report all kernel patches.
- Run NetPIPE with TCP, LAM, MPICH, and Commercial MPI
- No code modifications to the benchmark are allowed
- NPB (NAS Parallel Benchmarks) benchmarks:
- Run NPB 2.4 benchmarks with Class C sizes over the proposed computational
network with proposed computational nodes
- Run all 8 benchmarks for the following combinations of number of processes and MPI:
- 4 processors (Single CPU per node - 4 nodes. Two processes per node - 2 nodes)
- 16 processors (Single CPU per node - 16 nodes. Two processes per node - 8 nodes)
- 64 processors (Single CPU per node - 64 nodes. Two processes per node - 32 nodes)
- MPI: MPICH, LAM, and commercial MPI
- For each of the 8 codes in NPB 2, you will have 18 sets of results:
three MPI implementations and six node configurations
- Run all combinations seven times and compute the average and
geometric mean for each of the 8 benchmarks in NPB 2. Do this for each of the
18 combinations.
- Report all results in a spreadsheet
- Report compiler, version, and compile options used
- Report network configuration
- Report OS and kernel version used. Report all patches to kernel
- No code modifications to the benchmark are allowed
- Bonnie++ benchmark:
- Run Bonnie++ seven times on the proposed compute node
hardware, master node hardware, and file server hardware
- This is a total of 21 runs
- Run with file sizes that are 5 times physical memory
- Compute average and geometric mean of results (throughput and CPU usage)
- Report all results in a spreadsheet
- Report compiler, version, and compile options
- No code modifications to the benchmark are allowed
- Bonnie++ over NFS benchmarks:
- Run Bonnie++ over NFS from proposed file server to two proposed
clients over the proposed gigabit network including the proposed GigE switch.
- NFS server is the proposed file server
- NFS client is the proposed compute node
- Run Bonnie++ seven times on each of the two clients. Compute average
and geometric mean of the results (throughput and CPU usage).
- Unmount file systems in between runs
- Report all results for both clients in a spreadsheet
- Export file system using the following options:
- Client mount options are variable with the following exceptions that must be set:
- Report compiler, version, and compile options
- Report network configuration used
- Report NFS server configuration used including export options
- Report NFS client mount options used
- Report OS version including kernel, kernel patches, GigE NIC drivers,
version, and options for the NFS file server and NFS clients.
- No code modifications to the benchmark are allowed.
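The repeated-run requirements above lend themselves to a small harness. This Python sketch covers only the single-copy STREAM item. It assumes a compiled STREAM binary at the hypothetical path `./stream` and that each kernel line of its output begins with the kernel name, a colon, and the best rate in MB/s, as in the commonly distributed C version; check this against the STREAM version you actually run.

```python
import re
import statistics
import subprocess
from typing import Dict, List

# Assumed kernel-line format: "Triad:   <best rate MB/s>   <avg> <min> <max>"
KERNEL_LINE = re.compile(r"^(Copy|Scale|Add|Triad):\s+([0-9.]+)", re.MULTILINE)

def parse_stream(output: str) -> Dict[str, float]:
    """Extract the rate (MB/s) for each STREAM kernel from one run's output."""
    return {name: float(rate) for name, rate in KERNEL_LINE.findall(output)}

def run_stream(binary: str = "./stream", runs: int = 7) -> List[Dict[str, float]]:
    """Run the STREAM binary (hypothetical path) `runs` times and parse each run."""
    results = []
    for _ in range(runs):
        proc = subprocess.run([binary], capture_output=True, text=True, check=True)
        results.append(parse_stream(proc.stdout))
    return results

def summarize(results: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Average and geometric mean per kernel across runs, as the outline asks."""
    summary = {}
    for kernel in results[0]:
        values = [r[kernel] for r in results]
        summary[kernel] = {"average": statistics.mean(values),
                           "geometric_mean": statistics.geometric_mean(values)}
    return summary
```

The same run-parse-summarize pattern carries over to lmbench, cpu_rate, NetPIPE, and Bonnie++ once a parser for each tool's output is written.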
Other Information:
- Please provide the following information:
- List your largest cluster installation to date
- List your smallest cluster installation to date
- How many clusters have you installed?
- How many clusters have you installed with a high-speed interconnect?
- What open-source projects to you support?
- What open-source cluster projects do you support?
- Do you have experience with installing parallel file systems? If so,
which ones?
- Please provide 3 references that may be contacted
- Please describe a recent support call that is an example of a
"good" support call.
- Please describe a recent support call that is an example of a
"bad" support call.
- What was done to recover this call?
- Please describe your experience with Linux clusters
- Please describe all high-speed networks that you installed at customer sites
- Please list all kernel patches that you apply
Warranty/Maintenance Requirements:
- Please describe warranty coverage and length of warranty. Please
provides details about what components are not covered.
- Please provide a separate line item in costing for yearly hardware
and software maintenance for the following:
- On-site, next day repair of all components
- On-site, next day repair of interconnect and master node
- Please provide the same information for all equipment that is from
third party companies (e.g. the networking equipment)
- Please describe warranty repair procedure beginning with support call
initiation
- Please describe maintenance repair procedure beginning with support
call initiation
Delivery Requirements:
- Cluster will be delivered within 30 calendar days of receipt of the purchase
order (PO).
- Customer will provide all customer provided software within 15 days
of the receipt of the PO
- Any changes to this schedule must be agreed upon by vendor and customer
- All nodes will be configured and operational within 14 days of delivery
to customer site.
- MPI Link-Checker from Microway will be run on entire cluster to ensure
that all nodes are working correctly. It will be run over the computational
network and the NFS and management network. The following performance (bandwidth
and latency) must be met:
- insert requirements for each network here
- A 30-day test period will begin after the 14-day installation period
- If the cluster does not meet the following specifications then it is
deemed unacceptable and will be sent back to the vendor:
- insert acceptance requirements here
- On-site training of two administrators will be provided
- performance guarantees on user codes or open-source benchmarks can
be inserted here
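The delivery requirements chain several calendar-day windows together, so it helps to turn a PO date into concrete deadlines. This Python sketch assumes delivery happens on the last allowed day, all periods are calendar days, and the test period starts as soon as installation ends; the function name and any PO date you pass in are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict

def rfp_schedule(po_received: date, delivery_days: int = 30, software_days: int = 15,
                 install_days: int = 14, test_days: int = 30) -> Dict[str, date]:
    """Latest dates implied by the delivery requirements, in calendar days."""
    delivery = po_received + timedelta(days=delivery_days)
    installed = delivery + timedelta(days=install_days)
    return {
        "customer_software_due": po_received + timedelta(days=software_days),
        "delivery_due": delivery,
        "installation_complete": installed,
        "test_period_ends": installed + timedelta(days=test_days),
    }
```

With the default windows, the customer-provided software is due 15 days before the latest delivery date, which lines up with the "15 days prior to installation" wording in the software requirements.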
This article was originally published in ClusterWorld Magazine. It has been
updated and formatted for the web. If you want to read more about HPC
clusters and Linux you may wish to visit
Linux Magazine.
Jeff Layton is proud that he has 4.33 computers for every person in his
house - the most in his neighborhood but he's not telling his neighbors.