Robert G. Brown
rgb at phy.duke.edu
Thu May 22 14:14:20 EDT 2003
On Tue, 20 May 2003, Jung Jin Choi wrote:
> Hi all,
> I just got to know what beowulf system is and found out that
> I can connect the pc's up to the number of ports in a switch.
> Let's say, if I have a 16 port switch, I can connect up to 16 pc's.
> Now, my question is if I have 200pc's, how do I connect them?
> Should I connect 48 pc's to a 48 ports switch, then connect these
> four 48 ports switch to another switch?
> Please teach me some ways to connect many pc's...
You can proceed several ways. Which one is right for you depends on
your application needs and budget.
If your cluster application is embarrassingly parallel (EP), or does a
LOT of work computationally for a LITTLE interprocessor communication,
then connecting a stack of switches together is fine and is also by far
the cheapest alternative. The best way to interconnect them is almost
certainly going to be what you describe: buy a "master" switch with
e.g. 16 ports (m) and plug A and B and C and D and ... into its ports
one at a time. All traffic from A to any non-A port thus goes one hop
A1 -> Am || mA -> mB || Bm -> B2
for port 1 on switch A to get to port 2 on switch B (the ->'s are within
the switch, the ||'s are between switches). This is fairly symmetrical,
not TOO expensive, and can manage even "real parallel" applications as
long as they aren't too fine grained.
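For what it's worth, here is a rough Python sketch of that two-level layout. The function names and the port-labelling scheme are made up for illustration and are not from any real tool. It assumes each leaf switch gives up exactly one port to its uplink, and it reproduces the path notation used above.

```python
import math


def usable_ports(leaf_switches, ports_per_leaf):
    """Host ports left once each leaf gives one port to its uplink (assumption)."""
    return leaf_switches * (ports_per_leaf - 1)


def leaves_needed(hosts, ports_per_leaf):
    """Smallest number of leaf switches that can hold `hosts` machines."""
    return math.ceil(hosts / (ports_per_leaf - 1))


def path(src_switch, src_port, dst_switch, dst_port):
    """Path in the notation above: '->' inside a switch, '||' between switches.

    'm' stands for the master switch (or the leaf port that faces it).
    """
    src = f"{src_switch}{src_port}"
    dst = f"{dst_switch}{dst_port}"
    if src_switch == dst_switch:
        # Both hosts hang off the same leaf, so traffic never leaves it.
        return f"{src} -> {dst}"
    return (f"{src} -> {src_switch}m || m{src_switch} -> m{dst_switch}"
            f" || {dst_switch}m -> {dst}")


print(path("A", 1, "B", 2))    # A1 -> Am || mA -> mB || Bm -> B2
print(usable_ports(16, 16))    # 240
print(leaves_needed(200, 16))  # 14
```

So under these assumptions 200 pc's fit on fourteen 16 port leaf switches, which in turn fit on one 16 port master.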
If it IS too fine grained, then your next choice is to bump your budget.
How much you have to bump it depends on your needs and the topology of
your problem. Switch cost per port is absurdly low for switches with
less than or equal to 32 ports. Note the following snapshot from
$2359 - Switch 64port
$586 - Switch 48port
$129 - Switch 32port
$76 - Switch 24port
$31 - Switch 16port
$48 - Switch 12port
$22 - Switch 8port
$19 - Switch 5port
$19 - Switch 4port
In quantities of 32 or fewer ports per switch, the price per port is
in the ballpark of only $2-4 (don't ask me to explain the anomaly at
e.g. 16 ports vs 12:-). At 48 it jumps to over $10 per port. At 64 it
jumps again to close to $40 (and an e.g. $1800 HP Procurve 4000M, times
two for 80 ports on a single backplane, holds at around $50/port ).
Clearly it gets really expensive to pack lots of ports on a single
backplane, especially planes that attempt to deliver full bisection
bandwidth between all pairs of ports. Compare a wimpy ~200 ports made
up of only three filled 4000M chassis with gig uplinks at ballpark $6000
vs hmmm, $31x17 = maybe $600 including the cabling to get to 256 ports
(240 of them left for nodes once each switch gives up a port to its
uplink) with sixteen 16 port switches interconnected via a 16 port
switch. Of course
performance is better if you go up a notch and get 16 port stackable
switches with a gigabit uplink and put them on a 16 port gigabit switch.
Prices, however, also go up to the ballpark of $2-3K (I think) --
clearly you can span a range of anywhere from $5 to $50 per port to get
to 200+ ports with various topologies and bottlenecks.
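The per-port arithmetic above can be checked with a few lines of Python. This is only a sketch: the prices are the ones in the snapshot quoted earlier, the helper names are invented for illustration, and the two-tier total counts switches only (cabling is left out as an assumption).

```python
# Prices from the snapshot above: ports per switch -> dollars.
PRICES = {64: 2359, 48: 586, 32: 129, 24: 76, 16: 31, 12: 48, 8: 22, 5: 19, 4: 19}


def per_port(ports):
    """Dollars per port for a single switch of the given size."""
    return PRICES[ports] / ports


def two_tier_cost(leaves, leaf_ports, master_ports):
    """Switch cost (no cabling) for `leaves` leaf switches plus one master."""
    if leaves > master_ports:
        raise ValueError("master switch has too few ports for that many leaves")
    return leaves * PRICES[leaf_ports] + PRICES[master_ports]


for ports in sorted(PRICES, reverse=True):
    print(f"{ports:2d} ports: ${PRICES[ports]:4d} total, ${per_port(ports):5.2f}/port")

total = two_tier_cost(16, 16, 16)   # seventeen $31 switches
hosts = 16 * (16 - 1)               # one port per leaf is spent on the uplink
print(f"two-tier tree: ${total} for {hosts} host ports, "
      f"${total / hosts:.2f}/port before cabling")
```

The seventeen-switch tree comes to $527 in switches, which is where the "maybe $600 including the cabling" figure comes from.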
If you feel REALLY rich, you can look into Foundry switches, e.g. their
bigiron switches and the like. These are enterprise-class switching
chassis and you will have to bleed money from every pore to buy one, but
you can get hundreds of ports on a common backplane with state of the
art switching technology delivering a large fraction of full bisection
bandwidth and symmetry.
Most users who want to build high performance networks that require full
bisection bandwidth between all pairs of hosts to run fine grained code
on a cluster containing hundreds of nodes and up eschew 100BT or even
1000BT and ethernet altogether, and choose either myrinet or SCI. Both
of those have their own ways of managing very large clusters. In both
cases you will also bleed money from every pore, but it actually might
end up being LESS money than a really big ethernet switch and will have
far better performance (latencies down in the <5 microsecond range
instead of in the >100 microsecond range).
This is really only a short review of the options (you may hear more
from some of the networking experts on the list) but this might get you
started.
To summarize, a) profile your task and determine its communication
requirements; b) match your task and its expected scaling to your
budget, your node architecture, and a network simultaneously. That is,
to get the most work done per dollar spent, figure out whether you have
to spend relatively much on a network or relatively little. If EP or
coarse grained, spend little on the network, and shift more and more
over to the network at the expense of nodes as the task becomes fine
grained and the ratio of communication to computation increases. Don't
worry about "spending nodes" on faster communications but fewer (maybe a
LOT fewer) nodes than you expected/hoped to get -- if you're doing a
really fine grained real parallel task, it won't scale out to LOTS of
nodes anyway, certainly not without a premier network interconnecting
them.
As in, per-node prices (node plus its share of the network) range from
a ballpark of $750 for a
cost-benefit optimal processor with maybe 512 MB of memory each on a
"cheapest" network to $2000 or even more for a high end network.
However, if your parallel task only scales linearly to 16 nodes with a
cheap network (and maybe even slows DOWN after 32 nodes, so buying 200
doesn't actually help:-) you may be better off using your
200-cheapest-node-budget to buy only 64 nodes that use a network that
permits the task to scale nearly linearly up to all 64.
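To make that tradeoff concrete, here is a toy Python model. Everything in it beyond the $750 and $2000 per-node figures is an assumption for illustration: the time model (compute time divided by N plus a communication term that grows with N) is a common textbook simplification and not a measurement, and the communication constants are picked so the cheap network peaks at 32 nodes, with the fast network taken as 20 times better (roughly the 100 vs 5 microsecond latency ratio mentioned above).

```python
def speedup(nodes, comm):
    """Toy model: run time = 1/nodes (compute) + comm * nodes (communication).

    Speedup is relative to a serial run time of 1.0. `comm` is a made-up
    per-node communication cost, not a measured quantity.
    """
    return 1.0 / (1.0 / nodes + comm * nodes)


def affordable(budget, price_per_node):
    """How many whole nodes (network share included) a budget buys."""
    return budget // price_per_node


CHEAP_COMM = 1.0 / 1024        # assumed: puts the cheap-network peak at 32 nodes
FAST_COMM = CHEAP_COMM / 20    # assumed: 20x lower communication cost

budget = 200 * 750             # the 200-cheapest-node budget
print("budget:", budget)
print("high end nodes it buys:", affordable(budget, 2000))

for n in (16, 32, 64, 200):
    print(f"cheap network, {n:3d} nodes: speedup {speedup(n, CHEAP_COMM):5.1f}")
print(f"fast network,   64 nodes: speedup {speedup(64, FAST_COMM):5.1f}")
```

In this made-up model the cheap network tops out at a speedup of 16 on 32 nodes and gets worse from there, while 64 nodes on the fast network do better than three times that. Your own numbers will differ, which is why you profile first.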
Hope this helps,
> Thank you
> Jung Choi
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu