Why not NT clusters? Need arguments.

Jeff Layton laytonjb at mindspring.com
Fri Oct 6 20:54:50 EDT 2000


At 08:31 PM 10/6/00 -0400, you wrote:
>On Fri, 6 Oct 2000, Jon Tegner wrote:
>
>> In a discussion of clusters I was asked why not use systems
>> running Microsoft NT. I could only offer cost and stability in a
>> sweeping way, and I couldn't present more quantitative arguments. Later,
>> I even found that an NT cluster sits at place 207 on the top500 list
>> (see http://www.top500.org/lists/TOP500List.php3?Y=2000&M=06).
>> Is that an exception, or are there many of these beasts around?
>> 
>> I would appreciate being enlightened on this issue.
>
>
>The fine people on this list can provide a large number of reasons
>for choosing Linux over NT. Let me give you an empirical observation.
>
>I have not read about or seen any reference to many production
>NT clusters (those that are actually doing something useful).
>On the other hand, I can tell you we have quite a few customers who
>are designing aircraft, computing molecules, and finding oil
>with Linux clusters.  There are many other applications out
>there running day in and day out on Linux clusters. I have not
>seen this with NT. My only conclusion is that if it worked
>well you would see more. One sure reason is that HPC (High Performance
>Computing) has been and will continue to be a *NIX world. Just
>seems to work better.

I can add an anecdote, but I can't name the particular site.

One site that I've worked with (not mine) has a 64-processor
Linux cluster and an NT cluster with about 32 processors.
When I spoke with the sys-admin of the Linux cluster,
they told me that the longest the NT cluster had stayed up
for a run was 4 HOURS. On the clusters I run at work (Lockheed-Martin)
our runs are at least 30 hours. Consequently, a 4-hour uptime
for a cluster is absolutely unacceptable, even if we save
intermediate results and restart the codes (CFD codes).
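
A back-of-the-envelope sketch of that arithmetic, using the 30-hour
run and 4-hour uptime figures above. It assumes the best case: every
segment gets the full uptime window and a checkpoint is written right
at the end, so no work is lost. The function name and the optional
restart overhead are hypothetical, added only for illustration.

```python
import math

def segments_needed(run_hours, uptime_hours, restart_overhead_hours=0.0):
    """Minimum number of back-to-back segments needed to finish a run
    when the cluster never stays up longer than uptime_hours.

    restart_overhead_hours is a hypothetical per-segment cost (reboot,
    reload of the checkpoint) that eats into each uptime window.
    """
    useful = uptime_hours - restart_overhead_hours
    if useful <= 0:
        raise ValueError("restart overhead consumes the whole uptime window")
    return math.ceil(run_hours / useful)

# Figures from the post: a 30-hour run, at most 4 hours of uptime.
segments = segments_needed(30, 4)
restarts = segments - 1
print(f"{segments} segments, {restarts} restarts in the best case")
```

Even under these generous assumptions a single 30-hour run needs at
least 8 segments, i.e. 7 restarts; any per-restart overhead or work
lost since the last checkpoint only pushes that number up.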

In addition, the sys-admin was telling me that during a
run (using MPI) if a node failed, the WHOLE cluster came
unglued and all the nodes had to be rebooted! Not a pretty
picture.

This was last year (1999). I'm not sure what's up in 2000,
but I don't think it would be anywhere near the stability of a Linux
cluster. I think the MPI/Pro guys have done some good things
to help reduce the problems on NT clusters, but I don't think
they can cure the unstable OS (with SEVERE memory leaks).
Perhaps Windows 2000 at Cornell is better than NT, but I'm
not betting on it (has anybody checked 2000 for memory leaks
yet?).

Good Luck,

Jeff Layton

Lockheed-Martin

>
>Doug
>
>-------------------------------------------------------------------
>Paralogic, Inc.           |     PEAK     |      Voice:+610.814.2800
>130 Webster Street        |   PARALLEL   |        Fax:+610.814.5844
>Bethlehem, PA 18015 USA   |  PERFORMANCE |    http://www.plogic.com
>-------------------------------------------------------------------
>
>
>_______________________________________________
>Beowulf mailing list
>Beowulf at beowulf.org
>http://www.beowulf.org/mailman/listinfo/beowulf
>



