From andrewxwang at yahoo.com.tw Sun Feb 1 00:39:40 2004
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Sun, 1 Feb 2004 13:39:40 +0800 (CST)
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
Message-ID: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>

http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html

2.6 looks very promising, wondering when distributions will include it.

Also ia64 performance looks bad when compared to Xeon or amd64. Intel switching to amd64 is a good choice ;->

Andrew.

-----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From amacater at galactic.demon.co.uk Sun Feb 1 05:40:34 2004
From: amacater at galactic.demon.co.uk (Andrew M.A. Cater)
Date: Sun, 1 Feb 2004 10:40:34 +0000
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID: <20040201104034.GA9280@galactic.demon.co.uk>

On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote:
> http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>
> 2.6 looks very promising, wondering when distributions
> will include it.
>
Debian unstable does today. The new installer for the next release of Debian (currently Debian testing), which is in beta test, may well include a 2.6 kernel option.

> Also ia64 performance looks bad when compared to Xeon
> or amd64. Intel switching to amd64 is a good choice
> ;->
>
Newsflash: Severe weather means Hell freezes over, preventing flying pigs from taking off :)

IIRC: Since you seem well aware of SPBS / storm - is the newest storm release fully free / GPL'd such that I can use it anywhere?

Thanks,

Andy

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From xyzzy at speakeasy.org Sun Feb 1 05:57:37 2004
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Sun, 1 Feb 2004 02:57:37 -0800 (PST)
Subject: [Beowulf] C vs C++ challenge
In-Reply-To: <1075512676.4915.207.camel@protein.scalableinformatics.com>
Message-ID:

> I could easily optimize it more (do the work on a larger buffer at
> once), but I think enough waste heat has been created here. This is a
> simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3.

Enough time wasted on finding different solutions to a simple problem? Surely not. Let me toss my hat into the ring:

                  Awk      Perl        C    My program (C)
wrnpc10.txt     1.771     1.125    0.506             0.164
shaks12.txt     3.055     1.877    0.955             0.243
big.txt        20.339    12.792    5.858             1.196
vbig.txt      101.466    63.770   29.079             5.666

All times are from a dual PIII-1GHz on a ServerWorks board with 1GB dual channel PC133 ram. Each time is the best of three runs and is wall time. The awk version is by Selva Nair, Perl by Joe Landman, C version by Robert G Brown. The Java version isn't portable enough for me to run (go Java!) and I didn't see the source for a C++/STL version. Compiler used was gcc 2.96, awk was 3.1.0, and perl was 5.6.1.
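For readers playing along at home, a minimal sketch of the kind of hash-based counter being timed here might look like the following. This is an illustration only -- it is not Trent's program and not any of the contest entries above -- and it uses one reasonable word definition (a run of 0-9, a-z, A-Z and ', lower-cased):

/* wcount.c -- toy hash-based word counter (illustrative sketch only).
 * Words are runs of [0-9A-Za-z'], lower-cased, counted in a chained
 * hash table; prints total and unique counts. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define NBUCKET 65536
#define MAXWORD 256

struct node { char *word; long count; struct node *next; };
static struct node *table[NBUCKET];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKET;
}

static void add_word(const char *w)
{
    struct node *p;
    unsigned h = hash(w);
    for (p = table[h]; p; p = p->next)
        if (strcmp(p->word, w) == 0) { p->count++; return; }
    p = malloc(sizeof *p);
    p->word = strdup(w);
    p->count = 1;
    p->next = table[h];
    table[h] = p;
}

int main(void)
{
    char word[MAXWORD];
    int c, len = 0;
    long total = 0, unique = 0;
    unsigned i;
    struct node *p;

    while ((c = getchar()) != EOF) {
        if (isalnum(c) || c == '\'') {
            if (len < MAXWORD - 1) word[len++] = tolower(c);
        } else if (len) {
            word[len] = '\0';
            add_word(word);
            total++;
            len = 0;
        }
    }
    if (len) { word[len] = '\0'; add_word(word); total++; }

    /* walk the buckets to count distinct words */
    for (i = 0; i < NBUCKET; i++)
        for (p = table[i]; p; p = p->next)
            unique++;
    printf("total %ld unique %ld\n", total, unique);
    return 0;
}

Compiled with something like "gcc -O2 wcount.c -o wcount" and fed a text file on stdin, it prints total and unique word counts in the same spirit as the numbers below; it makes no claim to match any entry's exact figures.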
The actual results for shaks12.txt, which are of course never the same:

version      total    unique
awk         902299     31384
perl                   23903
C           902299     37499
My          906912     27321
wc          901325

I considered words to be formed from 0-9, a-z, A-Z, and '. Everything is lower cased. The shaks12.txt file is complicated by its use of the single quote both for quotations and for contractions. I also have the list of words and counts, sorted no less, but do not print it. I'll give you guys a few days, and see if anyone finds a solution before I reveal my secrets.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com Sun Feb 1 06:51:22 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sun, 1 Feb 2004 12:51:22 +0100 (CET)
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID:

On Sun, 1 Feb 2004, Andrew Wang wrote:
> http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>
> 2.6 looks very promising, wondering when distributions
> will include it.
>
It's already possible to use yum to get a 2.6 kernel for Fedora. (Must start testing it myself.)

This prompted me to look at the Fedora roadmap: http://fedora.redhat.com/participate/schedule/

Looks like 2.6 will be in Fedora 2, scheduled for April. And very interestingly: "and integrating work on other architectures (at least AMD64, and possibly also SPARC)."

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From klamman.gard at telia.com Sun Feb 1 07:30:52 2004
From: klamman.gard at telia.com (Per Lindstrom)
Date: Sun, 01 Feb 2004 13:30:52 +0100
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk>
Message-ID: <401CF17C.8080706@telia.com>

I have experienced some problems compiling SMP support for the 2.6.1 kernel on my Intel Xeon based workstation:

M.B:      Intel SE7505VB2 ATX PCI, FSB 533MHz
Chipset:  Intel 7505
CPU:      2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB
RAM:      2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB
Graph:    GeForce FX 5200 128MB

The SMP support works fine all the way up to kernel 2.4.22, but after that it stops working for the Xeon.

The SMP support works fine for the Intel Tualatin workstation all the way up to kernel 2.4.24 and gives problems on 2.6.1. I have not tested a build of 2.6.0.

Please advise if someone has solved this problem.

Best regards
Per Lindstrom
.
.
Andrew M.A. Cater wrote:

>On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote:
>
>>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>>
>>2.6 looks very promising, wondering when distributions
>>will include it.
>>
>Debian unstable does today. The new installer for the next release
>of Debian (currently Debian testing) which is in beta test may well
>include a 2.6 kernel option.
>
>>Also ia64 performance looks bad when compared to Xeon
>>or amd64.
Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm >release fully free / GPL'd such that I can use it anywhere? > >Thanks, > >Andy >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sun Feb 1 06:44:18 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sun, 1 Feb 2004 12:44:18 +0100 (CET) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C253E.9040206@obs.unige.ch> Message-ID: On Sat, 31 Jan 2004, Pfenniger Daniel wrote: > > Note that in the responded message John was confusing N2 and NO2. Eeek! I am outed as a physicist... I've come out of the lab (closet). Guess I can now wear a slide rule with pride. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sun Feb 1 12:50:44 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> Message-ID: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB all extremely mundane and FULLY supported. > Graph: GeForce FX 5200 128MB bzzt. take it out, try again. don't even *think* about loading the binary nvidia driver. > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. needless to say, 2.6 has been extensively tested on xeons, and it works fine. your problem is specific to your config. if you want help, you'll have to start by describing how it fails. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shaeffer at neuralscape.com Sun Feb 1 13:17:15 2004 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Sun, 1 Feb 2004 10:17:15 -0800 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> <401CF17C.8080706@telia.com> Message-ID: <20040201181715.GB8159@synapse.neuralscape.com> On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > I have experienced some problems to compile SMP support for the > 2.6.1-kernel on my Intel Xeon based workstation: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > Graph: GeForce FX 5200 128MB > > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels actually run too. 
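For anyone chasing the same build problem, the SMP-related pieces of a 2.6 .config for a two-way Xeon box come down to a handful of options. The values below are illustrative assumptions for a generic dual-processor machine, not Per's or Karen's actual configurations:

CONFIG_SMP=y
# room for hyperthreading: 2 physical CPUs x 2 siblings
CONFIG_NR_CPUS=4
# processor family covering P4-class Xeons
CONFIG_MPENTIUM4=y

With those set, a plain "make" followed by "make modules_install install" builds and installs a 2.6 kernel; the separate "make dep" step from 2.4 is no longer needed.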
I'm just starting performance testing, but results are very promising. Thanks, Karen > > The SMP support works fine for the Intel Tualatin workstation all the > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > build a 2.6.0. > > Please advice if some one have solved this problem. > > Best regards > Per Lindstrom > . > . > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > > > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > >> > >>2.6 looks very promising, wondering when distributions > >>will include it. > >> > >> > >> > >Debian unstable does today. The new installer for the next release > >of Debian (currently Debian testing) which is in beta test may well > >include a 2.6 kernel option. > > > > > > > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > >> > >> > >> > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > >release fully free / GPL'd such that I can use it anywhere? > > > >Thanks, > > > >Andy > >_______________________________________________ > >Beowulf mailing list, Beowulf at beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf ---end quoted text--- -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer at neuralscape.com http://www.neuralscape.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From poobah_99 at hotmail.com Sun Feb 1 14:24:03 2004 From: poobah_99 at hotmail.com (Ryan Kastrukoff) Date: Sun, 01 Feb 2004 11:24:03 -0800 Subject: [Beowulf] unsubscribe universe beowulf@beowulf.org Message-ID: _________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Sun Feb 1 14:33:03 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 01 Feb 2004 14:33:03 -0500 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <401D546F.7090109@scalableinformatics.com> Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon >>or amd64. Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 According to weather.com, this zip code is about 27 F right now. 
As 32 F is officially "freezing over", we can with all accuracy note that indeed, Hell (MI) has frozen over. Note 2: It was quite a bit colder last week and up to yesterday where southeast Michigan was hovering in the low negative/positive single digits in degrees F. We shouldn't complain as the folks in Minnesota have not seen the high side of 0 very much recently. As for the aerodynamic porcine units, you are on your own. Joe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Sun Feb 1 10:37:37 2004 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Sun, 01 Feb 2004 16:37:37 +0100 Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C2C97.8020903@tamu.edu> References: <401BE891.708@obs.unige.ch> <401C0807.4000209@telia.com> <401C253E.9040206@obs.unige.ch> <401C2C97.8020903@tamu.edu> Message-ID: <401D1D41.8090709@moene.indiv.nluug.nl> Gerry Creager (N5JXS) wrote: > That's the end of gas exchange physiology I. There will be a short quiz > Monday. We'll continue with the next module. I encourage everyone to > have read the Pulmonary Medicine chapters in Harrison's for the next > lecture. Hmmm, I won't hold my breath on that one :-) -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 1 15:53:54 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401D1D41.8090709@moene.indiv.nluug.nl> Message-ID: On Sun, 1 Feb 2004, Toon Moene wrote: > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) Careful or I'll beat you with John's slide rule (what kinda physicist uses a slide rule for anything other than a blunt instrument?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Feb 1 21:35:43 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <20040202023543.11015.qmail@web16807.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" > IIRC: Since you seem well aware of SPBS / storm - is > the newest storm > release fully free / GPL'd such that I can use it > anywhere? 
They now call it "torque", not sure when they are going to get a new name again :(

Not sure what you mean by "use it anywhere". You can use SPBS (yes, I like this name better) in commercial environments. If you make modifications to SPBS, you need to provide the source code for download.

If you want to modify the source, and sell it as a product, you may want to use SGE.

AFAIK, SGE uses a license similar to the BSD, while OpenPBS uses a license similar to the GPL.

Andrew.

-----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From nixon at nsc.liu.se Mon Feb 2 05:19:30 2004
From: nixon at nsc.liu.se (Leif Nixon)
Date: Mon, 02 Feb 2004 11:19:30 +0100
Subject: [Beowulf] Authentication within beowulf clusters.
In-Reply-To: <1075566655.2560.8.camel@loiosh> (agrajag@dragaera.net's message of "31 Jan 2004 11:30:56 -0500")
References: <1075566655.2560.8.camel@loiosh>
Message-ID:

Jag writes:

> On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote:
>
>> NIS works fine for many purposes as well, but be warned -- in certain
>> configurations and for certain tasks it becomes a very high overhead
>> protocol. In particular, it adds an NIS hit to every file stat, for
>> example, so that it can check groups and permissions.
>
> A good way around this is to run nscd (Name Services Caching Daemon).

I'm really, really suspicious against nscd. I've more than once seen it hang on to stale information forever for no good reason at all.

--
Leif Nixon                                    Systems expert
------------------------------------------------------------
National Supercomputer Centre            Linkoping University
------------------------------------------------------------

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From bclem at rice.edu Mon Feb 2 07:45:05 2004
From: bclem at rice.edu (Brent M. Clements)
Date: Mon, 2 Feb 2004 06:45:05 -0600 (CST)
Subject: [Beowulf] Authentication within beowulf clusters.
In-Reply-To:
References: <1075566655.2560.8.camel@loiosh>
Message-ID:

Nscd is a necessary evil sometimes though.

-B

Brent Clements
Linux Technology Specialist
Information Technology
Rice University

On Mon, 2 Feb 2004, Leif Nixon wrote:

> Jag writes:
>
> > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote:
> >
> >> NIS works fine for many purposes as well, but be warned -- in certain
> >> configurations and for certain tasks it becomes a very high overhead
> >> protocol. In particular, it adds an NIS hit to every file stat, for
> >> example, so that it can check groups and permissions.
> >
> > A good way around this is to run nscd (Name Services Caching Daemon).
>
> I'm really, really suspicious against nscd. I've more than once seen
> it hang on to stale information forever for no good reason at all.
> > -- > Leif Nixon Systems expert > ------------------------------------------------------------ > National Supercomputer Centre Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 2 10:32:01 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 2 Feb 2004 09:32:01 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. To clarify, the problem is when there is some cron job (or reboot) in which a couple of hundred nodes all go after the NIS server at once. It's magnified by the fact that there's an NIS lookup done even when it's a user in the local password file such as root. The problems can be mitigated by having a lot of nodes be slaves. At one point I had all of the nodes of my cluster be slaves. But the problem with that is that the transmission protocol is not perfect and every once in a while you wind up with a slave server that is down a map or two. We've now shifted to pushing out our password files via rsync. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. Our set of kerberos 5 kdc's have thus far been able to handle the load of some 1500 nodes with more still coming. Plus then we have no real passwords in the passwd file and thus the security issues of distributing it are much less critical. Steve Timm > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). 
> > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. > > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:07:30 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:07:30 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: <1075730850.3936.19.camel@protein.scalableinformatics.com> I have tried to avoid NIS on linux, as it appears not to be as stable as needed under heavy load. I have had customers bring it crashing down when it serves login information, just by running simple scripts across the cluster. I prefer pushing name service lookups through DNS, and I tend to use dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). Setting up a full blown named/bind system for a cluster seems like significant overkill in most cases. On the authentication side, I had high hopes for LDAP, but haven't been able to easily/repeatably make a working LDAP server with databases. I am starting to think more along the lines of a simple database with pam modules on the frontend. See http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or http://sourceforge.net/projects/pam-mysql/ for examples. On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > Nscd is a necessary evil sometimes though. > > -B > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > Jag writes: > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > >> configurations and for certain tasks it becomes a very high overhead > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > >> example, so that it can check groups and permissions. > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > I'm really, really suspicious against nscd. I've more than once seen > > it hang on to stale information forever for no good reason at all. 
> > > > -- > > Leif Nixon Systems expert > > ------------------------------------------------------------ > > National Supercomputer Centre Linkoping University > > ------------------------------------------------------------ > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 09:24:25 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 08:24:25 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: We use ldap extensively here on all of our clusters that IT maintains. We like it because it allows great flexibility if we need to write web based account management systems for groups on campus. LDAP is actually very very easy to implement, especially if you use redhat as your distribution. We use redhat mostly exclusive here so our setup and configuration for ldap is pretty cookie-cutter. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. 
> > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:29:49 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:29:49 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: <1075732189.3936.28.camel@protein.scalableinformatics.com> On Mon, 2004-02-02 at 09:24, Brent M. Clements wrote: > We use ldap extensively here on all of our clusters that IT maintains. We > like it because it allows great flexibility if we need to write web > based account management systems for groups on campus. LDAP is actually > very very easy to implement, especially if you use redhat as your > distribution. We use redhat mostly exclusive here so our setup and > configuration for ldap is pretty cookie-cutter. I know the clients are rather easy, it is setting up the server that I found somewhat difficult. I did go through the howto's, used the RH packages. Had some issues I could not find resolution to. This was about a year ago. I have a nice LDAP server set up with a completely read-only database now. I haven't been able to convince it to let clients write (e.g. password and other changes). Not sure what I am doing wrong, relatively sure it is pilot error. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jonbernard at uab.edu Mon Feb 2 11:46:21 2004 From: jonbernard at uab.edu (Jon B Bernard) Date: Mon, 2 Feb 2004 10:46:21 -0600 Subject: [Beowulf] HVAC and room cooling... Message-ID: <92E49C92F9CDBF4EA106E2E7154938830202B1F3@UABEXMB1.ad.uab.edu> The American Society of Heating, Refrigerating and Air-Conditioning Engineers (www.ashrae.org) has just released "Thermal Guidelines for Data Processing Environments". It looks like there's also a summary available in the January issue of their journal, or online for $8. Jon -----Original Message----- From: Brent M. Clements [mailto:bclem at rice.edu] Sent: Friday, January 30, 2004 11:18 PM To: rossini at u.washington.edu Cc: John Bushnell; beowulf at beowulf.org Subject: Re: [Beowulf] HVAC and room cooling... I have found that the best thing to do is outsource the colocation of your equipment. The cost of installing and maintaining the proper type of cooling and ventilation for mid-large size clusters costs more than to colocate. We are currently exploring placing our larger clusters in colocation facilities right now. 
The only downside that we have is that we can't find colocation facilities that will give us 24/7 physical access to our equipment. As you all know...researchers push beowulf hardware to the limits and the meantime to failure is higher. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Fri, 30 Jan 2004, A.J. Rossini wrote: > John Bushnell writes: > > > (So many watts) times 'x' equals how many "tons" of AC. Multiply > > by at least two of course ;-) > > Or 3, sigh... > > >>Also, does anyone have any brilliant thoughts for cooling an internal > >>room that can't affordably get chilled water? (I've been suggesting > >>to people that it isn't possible, but someone brought up "portable > >>liquid nitrogen" -- for the room, NOT for overclocking -- I'm trying > >>to get stable systems, not instability :-). > > > > You can have an external heat exchanger. If you are lucky and are, > > say, on the first floor somewhere close to an external wall, it is > > pretty simple to run a small pipe between the internal AC and the > > heat exchanger outside. Don't know how far it is practical to run > > one though. We have one in our computer room, but it is only six > > feet or so from the exchanger outside. Our newer AC runs on chilled > > water which was quoted for a lot less than another inside/outside > > combo, but we already had a leftover chilled water supply in the > > computer room. > > I've looked at the chilled-water approach. They estimated between > $40k-$80k. oops (this room is REALLY in the middle of the building. > Great for other computing purposes, but not for cooling). > > I'm looking for the proverbial vent-free A/C. Sort of like > frictionless tables and similar devices I recall from undergraduate > physics... > > Thanks for the comments! > > best, > -tony > > -- > rossini at u.washington.edu http://www.analytics.washington.edu/ > Biomedical and Health Informatics University of Washington > Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center > UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable > FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email > > CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be > confidential and privileged. If you received this message in error, > please destroy it and notify the sender. Thank you. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Mon Feb 2 16:27:25 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Mon, 02 Feb 2004 16:27:25 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We have 3 - 16 hard drive file servers, 13 compute nodes and a master unit. We had to spread the load from 3 to 4 - 20 Amp circuits to keep from popping circuit breakers. We have AC coming into an interior room and experienced several problems. Problem 1: There was no adequate exhaust system. 
5 active vents in , 1 passive vent out and in the wrong location. Solution: We substituted in several grates in place of acoustic tiles. The heat is vented up into the plenum above. There are fans atop the rack venting the interior of the rack into one of the grates above. The other heat follows. Problem 2: What do you do when the AC stops? Maintenance and the occasional AC system oops can be devastating to a cluster in a small room. Solution 2a: We are tied directly into a security system. When a sensor in the room reaches a temperature level, "Security" responds dependent upon the level detected. Solution 2b: We installed a backup automated telephone dialer. Not that we don't trust "Security", but we wanted a backup to let us know what was going on. When the temperature reaches a certain level, the phone dials us with an automated message: " This is the Sensaphone 1108. The time is 1:36 AM and ... [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] Solution 2c: Install a thermal sensor into a serial or tcp/ip socket. Some vendors have software that read these sensors and will shut down the machines. We are still working on our system. Others' experiences and solutions are welcomed. We are using dual Tyan motherboards with dual AMD MP processors. Good luck!! Peter ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Feb 2 19:56:33 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 02 Feb 2004 16:56:33 -0800 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". 
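A software watchdog of the sort Peter's Solution 2c describes can sit underneath that hardware cutoff and attempt a clean halt at a lower threshold first. A toy sketch follows; the sensor file, the 35 C limit and the polling interval are made-up placeholders, since real sensors expose their readings in different ways (lm_sensors files, a serial Sensaphone-style unit, SNMP, and so on):

/* tempwatch.c -- toy thermal watchdog: poll a temperature reading and
 * halt the node if it gets too hot.  TEMP_FILE and LIMIT_C are
 * placeholder assumptions; adapt them to whatever your sensor provides. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define TEMP_FILE  "/tmp/node_temp"   /* one number, degrees C */
#define LIMIT_C    35.0
#define POLL_SECS  30

int main(void)
{
    for (;;) {
        FILE *f = fopen(TEMP_FILE, "r");
        double t;
        if (f && fscanf(f, "%lf", &t) == 1 && t > LIMIT_C) {
            fprintf(stderr, "temperature %.1f C over limit, halting\n", t);
            fclose(f);
            return system("/sbin/shutdown -h now");
        }
        if (f) fclose(f);
        sleep(POLL_SECS);
    }
}

If the room keeps heating past the point where a clean shutdown was possible, the hard-wired shunt trip is still what actually saves the hardware.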
Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Tue Feb 3 08:40:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Tue, 03 Feb 2004 07:40:25 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> References: <20040203125618.GA6026@mikee.ath.cx> Message-ID: <6.0.0.22.2.20040203073727.02614538@localhost> At 06:56 AM 2/3/2004, Mike Eggleston wrote: >This book from 2000 discusses building clusters from linux. I >bought it from a discount store not because I'm going to build >another cluster from linux, but rather because of the discussions >on cluster management. Has anyone read/implemented his approach? >What other cluster management techniques/solutions are out there? Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes chapters on cluster setup and cluster management (new in the 2nd edition). Disclaimer: I'm one of the editors of this book. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 09:05:07 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 08:05:07 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <6.0.0.22.2.20040203073727.02614538@localhost> References: <20040203125618.GA6026@mikee.ath.cx> <6.0.0.22.2.20040203073727.02614538@localhost> Message-ID: <20040203140507.GB6026@mikee.ath.cx> On Tue, 03 Feb 2004, William Gropp wrote: > At 06:56 AM 2/3/2004, Mike Eggleston wrote: > >This book from 2000 discusses building clusters from linux. I > >bought it from a discount store not because I'm going to build > >another cluster from linux, but rather because of the discussions > >on cluster management. Has anyone read/implemented his approach? > >What other cluster management techniques/solutions are out there? > > Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes > chapters on cluster setup and cluster management (new in the 2nd > edition). Disclaimer: I'm one of the editors of this book. > > Bill > > I have the 1st edition and it does have a chapter discussing some of the management. How would this method scale to managing a (not really a cluster) group of AIX servers? 
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Tue Feb 3 09:26:37 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Tue, 03 Feb 2004 09:26:37 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: Hello Jim The main goal for us is to stay up and running as long as we can. (Please read the last paragraph before responding to this one:) Most of our temperature problems have been caused by AC maintenance induced temperature spikes. Having "security" open the doors slows the room heating process. The Sensaphone call to us helps us to know that there is a problem and we can phone in to be briefed. "Do we have to come in or has the room already begun to cool?" The last of the Solutions is for just the type of incident that you describe. These are very rare but like you say, they need to be planned for. Our ideal goal would be one that signals a problem to the cluster. The cluster takes the signal and gracefully shuts down the programs and then shuts down the nodes. We did not find such a solution on the commercial market for our "came with the room" UPS. Instead we found a sensor/software combination where the sensor ties into the serial port of one of the nodes. So far we **have** been able to gracefully shut down the programs that are running. We have **not** found a way to automatically turn off the various cluster nodes. That's where we need some help/suggestions. ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 Jim Lux cc: Subject: Re: [Beowulf] Re: HVAC and room cooling... 02/02/04 07:56 PM At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. 
Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt.

The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff.

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From grid at iki.fi Tue Feb 3 09:26:53 2004
From: grid at iki.fi (Michael Kustaa Gindonis)
Date: Tue, 3 Feb 2004 16:26:53 +0200
Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset
In-Reply-To: <200402021546.i12Fk4h24131@NewBlue.scyld.com>
References: <200402021546.i12Fk4h24131@NewBlue.scyld.com>
Message-ID: <200402031626.53453.grid@iki.fi>

Hi,

I noticed in the Linux kernel configuration that there is support for LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. Do any readers of this list have any experiences in this area? Knowledge about LSI's plans to support this chipset in the future?

...
Mike

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From nixon at nsc.liu.se Tue Feb 3 04:21:24 2004
From: nixon at nsc.liu.se (Leif Nixon)
Date: Tue, 03 Feb 2004 10:21:24 +0100
Subject: [Beowulf] Re: HVAC and room cooling...
In-Reply-To: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> (Jim Lux's message of "Mon, 02 Feb 2004 16:56:33 -0800")
References: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov>
Message-ID:

Jim Lux writes:

> YOu need to seriously consider a "failsafe" totally automated shutdown
> (as in chop the power when temperature gets to, say, 40C, in the
> room)... Security might be busy (maybe there was a big problem with
> the chiller plant catching fire or the boiler exploding.. if they're
> directing fire engine traffic, the last thing they're going to be
> thinking about is going over to your machine room and shutting down
> your hardware.

Ah, that reminds me of the bad old days in industry. The A/C went belly up the night between Friday and Saturday. That triggered the alarm down at Security, who promptly called the on-duty ventilation technicians and notified us. Excellent.

Except that the A/C alarm was never reset properly, so when the A/C failed again Saturday afternoon nobody noticed. When the temperature reached 35C, the thermal kill switch triggered automatically. Pity that the electrician had never got around to actually, like, *wire* it to anything.

We arrived Monday morning to the smell of frying electronics.
Expensive weekend, that. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From verycoldpenguin at hotmail.com Tue Feb 3 11:24:18 2004 From: verycoldpenguin at hotmail.com (Gareth Glaccum) Date: Tue, 03 Feb 2004 16:24:18 +0000 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We sell solutions with automated power-off scripts upon node overheat using some of the APC products controlled from a linux master. Not that particular unit though. Gareth >From: Joshua Baker-LePain >To: Eckhoff.Peter at epamail.epa.gov >CC: beowulf at scyld.com >Subject: Re: [Beowulf] Re: HVAC and room cooling... >Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) > >On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > > > Instead we found a sensor/software combination where the sensor ties > > into the > > serial port of one of the nodes. So far we **have** been able to > > gracefully shut down the > > programs that are running. We have **not** found a way to automatically > > turn off the > > various cluster nodes. That's where we need some help/suggestions. > >Well, your high-temperature-triggered scripts should call a 'shutdown -h >now'. *If* your nodes are on motherboards that support it, and *if* the >BIOS is new enough to support it, and *if* the nodes were booted with >'apm=power-off' on the kernel command line, then they should actually >power off. > >Another option would be something like this: > >http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 > >With that (ungodly expensive) power strip, you can remotely cut the power >to selected outlets. It probably can be automated, but you'd have to >check that. > >As Jim said, though, all this is great, but there really does need to be >one final level of hardware level failsafe. It is entirely conceivable >that all your software monitoring could fail, and the temperature will >still be climbing. There needs to be a piece of hardware in the room that >literally cuts power to the whole damn room at a set temperature that is >(obviously) above the one that trips your software shutdown scripts. > >-- >Joshua Baker-LePain >Department of Biomedical Engineering >Duke University >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Tue Feb 3 12:06:50 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Tue, 3 Feb 2004 12:06:50 -0500 Subject: [Beowulf] about cluster's tunning References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> <200402031626.53453.grid@iki.fi> Message-ID: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model for climate forecast. 
My question is: how can I improve the performance of my cluster? Are there
techniques for tuning clusters through the operating system or the network
hardware?

Thanks for your answers and suggestions.

R. Miguel

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at physics.mcmaster.ca  Tue Feb  3 13:12:24 2004
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Tue, 3 Feb 2004 13:12:24 -0500 (EST)
Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset
In-Reply-To: <200402031626.53453.grid@iki.fi>
Message-ID:

> I noticed in the Linux kernel configuration that there is support for LSI's
> Fusion-MPT chipset. Also, it is possible to run MPI over this.

huh?  afaikt, it's just another overly expensive, overly complicated hw
raid controller.  I guess there must be a market for this kind of
wrongheaded crap, but I really don't understand it.  I guess it's just the
impulse to offload whatever possible from the host; that's an understandable
idea, but you really need to look at whether it makes sense, or whether it's
just a holdover from bygone days when your million-dollar mainframe was
actually compute-bound ;)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From csamuel at vpac.org  Tue Feb  3 23:01:17 2004
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 4 Feb 2004 15:01:17 +1100
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID: <200402041501.19592.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 1 Feb 2004 04:39 pm, Andrew Wang wrote:

> 2.6 looks very promising, wondering when distributions
> will include it.

Mandrake 10 will include it (beta 2 just appeared with 2.6.2rc3 - they reckon
the final 2.6.2 will make the release of Mdk10).

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQFAIG6NO2KABBYQAh8RAlCaAJ9Y5LKBLZQjGvCJCzO7ViuwZMGFiQCePiI+
Q2x2XGPUUWKYDT2nRv/5DHI=
=S0ef
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rgb at phy.duke.edu  Wed Feb  4 08:17:30 2004
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 4 Feb 2004 08:17:30 -0500 (EST)
Subject: [Beowulf] Re: HVAC and room cooling...
In-Reply-To:
Message-ID:

On Tue, 3 Feb 2004, Leif Nixon wrote:

> Jim Lux writes:
>
> > You need to seriously consider a "failsafe" totally automated shutdown
> > (as in chop the power when temperature gets to, say, 40C, in the
> > room)... Security might be busy (maybe there was a big problem with
> > the chiller plant catching fire or the boiler exploding.. if they're
> > directing fire engine traffic, the last thing they're going to be
> > thinking about is going over to your machine room and shutting down
> > your hardware.
>
> Ah, that reminds me of the bad old days in industry.
> > The A/C went belly up the night between Friday and Saturday. That > triggered the alarm down at Security, who promptly called the on-duty > ventilation technicians and notified us. Excellent. > > Except that the A/C alarm was never reset properly, so when the A/C > failed again Saturday afternoon nobody noticed. > > When the temperature reached 35C, the thermal kill switch triggered > automatically. Pity that the electrician had never got around to > actually, like, *wire* it to anything. > > We arrived Monday morning to the smell of frying electronics. > Expensive weekend, that. Did you ever manage to track down the electrician and put bamboo slivers underneath his toenails or something? That one seems like it would be worth some sort of retaliation. A small nuclear device planted in his front lawn. An anonymous call to the IRS. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 4 17:34:21 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) Subject: [Beowulf] about cluster's tunning In-Reply-To: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Message-ID: You may want to look at the online course mentioned here: http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Doug On Tue, 3 Feb 2004, Richard Miguel wrote: > Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model > for climate forecast. My question is how i can improvement the performance > of my cluster.. there is techniques for tunning of clusters througth > operative system or network hardware?. > > thanks for yours anwers.. and suggests.. > > R. Miguel > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 4 15:08:04 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 04 Feb 2004 21:08:04 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: (Robert G. Brown's message of "Wed, 4 Feb 2004 08:17:30 -0500 (EST)") References: Message-ID: "Robert G. Brown" writes: > On Tue, 3 Feb 2004, Leif Nixon wrote: > >> When the temperature reached 35C, the thermal kill switch triggered >> automatically. Pity that the electrician had never got around to >> actually, like, *wire* it to anything. >> >> We arrived Monday morning to the smell of frying electronics. >> Expensive weekend, that. > > Did you ever manage to track down the electrician and put bamboo slivers > underneath his toenails or something? Sadly, no. And don't get me started on luser electricians. "Ooops, did that feed go to the computer room?" 
"Hmmm, what's on this circuit? Let's toggle it and see what reboots." (Yes, it really happened. I don't often shout at people, but that time...) Dropping a fine gauge wire across the main power rails was an interesting stunt, too. Too bad he didn't even get flash burns. I think the main point here is: If you get hold of a competent electrician, take *real* good care of him. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Thu Feb 5 07:41:24 2004 From: waitt at saic.com (Tim Wait) Date: Thu, 05 Feb 2004 07:41:24 -0500 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: <402239F4.8030304@saic.com> > Dropping a fine gauge wire across the main power rails was an > interesting stunt, too. Too bad he didn't even get flash burns. How about an electrician, who, while working on your building power conditioning, sends 180V through your 120V building, frying everything not on UPS? We were not amused. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Feb 5 11:23:13 2004 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 5 Feb 2004 11:23:13 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <402239F4.8030304@saic.com> Message-ID: On Thu, 5 Feb 2004, Tim Wait wrote: > > > Dropping a fine gauge wire across the main power rails was an > > interesting stunt, too. Too bad he didn't even get flash burns. > > How about an electrician, who, while working on your building > power conditioning, sends 180V through your 120V building, > frying everything not on UPS? > > We were not amused. Oh, give the guy a break: Red, Black, White...it is all very confusing! My most serious problem has been with the computer room UPS begin shutdown accidentally, dropping a half-dozen raid servers. Many TBs of data were endangered. I might be able to forgive them if it only happened once, but I've needed to force myself to stop counting events because doing so interferes with my ability to properly suppress homocidal urges. Seriously, one would think that a Darwinian effect would kick in at some point and cull the electrical service hurd. My observations (and others here as well) seem to dispute that. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.gindonis at hip.fi Thu Feb 5 12:27:20 2004 From: michael.gindonis at hip.fi (Michael Gindonis) Date: Thu, 5 Feb 2004 19:27:20 +0200 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402041702.i14H2Jh03108@NewBlue.scyld.com> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> Message-ID: <200402051927.21214.michael.gindonis@hip.fi> On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > From: Mark Hahn > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset > > > I noticed in the Linux kernel configuration that there is support for > > LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. > > huh? ?afaikt, it's just another overly expensive, overly complicated hw > raid controller. ?I guess there must be a market for this kind of > wrongheaded crap, but I really don't understand it. Hi Mark, When purchasing a cluster or cluster hardware, one can spend as little as 20 Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for Myrinet or Scali. The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. 100 Euro per node is much eaier to justify than 1000 Euro per node when the Cluster when the cluster will not be primarly running tighly coupled parallel problems. If the performance of MPI of Fusion-MPT is much better than than Ethernet with good latency, it becomes a cheap way to add flexibilty to a cluster. Here is some info about it the Chipset... http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ integrated_circuits/fusion.pdf http://www.lsilogic.com/technologies/lsi_logic_innovations/ fusion___mpt_technology.html There is also information in the in the linux kernel documentation about running MPI over this kind of interconnect. ... Mike -- Michael Kustaa Gindonis Helsinki Institute of Physics, Technology Program michael.gindonis at hip.fi http://wikihip.cern.ch/twiki/bin/view/Main/MichaelGindonis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Feb 5 21:12:58 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 5 Feb 2004 18:12:58 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: hi ya On Thu, 5 Feb 2004, Michael T. Prinkey wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! dont forget blue and green too ... - fun to disconnect the wires at the main and move wires around ... while the bldg is "lit" i think its crazy that the "nuetral" side is tied together at the panel .. but the outlets in the building seems to work .. 
c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Feb 5 22:07:18 2004 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 5 Feb 2004 19:07:18 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: <20040206030718.46964.qmail@web60310.mail.yahoo.com> Trained Electrician Here Worked at a HVAC system fab. plant. I wired large Air Make Up Units. I was trained in by a very old school guy (CS degree from 1962). I watched as turnovers in workers happened and started to notice the lower paid guys that would work on 480V extension cords while they where hot with 60hp motors drawing on them! I strayed from that path for a while until a friend, who was handed the task of managing the renovation of an old building downtown. She had questions I had time. I ended up finding a retired electrician that knew his stuff. I asked him how he kept up to date. His reply was that he is on the writing committee for the National Electric Code. Needless to say I keep in contact with him on various topics. Note: CatV + Lighting = PCs + Fire Note2: Attic Access doors do not belong in the ceiling of a wiring closet. Something about fire wanting to go upwards, maybe some of you physics guys can explain it better. --- "Michael T. Prinkey" wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > > > > Dropping a fine gauge wire across the main power rails was an > > > interesting stunt, too. Too bad he didn't even get flash burns. > > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! > > My most serious problem has been with the computer room UPS begin shutdown > accidentally, dropping a half-dozen raid servers. Many TBs of data were > endangered. I might be able to forgive them if it only happened once, but > I've needed to force myself to stop counting events because doing so > interferes with my ability to properly suppress homocidal urges. > > Seriously, one would think that a Darwinian effect would kick in at some > point and cull the electrical service hurd. My observations (and others > here as well) seem to dispute that. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /------------------------------------------------------------\ Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a superbeing or the future with which religions are mainly concerned with. Or, if not impossible, at least impossible at the present time. lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 23:15:52 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 23:15:52 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... 
wires In-Reply-To: Message-ID: On Thu, 5 Feb 2004, Alvin Oga wrote: > i think its crazy that the "nuetral" side is tied together > at the panel .. but the outlets in the building seems to work .. That's not crazy, that's actually rather sane. What would be crazy would be grounding the neutrals and/or ground wire in different places. Can you say "ground loop"? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 5 23:26:37 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 5 Feb 2004 23:26:37 -0500 (EST) Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> Message-ID: > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects or less, actually. you seem to be thinking of gigabit, which is indeed a very attractive cluster interconnect. otoh, there are lots of even more loosely-coupled, non-IO-intensive apps that run just fine on 100bT. > to more than 1000 Euro per node for > Myrinet or Scali. or IB. > The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. yes, obviously. I'd probably rather have another gigabit port or two; bear in mind that some very elegant things can be done when each node has multiple network connections... really, the chipset isn't the point; it's just a $5 coprocessor. what counts is coming up with a physical layer, including affordable switches, and somehow getting millions of people to make/buy them. > 100 > Euro per node is much eaier to justify than 1000 Euro per node when the > Cluster when the cluster will not be primarly running tighly coupled parallel > problems. hmm, we've already established that gigabit is much cheaper, and for loose-coupled systems, chances are good that even 100bT will suffice. > If the performance of MPI of Fusion-MPT is much better than than > Ethernet with good latency, but does it even exist? so far, all I can find is two lines on a marketing glossy... > it becomes a cheap way to add flexibilty to a > cluster. many things could happen; I'm not optimistic about this Fusion-MPT thing. it seems to fly in the face of "do one thing, well". > Here is some info about it the Chipset... > > http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ > integrated_circuits/fusion.pdf that's the vapid marketing glossy. > http://www.lsilogic.com/technologies/lsi_logic_innovations/ > fusion___mpt_technology.html that is even worse. > There is also information in the in the linux kernel documentation about > running MPI over this kind of interconnect. I'm not sure what "kind" here means, do you mean over scsi? the traditional problem with *-over-scsi (and there have been more than a couple) has been that scsi interfaces aren't optimized for low-latency. the bandwidth isn't that hard, really - 320 MB/s is around Myrinet speed, and significantly slower than IB. OK, how about FC? it's obviously got an advantage over U320 in that FC switches exist (oops, expensive) but it's really just a 1-2 Gb network protocol with 2k packets. 
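(A rough way to sanity-check the "good latency" claim for any of these
fabrics, before buying anything exotic, is a simple ping-pong test; the
sketch below uses plain TCP sockets rather than MPI and is not from the
original post -- the port, message size, and repetition count are arbitrary.)

# Rough ping-pong latency sketch over plain TCP (not MPI, not from the
# original post). Run "python pingpong.py server" on one node and
# "python pingpong.py <serverhost>" on another.
import socket
import sys
import time

PORT = 5678
MSG = b"x" * 8          # tiny message: measures latency, not bandwidth
REPS = 10000

def recv_exact(sock, n):
    # collect exactly n bytes (TCP is allowed to deliver short reads)
    buf = b""
    while len(buf) < n:
        buf += sock.recv(n - len(buf))
    return buf

def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", PORT))
    s.listen(1)
    conn, _ = s.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for _ in range(REPS):
        conn.sendall(recv_exact(conn, len(MSG)))   # echo straight back
    conn.close()

def client(host):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, PORT))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    t0 = time.time()
    for _ in range(REPS):
        s.sendall(MSG)
        recv_exact(s, len(MSG))
    t1 = time.time()
    print("avg round trip: %.1f microseconds" % ((t1 - t0) / REPS * 1e6))

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[1])

Divide the round trip by two for the usual one-way figure; the same loop
around an MPI send/receive pair gives the number vendors actually quote.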
as for the "high performance ARM-based architecture" part, well, I must admit that I don't associate ARM with high performance of the gigabyte-per-second sort. personally, I'd love to see sort of the network equivalent of the old smart-frame-buffer idea. practically, though, it really boils down to the gritty details like availability of switches, choosing a physical-layer standard, etc. gigabit is the obvious winner there, but IB is trying hard to get over that bump... (Myri seems not to be very ambitious, and 10G eth seems to be straying into a morass of tcp-offload and the like...) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 11:36:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 11:36:39 -0500 (EST) Subject: [Beowulf] wulflogger, wulfstat's dumber cousin... Message-ID: On request I've got a second xmlsysd client going called "wulflogger". Wulflogger is just wulfstat with the ncurses stuff stripped off so that it manages connections to the xmlsysd's on a cluster, reads them at some input frequency, and writes selected status data to stdout in a simple table. The advantage of this tool is that it makes it really easy to write web or script or report applications, and it also makes it very easy to maintain a dynamic logfile of selected statistics for the entire cluster. This is and will likely remain a very simple tool. The only fanciness I envision for the future is an output descriptor format of some sort that could be input at run time, so that a user could select output fields and formats instead of getting the collections I've prebuilt. That's pretty complex (especially since wulflogger/wulfstat throttle xmlsysd to return only the collective stats it needs) so it won't be anytime soon. Only -t 1 is probably "finished" as output format goes, although -t 0 will probably get mostly cosmetic changes at this point as well. Anyway, any wulfstat/xmlsysd users might want to grab it and give it a try. It makes it pretty simple to write a perl script to generate e.g. rrd images or other graphical representations of the cluster -- in a future release I'll provide sample perl scripts for parsing out fields and doing stuff with it. It is for the moment only available from my personal website: http://www.phy.duke.edu/~rgb/Beowulf/wulflogger.php rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 3 10:38:56 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 3 Feb 2004 16:38:56 +0100 (CET) Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> Message-ID: On Tue, 3 Feb 2004, Mike Eggleston wrote: > This book from 2000 discusses building clusters from linux. I > bought it from a discount store not because I'm going to build > another cluster from linux, but rather because of the discussions Mike, I bought this book almost when it came out. 
Its easy to do injustice to someone with a quick email, especially as David Spector put a lot of effort into the book, and I haven't. However, this OReilly is reckoned not to be one of the best. I always recommend 'Linux Clustering' by Charles Bookman, and 'Beowulf Cluster Computing with Linux' edited by Thomas Sterling. Online, there is the book by Bob Brown http://www.phy.duke.edu/brahma/Resources/beowulf_book.php For cluster management specifically, google for Rocks and Oscar, and there are lots of other pages. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 07:56:18 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 06:56:18 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ Message-ID: <20040203125618.GA6026@mikee.ath.cx> This book from 2000 discusses building clusters from linux. I bought it from a discount store not because I'm going to build another cluster from linux, but rather because of the discussions on cluster management. Has anyone read/implemented his approach? What other cluster management techniques/solutions are out there? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Feb 3 10:11:32 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > Instead we found a sensor/software combination where the sensor ties > into the > serial port of one of the nodes. So far we **have** been able to > gracefully shut down the > programs that are running. We have **not** found a way to automatically > turn off the > various cluster nodes. That's where we need some help/suggestions. Well, your high-temperature-triggered scripts should call a 'shutdown -h now'. *If* your nodes are on motherboards that support it, and *if* the BIOS is new enough to support it, and *if* the nodes were booted with 'apm=power-off' on the kernel command line, then they should actually power off. Another option would be something like this: http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 With that (ungodly expensive) power strip, you can remotely cut the power to selected outlets. It probably can be automated, but you'd have to check that. As Jim said, though, all this is great, but there really does need to be one final level of hardware level failsafe. It is entirely conceivable that all your software monitoring could fail, and the temperature will still be climbing. There needs to be a piece of hardware in the room that literally cuts power to the whole damn room at a set temperature that is (obviously) above the one that trips your software shutdown scripts. 
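(A minimal sketch of the kind of high-temperature shutdown monitor described
above, not from the original message: the sensor file, node names, and
threshold are placeholders for whatever your serial-port sensor and cluster
actually look like.)

# Minimal temperature-triggered shutdown sketch. Assumes some sensor daemon
# writes the room temperature (deg C) to a file; node names and threshold
# are made-up examples.
import os
import time

NODES = ["node%02d" % i for i in range(1, 28)]   # hypothetical node names
SOFT_LIMIT_C = 35.0     # gracefully halt the nodes above this
CHECK_INTERVAL = 60     # seconds between readings

def read_room_temp():
    # Placeholder: replace with parsing of your serial sensor or lm_sensors
    # output; here we assume the reading is dropped into this file.
    with open("/var/run/roomtemp") as f:
        return float(f.read().strip())

def halt_node(node):
    # shutdown -h only powers the box off if the BIOS/APM supports it and
    # the kernel was booted with apm=power-off, as noted above.
    os.system("ssh %s 'shutdown -h now'" % node)

if __name__ == "__main__":
    while True:
        if read_room_temp() >= SOFT_LIMIT_C:
            for node in NODES:
                halt_node(node)
            break
        time.sleep(CHECK_INTERVAL)

This is only the soft layer, of course; the independent hardware cutoff for
the whole room still has to be there as the last resort.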
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 4 08:26:55 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 4 Feb 2004 14:26:55 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Ah, that reminds me of the bad old days in industry. That in turn reminds me of recent construction work around here... Since the building with our offices and our small server room had to be renovated, the water-based cooling system for the server room had to be temporarily replaced with a mobile unit that pumps the heat into the hallway. The company responsible had no better idea than to replace the cooling system on friday afternoon -- of course without telling anybody. As the mobile unit was much too small, the server room had turned into sauna until monday when we discovered the problem. Ups. Luckily no hardware was damaged, even though the sensors in the hard-disk drives of our server measured a maximum of 47C. Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Fri Feb 6 04:17:43 2004 From: patrick at myri.com (Patrick Geoffray) Date: Fri, 06 Feb 2004 04:17:43 -0500 Subject: [Beowulf] Ambition In-Reply-To: References: Message-ID: <40235BB7.6010802@myri.com> Ah Mark, I could not resist. Actually I could, but the list has been a little boring lately, so... :-) Mark Hahn wrote: > (Myri seems not to be very ambitious, and 10G eth seems to be straying into > a morass of tcp-offload and the like...) Myri is very ambitious, but you can be carefully ambitious or marketingly ambitious. Nobody buys an interconnect looking only at the specs. People try, benchmark, run their code, rationalize and buy what they need at the right price. If you look at what people are doing, there is a lot of Ethernet (Fast and GigE) because thats good enough for many, many codes. Then there is a smaller market for more demanding needs, either in term of performance or scalability, where you want to find the sweet spot in the performance/price curve. Does it make sense to have 10Gb now ? I don't think so, and for several reasons: * PCI-Express is not here yet: It's coming, yes, but it's not available in volume. Today, PCI-X supports 1 GB bidirectional, which is 4 Gb link speed. It's clearly the bottleneck right now. HyperTransport looks attractive, but there is no connector defined yet and vendors should be able to see a potential for volume before to commit resources for a native HT interface. * 10 Gb optics are still expensive: price is going down, but there is not enough volume yet to drive the price down faster. Copper ? I still have nightmares about copper. 10 GigE will drive the technology price down as the 10 GigE market blossoms. * 10 GigE is not attractive enough yet because there is no clear improvement at the application level. 
Running a naive IP stack at 10 Gb requires a lot of resources on the host. RDMA is just a buzword, it's not The Solution. Storage may leverage RDMA, but not IP and certainly not MPI. That's why people are working to put processing on the data path, but it is far from obvious so it takes some time. Gigabit is the clear winner today and 10 GigE will be the clear winner tomorrow, because Ethernet is the de facto Standard. Everybody else are parasites, either breading on niches or marketing poop... Patrick _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sigut at id.ethz.ch Fri Feb 6 06:31:51 2004 From: sigut at id.ethz.ch (G.M.Sigut) Date: Fri, 6 Feb 2004 12:31:51 +0100 (MET) Subject: [Beowulf] about cluster's tunning Message-ID: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> > Date: Thu, 5 Feb 2004 12:04:04 -0500 > Subject: Beowulf digest, Vol 1 #1657 - 4 msgs ... > --__--__-- > > Message: 1 > Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) > From: "Douglas Eadline, Cluster World Magazine" > Subject: Re: [Beowulf] about cluster's tunning > > You may want to look at the online course mentioned here: > > http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Oh yeah. Very nice. Especially after you register (for the course) and are told your browser is no good. There is a page which helps you to select an approved browser - and that says: Unable to detect your operating system. Please select your operating system: -> Windows operating system -> Mac operating system. What a pity that I am working on a Sun. (and Linux) ... George :-( (is there a smiley for "I'm going to puke"?) >>>>>>>>>>>>>>>>>>>>>>>>> George M. Sigut <<<<<<<<<<<<<<<<<<<<<<<<<<< ETH Zurich, Informatikdienste, Sektion Systemdienste, CH-8092 Zurich Swiss Federal Inst. of Technology, Computing Services, System Services e-mail: sigut at id.ethz.ch, Phone: +41 1 632 5763, Fax: +41 1 632 1022 >>>> if my regular address does not work, try "sigut at pop.agri.ch" <<<< _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Fri Feb 6 08:52:20 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Fri, 6 Feb 2004 07:52:20 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> > That's not crazy, that's actually rather sane. What would be crazy > would be grounding the neutrals and/or ground wire in > different places. > Can you say "ground loop"? > Grounding loops.. truly a bane. I remember one instance where someone wired a telecommunications switch to two different grounds. The -48v DC power had it's own ground, and someone had grounded the chassis to a different feed. I little lesser know fact was the lightning rod on the tower next to the building was linked to the same ground as the power. When lightning did strike, nothing but smoke as the charge rolled from one ground to the other on each bay. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 6 09:30:10 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Fri, 6 Feb 2004 09:30:10 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > That's not crazy, that's actually rather sane. What would be crazy > > would be grounding the neutrals and/or ground wire in > > different places. > > Can you say "ground loop"? > > > > > Grounding loops.. truly a bane. I remember one instance where someone > wired a telecommunications switch to two different grounds. The -48v DC > power had it's own ground, and someone had grounded the chassis to a > different feed. I little lesser know fact was the lightning rod on the > tower next to the building was linked to the same ground as the power. > When lightning did strike, nothing but smoke as the charge rolled from > one ground to the other on each bay. There is also a memorable instance of powered racks with incoming two phase power split into two circuits having a polarity reversal so its neutral wire on one circuit was 120V above chassic ground and the neutral on the other circuit. When somebody plugged a single unit with components on both lines -- I think it was more like "meltdown and fire". Not really a ground loop, of course... ...but plenty of people have been electrocuted or fires started because there was a lot more resistance on the neutral line to a remote "ground" than there was to a nice, local, piece of metal. Basically, AFAICT there is really nothing in the NEC or CEC that is "stupid". In fact, I think that most of the code has undergone a near-Darwinian selection process, as in electricians who fail to wire to code (and often their clients) not infrequently fail to reproduce. I don't think code is conservative ENOUGH, if anything, and like to overwire for any given situation. 12-2 is just as easy and cheap to work with as 14-2, for example. 10-2 unfortunately is not, but it gives me comfort to use it whereever I can. And I kinda wish that all circuit breakers were GFCI by code as well, not just ones servicing lines near water and pipes. However, these are still available as user choices -- code permits you to go over, just not under. Anybody curious about wiring should definitely google for the electrical wiring FAQ site. It explains wiring in relatively simple terms. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 6 09:23:48 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 6 Feb 2004 15:23:48 +0100 (CET) Subject: [Beowulf] about cluster's tunning In-Reply-To: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> Message-ID: It just worked fine for me. 
Mozilla 1.4.1 running on Fedora _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Feb 6 03:52:50 2004 From: sp at scali.com (Steffen Persvold) Date: Fri, 06 Feb 2004 09:52:50 +0100 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> <200402051927.21214.michael.gindonis@hip.fi> Message-ID: <402355E2.2040909@scali.com> Michael Gindonis wrote: > On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > >>From: Mark Hahn >>To: beowulf at beowulf.org >>Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset >> >> >>>I noticed in the Linux kernel configuration that there is support for >>>LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. >> >>huh? afaikt, it's just another overly expensive, overly complicated hw >>raid controller. I guess there must be a market for this kind of >>wrongheaded crap, but I really don't understand it. > > > Hi Mark, > > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for > Myrinet or Scali. > Michael, I'm not entirely sure what you mean by "Scali" here. Scali is a _software_ vendor and our MPI can use all of the interconnects that are popular within HPC today (GbE, Myrinet, InfiniBand and SCI). Best regards, -- Steffen Persvold Senior Software Engineer mob. +47 92 48 45 11 tel. +47 22 62 89 50 fax. +47 22 62 89 51 Scali - http://www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Fri Feb 6 11:11:43 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Fri, 6 Feb 2004 16:11:43 +0000 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: References: Message-ID: <200402061611.43045.daniel.kidger@quadrics.com> On Friday 06 February 2004 4:26 am, Mark Hahn added: >> When purchasing a cluster or cluster hardware, one can spend as little as 20 >> Euro ( ~30 CAD) per node on interconnects >> to more than 1000 Euro per node for >> Myrinet or Scali. > > or IB. I guess you should add QsNet II to that list too (except that our cards are under e1000 - not counting switches) Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Feb 6 13:12:49 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 06 Feb 2004 10:12:49 -0800 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> At 09:30 AM 2/6/2004 -0500, Robert G. 
Brown wrote: >On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > > > That's not crazy, that's actually rather sane. What would be crazy > > > would be grounding the neutrals and/or ground wire in > > > different places. > > > Can you say "ground loop"? > > > > > > > > > Grounding loops.. truly a bane. I remember one instance where someone > > wired a telecommunications switch to two different grounds. The -48v DC > > power had it's own ground, and someone had grounded the chassis to a > > different feed. I little lesser know fact was the lightning rod on the > > tower next to the building was linked to the same ground as the power. > > When lightning did strike, nothing but smoke as the charge rolled from > > one ground to the other on each bay. > >There is also a memorable instance of powered racks with incoming two >phase power split into two circuits having a polarity reversal so its >neutral wire on one circuit was 120V above chassic ground and the >neutral on the other circuit. When somebody plugged a single unit with >components on both lines -- I think it was more like "meltdown and >fire". Not really a ground loop, of course... The classic error is wiring two sets of receptacles (e.g two racks full of gear) on the two sides of the 220, with neutrals properly connected, then having the neutral conductor fail, so the two 110V loads are in series across 110V. Works fine as long as the loads are balanced, but when you start to turn off the loads on one side, the voltages don't balance any more. >...but plenty of people have been electrocuted or fires started >because there was a lot more resistance on the neutral line to a remote >"ground" than there was to a nice, local, piece of metal. The notorious MGM Grand fire in Las Vegas, for instance, was caused by a ground/neutral/resistance thing. > Basically, >AFAICT there is really nothing in the NEC or CEC that is "stupid". In >fact, I think that most of the code has undergone a near-Darwinian >selection process, as in electricians who fail to wire to code (and >often their clients) not infrequently fail to reproduce. > >I don't think code is conservative ENOUGH, if anything, and like to >overwire for any given situation. 12-2 is just as easy and cheap to >work with as 14-2, for example. Not if you buy your wire in traincarload lots when wiring a subdivision. That extra copper adds up, not only in copper cost, but shipping, etc. Consider that the wiring harness in an automobile weighs on the order of 50-100kg, and you see why they're interested in going to multiplex buses and 42V systems. Ballparking for my house, which is, give or take 50 feet long, 20 feet wide, and 20 feet high, I'd say there are wiring runs comparable to, say, 3000 feet. That's 9000 total feet of conductors (Black,White, Ground). 12AWG is 19.8 lb/1000 ft, 14 is 12.4 lb/1000ft. Using AWG14 instead of AWG12 saves the contractor 70 pounds of copper. Copper, in huge quantities, is about $0.70/lb, so by the time it gets to the wire maker, it's probably a dollar a pound, so it saves the contractor $70 (not counting any shipping costs, etc. which could be another $0.10/lb or so) $70/house is a bunch o' bux to a builder putting up 500 homes in a tract. They make a profit by watching a thousand little details, each of which is some tiny fraction of the overall price ($70 on a 2000 ft house is 0.035/square foot, compared to $70-100/ft construction cost). 
It's much like automotive applications, or mass market consumer electronics, where they obssess about BOM (bill of materials) cost changes of pennies. (Do you really, really need that bypass capacitor? Does it have to be that big? How many product returns will we get if we leave it out?) This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at 8.96, even after you factor in the fact that you might need more aluminum (because it's lower conductivity), it's still better than 2:1 weight difference. (Aluminum and Copper are about the same price these days, but copper has bigger fluctuations... back in the 70's copper was expensive and aluminum cheap (about 2:1)) So, 2:1 mass, 2:1 price.. changes the cost of the wire alone from $200/house down to $50/house... Consider an office building with 20-30 floors, of 10,000 square feet each. AWG12 vs AWG14 can be a BIG deal. There was a lot of arguing about the heavier neutral wire needed in light industrial office 208Y/120 wiring with all the poor power factor loads (i.e. computers with lightly loaded switching power supplies). > 10-2 unfortunately is not, but it >gives me comfort to use it whereever I can. And I kinda wish that all >circuit breakers were GFCI by code as well, not just ones servicing >lines near water and pipes. However, these are still available as user >choices -- code permits you to go over, just not under. > >Anybody curious about wiring should definitely google for the electrical >wiring FAQ site. It explains wiring in relatively simple terms. > > rgb > >-- >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fant at pobox.com Fri Feb 6 13:59:24 2004 From: fant at pobox.com (Andrew Fant) Date: Fri, 6 Feb 2004 13:59:24 -0500 (EST) Subject: [Beowulf] Gentoo for Science and Engineering Message-ID: Hello, I am sending this out to let people know about a new mailing-list/IRC channel which is being organized for people interested in the use of Gentoo Linux in Computational Science and Engineering applications. At this point we are just getting started, but hopefully we will grow into an organization which presents a one-stop resource about applying Gentoo to CS&E applications from the desktop to HPC clusters and grids. In addition, we will be working closely with Gentoo developers and the Core Gentoo management to provide feedback and guidance in how it can most closely meet the needs of technical end-users. Anyone who has an interest in computational science and engineering and who is interested in learning more about Gentoo or making it a better CS&E platform is most cordially invited to join About Gentoo Linux: Gentoo Linux is a source-based distribution that makes the assumption that the end-user or administrator knows more about what the system is supposed to do than the distribution developers. 
At the core of this is a package system known as Portage, which is similar in form to the BSD ports system. It uses the rough equivalent of an RPM spec file (called an ebuild within Gentoo) to automatically download source, compile the package (and any prerequisites) with appropriate optimizations and options as defined by the user, and install it in such a way that it can be removed or upgraded at a later time. Sometimes referred to as a meta-distribution by the developers, Gentoo initially installs a minimal environment and doesn't force the end-user to install packages and services that are unwanted or unnecessary. Also, no network daemons are started on a system unless an administrator expressly starts them. Gentoo Linux is developed by a community of developers, much as Fedora and Debian are. At present, there are over 6000 different ebuilds for different system utilities and applications in Portage. Of these, more than 100 are classified as scientific applications, including bioperl, octave, spice, and gromacs. In addition, many common scientific libraries and HPC tools are present, including Atlas, FFTW, gmp, LAM/MPI and openpbs. The main website can be found at http://www.gentoo.org. Contact information: The mailing-list is only starting now, and is rather quiet, though I hope to change that over the next couple of weeks. To subscribe, send a blank email to gentoo-science-subscribe at gentoo.org. You will get a confirmation message back. For those who want to just ask questions or find out more in a real-time setting, we are on IRC at irc.freenode.org in #gentoo-science. Of course, questions may also be directed to me at afant at geekmail.cc. Thank you for your time. Please feel free to forward this information to other groups that you feel would be interested. I apologize to anyone who considered this an off-topic post. Andy Fant Andrew Fant | This | "If I could walk THAT way... Molecular Geek | Space | I wouldn't need the talcum powder!" fant at pobox.com | For | G. Marx (apropos of Aerosmith) Boston, MA USA | Hire | http://www.pharmawulf.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Feb 6 22:19:58 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 11:19:58 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: Message-ID: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Can you add GridEngine (SGE) and Torque (SPBS)? The problem with OpenPBS is not only it is broken, it is not under development these days, but also I found that Altair is not allowing new users to download OpenPBS. I went to its homepage today but it only leads me to the PBSPro page. SGE already has a FreeBSD-style "port", so adding a port for Gentoo Linux should also be easy. And I think SGE is more popluar these days too. SPBS is basically PBS, but with lots of problems fixed, and better Maui scheduler support. Also, please support the mpiexec parallel job starter, as it allows OpenPBS and SPBS to control slave MPI tasks. SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/torque/ mpiexec: http://www.osc.edu/~pw/mpiexec/ Thx :-> Andrew. --- Andrew Fant ???? > In addition, many > common scientific libraries > and HPC tools are present, including Atlas, FFTW, > gmp, LAM/MPI and > openpbs. 
The main website can be found at > http://www.gentoo.org. > > Contact information: > > The mailing-list is only starting now, and is rather > quiet, though I hope > to change that over the next couple of weeks. To > subscribe, send a blank > email to gentoo-science-subscribe at gentoo.org. You > will get a confirmation > message back. For those who want to just ask > questions or find out more > in a real-time setting, we are on IRC at > irc.freenode.org in > #gentoo-science. Of course, questions may also be > directed to me at > afant at geekmail.cc. > > Thank you for your time. Please feel free to > forward this information to > other groups that you feel would be interested. I > apologize to anyone who > considered this an off-topic post. > > Andy Fant > > Andrew Fant | This | "If I could walk > THAT way... > Molecular Geek | Space | I wouldn't need > the talcum powder!" > fant at pobox.com | For | G. Marx > (apropos of Aerosmith) > Boston, MA USA | Hire | > http://www.pharmawulf.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 7 03:18:47 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 7 Feb 2004 09:18:47 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> Message-ID: On Fri, 6 Feb 2004, Jim Lux wrote: > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > 8.96, even after you factor in the fact that you might need more aluminum > (because it's lower conductivity), it's still better than 2:1 weight Oh yes. Lots of telephone circuits were wired in aluminium in the 1960's in the UK. Corrosion now means these customers have difficulty getting ADSL. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sat Feb 7 06:21:19 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sat, 7 Feb 2004 11:21:19 +0000 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <20040207112119.GA5120@galactic.demon.co.uk> On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > Can you add GridEngine (SGE) and Torque (SPBS)? > > The problem with OpenPBS is not only it is broken, it > is not under development these days, but also I found > that Altair is not allowing new users to download > OpenPBS. I went to its homepage today but it only > leads me to the PBSPro page. > To clarify things a bit, I hope. In the beginning was PBS - developed in house at NASA by engineers who needed a Portable Batch System. If you understand Cray NQS syntax and concepts it's familiar :) They left / sold to Veridian who in turn sold to Altair. 
The original PBS was GPL or a close equivalent, if I understand correctly. Altair are marketing a proprietary development of PBS as PBSPro. OpenPBS remains available, though you have to register with Altair for download. What they have done very recently, which is rather sneaky, is for the site to oblige you to register for an evaluation copy of PBSPro and potentially answer a questionnaire prior to providing the link to allow you to download OpenPBS. OpenPBS is not under active development and PBSPro may have stalled. Certainly the price per node that Altair are quoting has apparently dropped significantly - though their salesmen are still persistent :) The academic community and the active users forked OpenPBS to create Scalable PBS [SPBS], which is the name most widely known. They've added patches, fixes and features, though there is still an Altair licence for OpenPBS in there. In the last couple of months, SPBS changed its name initially to StORM and then to Torque. HTH other relative newbies who may be confused when trying to find the product :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Feb 7 09:19:21 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 22:19:21 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040207141921.68156.qmail@web16810.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" wrote: > Certainly the price per node that Altair are quoting > has apparently > dropped significantly - though their salesmen are > still persistent :) Both LSF and PBSPro have dropped their prices significantly. LSF used to be US$1000 per CPU and is now $50, and PBSPro used to be a few hundred dollars and is now under $30. SGE (GridEngine) 6.0 has a lot of new enhancements and the SGE mailing lists are very popular; SPBS is gaining a lot of acceptance among OpenPBS users; and Condor is adding another set of new features and will then go open source in the next few months. We'll see whether LSF and PBSPro drop their prices again in the very near future. BTW, it is just like Linux vs M$: at the beginning Linux wasn't there, so M$ could charge as much as it wanted; then Linux slowly arrived, and M$ found it harder and harder to compete. Linux won't kill M$, and SGE/SPBS/Condor won't kill LSF or PBSPro, not in the next few years. What we will see, however, is lower cost, more features, and better support from Platform Computing (LSF) and Altair (PBSPro) as they fight back, so users win. Andrew. > The academic community and the active users forked > OpenPBS to create > Scalable PBS [SPBS] which is the name most widely > known. They've added > patches, fixes and features, though there is still > an Altair licence for > OpenPBS in there. In the last couple of months, > SPBS changed its name > initially to StORM and then to Torque. > > HTH other relative newbies who may be confused by > trying to find the > product :) > > Andy > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sat Feb 7 13:48:06 2004 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 7 Feb 2004 12:48:06 -0600 (CST) Subject: [Beowulf] Cluster applications. In-Reply-To: <40235BB7.6010802@myri.com> Message-ID: Hi, I am looking for a real high performance computing application to evaluate the performance of a 2-node cluster running RH9.0, connected back to back by 1GbE. Here are some characteristics of the application I am looking for: 1 Communication intensive, should not be embarassingly parallel. 2 Should be able to stress the network to the maximum. 3 Should not be a benchmark, a real application. 4 Tunable message sizes. 5 Preferably MPI 6 Free (am I greedy?). Can someone point out one/some application(s) with at least first 3 features in the above list? Thank you very much. Regards, Balaji. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Sat Feb 7 10:11:55 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Sat, 07 Feb 2004 09:11:55 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <4025003B.3020105@tamu.edu> Should we mention the problems in household wiring caused by use of aluminum wiring, then using breakers, outlets and fixtures designed for copper? I almost lost a house in Houston to that once. I spent the 8 hours after the fire department left retightening all the connections throughout. John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at >>8.96, even after you factor in the fact that you might need more aluminum >>(because it's lower conductivity), it's still better than 2:1 weight > > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Fri Feb 6 20:51:51 2004 From: clwang at csis.hku.hk (Cho Li Wang) Date: Sat, 07 Feb 2004 09:51:51 +0800 Subject: [Beowulf] CFP: 2004 IFIP International Conference on Network and Parallel Computing (NPC2004) Message-ID: <402444B7.5E50DC8B@csis.hku.hk> NPC2004 IFIP International Conference on Network and Parallel Computing October 18-20, 2004 Wuhan, China http://grid.hust.edu.cn/npc04 **************************************************************** Call For Papers The goal of IFIP International Conference on Network and Parallel Computing (NPC 2004) is to establish an international forum for engineers and scientists to present their excellent ideas and experiences in all system fields of network and parallel computing. NPC 2004, hosted by the Huazhong University of Science and Technology, will be held in the city of Wuhan, China - the "Homeland of White Clouds and the Yellow Crane." Topics of interest include, but are not limited to: -Grid-based Computing -Cluster-based Computing -Peer-to-peer Computing -Network Security -Ubiquitous Computing -Network Architectures -Advanced Web and Proxy Services -Mobile Agents -Network Storage -Multimedia Streaming Services -Middleware Frameworks and Toolkits -Parallel & Distributed Architectures and Algorithms -Performance Modeling/ Evaluation -Programming Environments and Tools for Parallel and Distributed Platforms Submitted papers may not have appeared in or be considered for another conference. Papers must be written in English and must be in PDF format. Detailed electronic submission instructions will be posted on the conference web site. The conference proceedings will be published by Springer Verlag in the Lecture Notes in Computer Science (LNCS) Series (pending). ************************************************************************** Committee General Co-Chairs: H. J. Siegel Colorado State University, USA Guo-jie Li Chinese Academy of Sciences, China Steering Committee Chair: Kemal Ebcioglu IBM T.J. Watson Research Center, USA Program Co-Chairs: Guang-rong Gao University of Delaware, USA Zhi-wei Xu Chinese Academy of Sciences, China Program Vice-Chairs: Victor K. Prasanna University of Southern California, USA Albert Y. Zomaya University of Sydney, Australia Hai Jin Huazhong University of Science and Technology, China Local Arrangement Chair: Song Wu Huazhong University of Science and Technology, China *************************************************************************** Important Dates Paper Submission March 15, 2004 Author Notification May 1, 2004 Final Camera Ready Manuscript June 1, 2004 *************************************************************************** For more information, please contact the program vice-chair at the address below: Dr. 
Hai Jin, Professor Director, Cluster and Grid Computing Lab Vice-Dean, School of Computer Huazhong University of Science and Technology Wuhan, 430074, China Tel: +86-27-87543529 Fax: +86-27-87557354 e-fax: +1-425-920-8937 e-mail: hjin at hust.edu.cn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 7 14:40:29 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 7 Feb 2004 11:40:29 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sat, 7 Feb 2004, John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > 8.96, even after you factor in the fact that you might need more aluminum > > (because it's lower conductivity), it's still better than 2:1 weight > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. yeah but that's 24-26awg twisted pair for phone a 14 12 10 or 8 awg cable for power have substantialy less surface area relative to it's volume. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Feb 7 17:21:38 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 7 Feb 2004 14:21:38 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires - al In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: hi ya On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. people install wires with al or steel cores in the wire cause its way way cheaper than copper ... copper is only needed for good conduction on the outside of the wire al corrosion ... coat it with stuff :-) or wrap it w/ copper but now you have to worry about copper corrosion - house or building wiring is different animals than high voltage transmission lines too aluminum "pixie" dust does whacky things .. c ya alvin - i've always wondered why people put massive heatsinks on top of the cpu ... air will have a harder time to cool a big mass of metal as opposed to cooling a smaller piece of metal or cooling it some other way .. - problems of getting the heat out of the cpu ( 0.25"sq metal lid) - problems of getting the heat out of the cpu heatsink - blowing air down onto the heatsink is silly too .. 
left over from the 20-30 yr old ideas i guess > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Sat Feb 7 21:36:50 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Sat, 7 Feb 2004 21:36:50 -0500 (EST) Subject: [Beowulf] Cluster applications. In-Reply-To: Message-ID: Check out: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236&mode=thread&tid=8 Also, the "Right Stuff" Column in ClusterWorld addresses some of these issues. To see the a small summary of the columns look at: http://www.clusterworld.com/issues.shtml Doug On Sat, 7 Feb 2004, Balaji Rangasamy wrote: > Hi, > I am looking for a real high performance computing application to evaluate > the performance of a 2-node cluster running RH9.0, connected back to back > by 1GbE. Here are some characteristics of the application I am looking > for: > 1 Communication intensive, should not be embarassingly parallel. > 2 Should be able to stress the network to the maximum. > 3 Should not be a benchmark, a real application. > 4 Tunable message sizes. > 5 Preferably MPI > 6 Free (am I greedy?). > Can someone point out one/some application(s) with at least first 3 > features in the above list? Thank you very much. > Regards, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From klamman.gard at telia.com Sun Feb 8 03:50:22 2004 From: klamman.gard at telia.com (Per Lindstrom) Date: Sun, 08 Feb 2004 09:50:22 +0100 Subject: [Beowulf] Cluster applications. In-Reply-To: References: Message-ID: <4025F84E.4040706@telia.com> Hi Balaji, May I suggest the use of the GNU FEA-software CALCULIX, http://calculix.de/ When will it be up to you to decide how demanding problem your cluster have to solve. Best regards Per Lindstrom Balaji Rangasamy wrote: >Hi, >I am looking for a real high performance computing application to evaluate >the performance of a 2-node cluster running RH9.0, connected back to back >by 1GbE. Here are some characteristics of the application I am looking >for: >1 Communication intensive, should not be embarassingly parallel. >2 Should be able to stress the network to the maximum. >3 Should not be a benchmark, a real application. >4 Tunable message sizes. >5 Preferably MPI >6 Free (am I greedy?). >Can someone point out one/some application(s) with at least first 3 >features in the above list? Thank you very much. >Regards, >Balaji. 
> > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 8 10:52:44 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 8 Feb 2004 10:52:44 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. You mean the part where aluminum turns out to burn like magnesium, incredibly hot and impossible to quench? I would under no circumstances put aluminum wiring in, well, anything. Certainly not anything where a serious overload or arcing situation could occur, which is nearly anything. I seem to remember the government finding out about aluminum the hard way with some of their armored fighting vehicles a decade or two ago. When struck with a hot enough round, the armor itself just burned right up. rgb > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > Oh yes. > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 14:10:43 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 13:10:43 -0600 Subject: [Beowulf] DC Powered Chassis Message-ID: <402689B3.9070104@iwantka.com> With all the power talk on the 'HVAC and Room Cooling' subject. I've been looking for 1 or 2u chassis that support -48v DC as the main power source. Does anyone know of someone that manufactures these? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Feb 9 00:24:28 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 21:24:28 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Robert G. 
Brown wrote: > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > Should we mention the problems in household wiring caused by use of > > aluminum wiring, then using breakers, outlets and fixtures designed for > > copper? I almost lost a house in Houston to that once. I spent the 8 > > hours after the fire department left retightening all the connections > > throughout. > > I seem to remember the government finding out about aluminum the hard > way with some of their armored fighting vehicles a decade or two ago. > When struck with a hot enough round, the armor itself just burned right > up. Armor is supposed to burn. Several armor designs, including that of the American Abrams battle tank, are designed to ablate under pressure from kinetic energy weapons. British Chobham-type composite armor, boron carbide, or aluminum, or some combination of those and others, protects larger armored vehicles from depleted uranium and tungsten sabot munitions. Depleted uranium has similar or better pyrophoric properties (igniting at 500 C and burning at 2000 C) and the added nastiness of being a toxic heavy metal... in general, taking a 10 kg uranium slug, accelerating it to 15,000fps and slamming it into another object will cause a fire. It has been used in both armor and projectiles for more or less the same reasons. > rgb > > > > > John Hearns wrote: > > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > >>8.96, even after you factor in the fact that you might need more aluminum > > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > > > > Oh yes. > > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sun Feb 8 17:49:18 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 14:49:18 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: http://www.rackmountpro.com/productsearch.cfm?catid=118 On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these?
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 13:13:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 13:13:16 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Joel Jaeggli wrote: > On Sun, 8 Feb 2004, Robert G. Brown wrote: > > > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Should we mention the problems in household wiring caused by use of > > > aluminum wiring, then using breakers, outlets and fixtures designed for > > > copper? I almost lost a house in Houston to that once. I spent the 8 > > > hours after the fire department left retightening all the connections > > > throughout. > > > > I seem to remember the government finding out about aluminum the hard > > way with some of their armored fighting vehicles a decade or two ago. > > When struck with a hot enough round, the armor itself just burned right > > up. > > armor is supposed to burn. several armor desgins including that of the "supposed to burn"? Where to "burn" is to release additional heat energy into an already hot environment in a self-sustaining way? Ouch. Supposed to ablate and dissipate energy (hopefully in non-destructive ways on the outside of the vehicle) sure, but naive aluminum designs can be deadly and at various points in the past have been seriously mistrusted by the military personnel supposedly being protected. See e.g. http://www.g2mil.com/aluminum.htm where they recall the early bradley flaws, and argue that the HMS Sheffield (sunk by a single exocet missle in the falklands war) went down in large measure because it was an aluminum ship, where steel ships have been hit by more than one exocet and survived. The site also presents a counterpoint that argues that aluminum isn't THAT bad a choice (as near as I can make out) provided that all one wishes to stop is "small arms fire". It very quickly loses out to steel, though, in a variety of measures when faced with RPG's or things that actually cause fires, as it is a good conductor of heat and quickly spreads a fire and structurally collapses at a relatively low temperature. The aluminum Bradley did tolerably in the first gulf war, losing only 3 to enemy fire (compared to 17 lost to friendly fire from Abrams tanks) but it does have provisions for additional armor plates of steel to be added on outside and I imagine that it used them. Most of what it faced in the gulf war OTHER than our Abrams was its forte -- small arms fire. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sun Feb 8 17:09:36 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sun, 8 Feb 2004 14:09:36 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: hi ya nathan On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? some collection of "these" http://www.Linux-1U.net/PowerSupp/DC http://www.Linux-1U.net/PowerSupp/12v problem with +12v or -48v dc inputs is you need to provide enough current to these "dc power supply" - at 12v .. we were calculating about 400A ... since we estimate 4A per mb and 100 mb per rack and double it or 50% for keeping the powersupply reasonably within its normal lifespan ( mtbf ) fun stuff... alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 17:31:40 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 16:31:40 -0600 Subject: [Beowulf] DC Powered Chassis In-Reply-To: References: Message-ID: <4026B8CC.7050102@iwantka.com> Thanks! Alvin Oga wrote: >hi ya nathan > >On Sun, 8 Feb 2004, Nathan Littlepage wrote: > > > >>With all the power talk on the 'HVAC and Room Cooling' subject. I've >>been looking for 1 or 2u chassis that support -48v DC as the main power >>source. Does anyone know of someone that manufactures these? >> >> > >some collection of "these" > >http://www.Linux-1U.net/PowerSupp/DC >http://www.Linux-1U.net/PowerSupp/12v > > >problem with +12v or -48v dc inputs is you need to provide >enough current to these "dc power supply" > - at 12v .. we were calculating about 400A ... > since we estimate 4A per mb and 100 mb per rack > and double it or 50% for keeping the powersupply > reasonably within its normal lifespan ( mtbf ) > >fun stuff... >alvin > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sat Feb 7 20:24:01 2004 From: clwang at csis.hku.hk (clwang at csis.hku.hk) Date: Sun, 8 Feb 2004 09:24:01 +0800 Subject: [Beowulf] CFP: GCC2004 (3rd International Conference on Grid and Cooperative Computing) Message-ID: <1076203441.40258fb1bf1c0@intranet.csis.hku.hk> ---------------------------------------------------------------- Call for Papers 3rd International Conference on Grid and Cooperative Computing (http://grid.hust.edu.cn/gcc2004) October 21-24 2004, Wuhan, China ----------------------------------------------------------------- The Third International Conference on Grid and Cooperative Computing (GCC 2004) will be held from Oct. 21 to 24, 2004 in Wuhan. It will serve as a forum to present current work by researchers in the grid computing and cooperative computing area. GCC 2004 is the follow-up of the highly successful GCC 2003 in Shanghai, China, and GCC 2002 in Sanya, China. 
Wuhan is rich in culture and history. Its civilization began about 3,500 years ago, and is of great importance in Chinese culture, economy and politics. It shares the same culture of Chu, formed since the ancient Kingdom of Chu more than 2,000 years ago. Numerous natural and artificial attractions and scenic spots are scattered around. Famous scenic spots in Wuhan include Yellow Crane Tower, Guiyuan Temple, East Lake, and Hubei Provincial Museum with the famous chimes playing the music of different styles. GCC 2004 will emphasize the design and analysis of grid computing and cooperative computing and their scientific, engineering, and commercial applications. In addition to technical sessions of contributed paper presentations, the conference will have several workshops, a poster session, tutorials, and vendor exhibitions. GCC 2004 invites the submission of papers in grid computing, Web services and cooperative computing, including theory and applications. The conference is soliciting only original high quality research papers on all above aspects. The main topics of interest include, but not limited to: -Resource Grid and Service Grid - Information Grid and Knowledge Grid - Grid Monitoring, Management and Organization Tools - Grid Portal - Grid Service, Web Service and their QoS - Service Orchestration - Grid Middleware and Toolkits - Grid Security - Innovative Grid Applications - Advanced Resource Reservation and Scheduling - Performance Evaluation and Modeling - Computer-Supported Cooperative Work - P2P Computing, automatic computing and so on - Meta-information Management - Software glue Technologies PAPER SUBMISSION Paper submissions must present original, unpublished research or experiences. Late-breaking advances and work-in-progress reports from ongoing research are also encouraged to be submitted to GCC 2004. All papers submitted to this conference will be peer-reviewed and accepted on the basis of their scientific merit and relevance to the conference topics. Accepted papers will be published as conference proceedings, published by Springer-Verlag in the Lecture Notes in Computer Science (LNCS) Series (Pending). It is also planned that a selection of papers from GCC 2004 proceedings will be extended and published in international journals. WORKSHOPS Proposals are solicited for workshops to be held in conjunction with the main conference. Interested individuals should submit a proposal by March 1, 2004 to the Workshop Chair. TUTORIALS Proposals are solicited for tutorials to be held at the conference. Interested individuals should submit a proposal by May 30,2004. The proposal should include a brief description of the intended audience, a lecture outline, and a vita for each lecturer. EXHIBITION/VENDOR PRESENTATIONS Companies and R&D laboratories are encouraged to present their exhibits at the conference. In addition, a full day of vendor presentations is planned. IMPORTANT DATES March 1, 2004 Workshop Proposal Due May 1, 2004 Conference Paper Due May 30, 2004 Tutorial Proposal Due June 1, 2004 Notification of Acceptance/Rejection June 30, 2004 Camera-Ready Paper Due ORGANIZATION CONFERENCE Co-CHAIRS Xicheng Lu, National University of Defense Technology, China Andrew A. Chien, University of California at San Diego, USA. PROGRAM Co-CHAIRS Hai Jin, Huazhong University of Science and Technology, China. hjin at hust.edu.cn Yi Pan, Georgia State University, USA. pan at cs.gsu.edu WORKSHOP CHAIR Nong Xiao, National University of Defense Technology, China. 
xiao-n at vip.sina.com, Xiao_n at sina.com. Publicity Chair Minglu Li, Shanghai Jiao Tong University, China. li-ml at cs.sjtu.edu.cn Tutorial Chair Dan Meng, Institute of Computing Technology, Chinese Academy of Sciences, China. md at ncic.ac.cn Poster Chair Song Wu, Huazhong University of Science and Technology, China. wusong at mail.hust.edu.cn LOCAL ARRANGEMENT CHAIR Pingpeng Yuan, Huazhong University of Science and Technology, China. ppyuan at mail.hust.edu.cn. Program Committee Members (more to be added) Mark Baker (University of Portsmouth, UK) Rajkumar Buyya (The University of Melbourne, Australia) Wentong Cai (Nanyang Technological University, Singapore) Jiannong Cao (Hong Kong Polytechnic University, Hong Kong) Guihai Chen (Nanjing University, China) Minyi Guo (University of Aizu, Japan) Chun-Hsi Huang (University of Connecticut, USA) Weijia Jia (City University of Hong Kong, Hong Kong) Francis Lau (The University of Hong Kong, Hong Kong) Keqin Li (State University of New York, USA) Qing Li (City University of Hong Kong, Hong Kong) Lionel Ni (Hong Kong University of Science and Technology, Hong Kong) Hong Shen (Japan Advanced Institute of Science and Technology, Japan) Yuzhong Sun (Institute of Computing Technology, CAS, China) Huaglory Tianfield (Glasgow Caledonian University, UK) Cho-Li Wang (The University of Hong Kong, Hong Kong) Jie Wu (Florida Atlantic University, USA) Cheng-Zhong Xu (Wayne State University, USA) Laurence Tianruo Yang (St. Francis Xavier University, Canada) Qiang Yang (Hong Kong University of Science & Technology, Hong Kong) Yao Zheng (Zhejiang University, China) Wanlei Zhou (Deakin University, Australia) Jianping Zhu (The University of Akron, USA) For more information, please visit the conference web site at: http://grid.hust.edu.cn/gcc2004. ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgoornaden at intnet.mu Mon Feb 9 10:34:28 2004 From: rgoornaden at intnet.mu (roudy) Date: Mon, 9 Feb 2004 19:34:28 +0400 Subject: [Beowulf] parallel program References: <200402081701.i18H1qh28395@NewBlue.scyld.com> Message-ID: <001601c3ef22$35dd7be0$ab007bca@roudy> Hello Beowulf people, I have finished building my cluster and have already run Linpack on it; its performance is fine. Can someone help me by giving me some very big programs to run on my cluster to compare the performance with a stand-alone computer.
Thanks Roudy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcookeman at yahoo.com Mon Feb 9 18:00:47 2004 From: jcookeman at yahoo.com (Justin Cook) Date: Mon, 9 Feb 2004 15:00:47 -0800 (PST) Subject: [Beowulf] parallel program In-Reply-To: <001601c3ef22$35dd7be0$ab007bca@roudy> Message-ID: <20040209230047.89106.qmail@web60510.mail.yahoo.com> http://www.mpa-garching.mpg.de/galform/gadget/index.shtml There is a serial and parallel version. Have fun... Justin --- roudy wrote: > Hello Beowulf people, > I have completed to build my cluster. I have have > already run linpack on my > cluster and it's performance is fine. > Can someone help me by giving me some very big > programs to run on my cluster > to compare the performance with a stand-alone > computer. > Thanks > Roudy > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 9 17:50:45 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 9 Feb 2004 17:50:45 -0500 (EST) Subject: [Beowulf] BWBUG meeting Tuesday Feb 10 at 3:00, Platform Computing Message-ID: --- Note that this meeting is in VA not Maryland! -- Date: February 10, 2004 Time: 3:00 PM (doors open at 2:30) Location: Northrop Grumman IT, McLean Virginia The folks from Platform Computing will be speaking about their LSF scheduler and Grid Computing for Beowulf. This event is sponsored by the Baltimore-Washington Beowulf Users Group (BWBUG) and will be held at Northrop Grumman Information Technology 7575 Colshire Drive, 2nd floor, McLean Virginia. Please register on line at http://bwbug.org As usual there will be door prizes, food and refreshments. Need to be a member?: No ( guests are welcome ) Parking: Free T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell Email michael.fitzmaurice at ngc.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 9 18:25:55 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 10 Feb 2004 10:25:55 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402101025.57234.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > argue that the HMS Sheffield (sunk by a single exocet missle in the > falklands war) went down in large measure because it was an aluminum ship A quick correction, the Sheffield was an all steel ship, as I believe were all the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was not brought under control because the exocet (which failed to explode) took out a large chunk of the fire fighting system. 
She finally sank under tow on May 10th 1982, six days after being hit. The sci.military.naval FAQ has an excellent section on the role of aluminium in the loss of warships which looks at this urban legend, and gives real examples when aluminium did cause the loss, at: http://www.hazegray.org/faq/smn6.htm#F7 as well as a section on the Type 42's at: http://www.hazegray.org/navhist/rn/destroyers/type42/ cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAKBcDO2KABBYQAh8RAq2vAJdRfrlHek12hced85HGV0z1nWbYAJ9GJegr FBxjHUczDti0OXNKX5VoKA== =PA8t -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 18:31:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 18:31:18 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > > > argue that the HMS Sheffield (sunk by a single exocet missle in the > > falklands war) went down in large measure because it was an aluminum ship > > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. I stand corrected. Obviously one can't believe everything one googles...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Feb 9 22:42:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 10 Feb 2004 11:42:32 +0800 (CST) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> >From comp.arch: "One of the things that the version 8.0 of the Intel compiler included was an "Intel-specific" flag." But looks like the purpose is to slow down AMD: http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com If intel releases 64-bit x86 CPUs and compilers, then AMD may get even better benchmarks results. Again, no matter how pretty the benchmarks results look, in the end we still need to run on the real system. So, what's the point of having benchmarks? Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
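For readers who want to see concretely what kind of runtime check is being described in that comp.arch thread, below is a minimal, hypothetical sketch -- it is not Intel's actual dispatch code. It reads the CPUID vendor string and, separately, the SSE/SSE2 feature bits, to show that capability can be detected without ever looking at the vendor. It assumes an x86 machine and GCC-style inline assembly; the file name and messages are made up for illustration.

/* cpuid_check.c -- hypothetical sketch, not Intel's library code.
 * Prints the CPUID vendor string and the SSE/SSE2 feature bits.
 * Build (x86, gcc): gcc -O2 -o cpuid_check cpuid_check.c
 */
#include <stdio.h>
#include <string.h>

static void cpuid(unsigned int leaf, unsigned int *a, unsigned int *b,
                  unsigned int *c, unsigned int *d)
{
    __asm__ volatile("cpuid"
                     : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                     : "a"(leaf));
}

int main(void)
{
    unsigned int a, b, c, d;
    char vendor[13];

    /* Leaf 0: the vendor string comes back in EBX, EDX, ECX, in that order. */
    cpuid(0, &a, &b, &c, &d);
    memcpy(vendor + 0, &b, 4);
    memcpy(vendor + 4, &d, 4);
    memcpy(vendor + 8, &c, 4);
    vendor[12] = '\0';

    /* Leaf 1: feature flags; EDX bit 25 = SSE, bit 26 = SSE2. */
    cpuid(1, &a, &b, &c, &d);

    printf("vendor string : %s\n", vendor);
    printf("SSE supported : %s\n", (d & (1u << 25)) ? "yes" : "no");
    printf("SSE2 supported: %s\n", (d & (1u << 26)) ? "yes" : "no");

    /* A dispatcher keyed on the vendor string -- the behaviour complained
     * about above -- would ignore those feature bits on anything that is
     * not "GenuineIntel" and fall back to a generic code path. */
    if (strcmp(vendor, "GenuineIntel") != 0)
        printf("a vendor-keyed dispatcher would take the slow path here\n");

    return 0;
}

Testing the feature bits is the portable way to decide whether SSE/SSE2 code paths can be used; keying the decision on the vendor string is exactly what the posters above object to.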
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 10 03:10:39 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 10 Feb 2004 09:10:39 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. Steering the argument back to computers :-) I saw a documentary about the Sheffield once. Two ships were sent out as 'goalkeepers', the Sheffield and the smaller Broadsword. The Sheffield had a longer range missile system, the Broadsword a short range one (or other way around). During a period of vulnerability (can;t remember the exact reason) the Broadsword had to reboot its ageing fire control computer. I think build by Ferranti. (No slur intended on their fine engineers, but the thing was old at the time). _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wardwe at navseadn.navy.mil Tue Feb 10 16:21:01 2004 From: wardwe at navseadn.navy.mil (Ward William E DLDN) Date: Tue, 10 Feb 2004 16:21:01 -0500 Subject: [Beowulf] Intel Compiler cheating against non-Intel CPUs? Message-ID: Has anyone seen this yet? Any comments or discussion? >From the message, it looks like the Intel Compilers are cheating against SSE and SSE2 capable non-Intel CPUs (ie, A64 especially). http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=a13e403a.040 2091438.14018f5a%40posting.google.com&rnum=1 R/William Ward _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mbanck at gmx.net Tue Feb 10 13:01:16 2004 From: mbanck at gmx.net (Michael Banck) Date: Tue, 10 Feb 2004 19:01:16 +0100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040210180116.GA27872@blackbird.oase.mhn.de> On Sat, Feb 07, 2004 at 11:21:19AM +0000, Andrew M.A. Cater wrote: > On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > > Can you add GridEngine (SGE) and Torque (SPBS)? > > > > The problem with OpenPBS is not only it is broken, it > > is not under development these days, but also I found > > that Altair is not allowing new users to download > > OpenPBS. I went to its homepage today but it only > > leads me to the PBSPro page. > > > To clarify things a bit, I hope. > > In the beginning was PBS - developed in house at NASA by engineers > who needed a Portable Batch System. If you understand Cray NQS syntax > and concepts it's familiar :) They left / sold to Veridian who in turn > sold to Altair. 
The original PBS was GPL or a close equivalent, if I > understand correctly. > > Altair are marketing a propietary development of PBS as PBSPro. OpenPBS > remains available, though you have to register with Altair for download. > What they have done very recently, which is rather sneaky, is for the > site to oblige you to register for an evaluation copy of PBSPro and > potentially answer a questionnaire prior to providing the link to allow > you to download OpenPBS. > > OpenPBS is not under active development and PBSPro may have stalled. > Certainly the price per node that Altair are quoting has apparently > dropped significantly - though their salesmen are still persistent :) > > The academic community and the active users forked OpenPBS to create > Scalable PBS [SPBS] which is the name most widely known. They've added > patches, fixes and features, though there is still an Altair licence for > OpenPBS in there. In the last couple of months, SPBS changed its name > initially to StORM and then to Torque. Thanks for the clarification. Does anybody know whether Torque is considered to be conforming to the Open Source Definition[1]? In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software License', which seems to prohibit commercial distribution, making it non-free unfortunately. Is there some other fork of PBS with a true Open Source license perhaps? thanks, Michael [1] http://www.opensource.org/docs/definition.php _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 18:00:05 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 10:00:05 +1100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040210180116.GA27872@blackbird.oase.mhn.de> References: <20040207112119.GA5120@galactic.demon.co.uk> <20040210180116.GA27872@blackbird.oase.mhn.de> Message-ID: <200402111000.08919.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > Thanks for the clarification. Does anybody know whether Torque is > considered to be conforming to the Open Source Definition[1]? > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > License', which seems to prohibit commercial distribution, making it > non-free unfortunately. Is there some other fork of PBS with a true Open > Source license perhaps? My understanding is that they cannot alter the license as they have inherited that from the original OpenPBS sources, and as they do not hold all the copyrights to the code it cannot be changed unless Altair can be persuaded. My understanding is that the SuperCluster people picked the 2.3.12 version as a starting point as that was the most recent with the most liberal license (i.e. others could fork development from it). I've CC'd this to the SuperCluster folks so they can comment and correct. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU QwJlxOBwfLiUT7Y543RwiIY= =xTbA -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 17:44:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 09:44:17 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402110944.21802.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 07:10 pm, John Hearns wrote: > During a period of vulnerability (can;t remember the exact reason) > the Broadsword had to reboot its ageing fire control computer. > I think build by Ferranti. (No slur intended on their fine engineers, > but the thing was old at the time). I'm not aware of that one, but on a similar vein there was the widespread failure of the Patriot systems during the first Gulf War, including the attack on the barracks at Dhahran where 28 were killed. This was caused by the system truncating the values of the clock when written to memory, which over a long period of operation resulted in the system dismissing incoming missiles as false alarms. http://shelley.toich.net/projects/CS201/patriot.html However, this is starting to sound more like the RISKS digest than Beowulf, so I'll leave it there. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKV7EO2KABBYQAh8RApXjAJ9Gil07Z/XekN3XDSturEu2KihedQCfXBA7 aUUMVqTZuHfQ5RKsKGwnuNw= =+9RK -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Feb 10 18:52:41 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 10 Feb 2004 15:52:41 -0800 Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com>; from andrewxwang@yahoo.com.tw on Tue, Feb 10, 2004 at 11:42:32AM +0800 References: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <20040210155241.A29026@fileserver.internal.keyresearch.com> On Tue, Feb 10, 2004 at 11:42:32AM +0800, Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? There isn't much point at staring at a benchmark that isn't at all relevant to how you're using the system -- for example, a SPECcpu score with the Intel compiler in 32-bit mode isn't going to tell you much about an AMD64 app in 64-bit mode. If I remember correctly, a guy at Intel published a paper about a feedback optimization technique related to irregular strides that got a 22% improvement in mcf. When I get back to the office in a couple of days, I'll post a reference. 
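As a rough, vendor-neutral illustration of the kind of loop such irregular-stride profile feedback targets, here is a small self-contained sketch; the file name, the index-scrambling constant, and the exact GCC flags shown are illustrative assumptions, not a reproduction of the paper's experiment.

/* irregular.c -- hypothetical sketch of an irregular-stride (indirectly
 * indexed) loop of the sort profile feedback can help with.  A generic,
 * vendor-neutral GCC feedback build of that era looks like:
 *   gcc -O2 -fprofile-arcs -o irregular irregular.c && ./irregular
 *   gcc -O2 -fbranch-probabilities -o irregular irregular.c
 */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
    double *data  = malloc(N * sizeof *data);
    int    *index = malloc(N * sizeof *index);
    double  sum = 0.0;
    int     i;

    if (data == NULL || index == NULL)
        return 1;

    /* Fill the data and build a scrambled index table so the loop below
     * walks memory with no regular stride. */
    for (i = 0; i < N; i++) {
        data[i]  = (double)i;
        index[i] = (int)((i * 2654435761u) % N);
    }

    /* The irregular-stride access: data[index[i]] defeats the usual
     * hardware and compiler prefetch heuristics, which is where
     * profile-guided feedback has the most to offer. */
    for (i = 0; i < N; i++)
        sum += data[index[i]];

    printf("sum = %f\n", sum);
    free(data);
    free(index);
    return 0;
}

Compiled once with -fprofile-arcs, run to collect counts, and recompiled with -fbranch-probabilities, the compiler gets real branch and trip-count data for the irregular loop; feedback of this general kind is what the mcf result refers to, and it is available in ordinary GCC, not just the Intel compiler.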
And no, it's not at all Intel-specific. -- greg (posting from Paris. I should be asleep!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 19:32:47 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 11:32:47 +1100 Subject: Fwd: Re: [Beowulf] Gentoo for Science and Engineering Message-ID: <200402111132.49119.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Forwarded at the request of SuperCluster.org - ---------- Forwarded Message ---------- Subject: Re: [Beowulf] Gentoo for Science and Engineering Date: Wed, 11 Feb 2004 11:55 am From: help at supercluster.org To: Chris Samuel Cc: beowulf at beowulf.org Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have > inherited that from the original OpenPBS sources, and as they do not hold > all the copyrights to the code it cannot be changed unless Altair can be > persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version > as a starting point as that was the most recent with the most liberal > license (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. 
> > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- - -- - -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation - ------------------------------------------------------- - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKXgvO2KABBYQAh8RAok7AKCABbnmwiYvRf4BxeFoY+Jp9F/W1gCfReKD dKc1islXxQLdTrabQglX1MU= =xfyh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From yduan at albert.chem.udel.edu Tue Feb 10 10:37:49 2004 From: yduan at albert.chem.udel.edu (Dr. Yong Duan) Date: Tue, 10 Feb 2004 10:37:49 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > A guidelines, I guess. A lot of CPUs (including some rather expensive ones and often call them HPC CPUs) perform at less than half the speed of consumer grade CPUs. You'd definitely avoid those, for instance. Also, you can look at the performance in each area and figure out the relative performance expected to your own code. In the end, the most reliable benchmark is always on your own code, of course. Whether Intel compiler has been tuned for SPEC2K is probably an open question. I tried various compilers on our code and found it is also tuned for it :), consistently 10-20% faster than others. This included performance on Opterons, strangely enough. yong _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From help at supercluster.org Tue Feb 10 19:55:22 2004 From: help at supercluster.org (help at supercluster.org) Date: Tue, 10 Feb 2004 17:55:22 -0700 (MST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <200402111000.08919.csamuel@vpac.org> Message-ID: Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. 
The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have inherited > that from the original OpenPBS sources, and as they do not hold all the > copyrights to the code it cannot be changed unless Altair can be persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version as > a starting point as that was the most recent with the most liberal license > (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. > > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- > -- -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bwegner at ekt.tu-darmstadt.de Wed Feb 11 05:02:23 2004 From: bwegner at ekt.tu-darmstadt.de (Bernhard Wegner) Date: Wed, 11 Feb 2004 11:02:23 +0100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages Message-ID: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Hello, I have a really small "cluster" of 4 PC's which are connected by a normal Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board I thought I might be able to improve performance by connecting the machines via a Gigabit switch (which are really cheap nowadays). Everything seemed to work fine. The switch indicates 1000Mbit connections to the PC's and transfer rate for scp-ing large files is significantly higher now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than with the 100 Mbit switch. I wasn't able to actually track down the problem, but it seems that there is a problem with small messages. 
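For anyone who wants to measure this directly, a minimal MPI ping-pong along the lines below is one way to time small messages between two nodes. It is a sketch only, assuming the usual mpicc/mpirun wrappers from the MPICH installation in question; the message sizes and repetition count are arbitrary illustrative choices, not figures taken from this report.

/* pingpong.c - minimal MPI ping-pong latency sketch (illustrative only).
 * Build: mpicc -O2 -o pingpong pingpong.c
 * Run:   mpirun -np 2 ./pingpong   (one process on each node)
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank, size, len, r;
    unsigned int i;
    char buf[4096];
    int lengths[5] = { 0, 4, 64, 1024, 4096 };   /* message sizes to try */
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    memset(buf, 0, sizeof(buf));

    for (i = 0; i < 5; i++) {
        len = lengths[i];
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < REPS; r++) {
            if (rank == 0) {            /* rank 0 sends, waits for the echo */
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else {                    /* rank 1 echoes everything back */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)                  /* one-way latency = half the round trip */
            printf("%5d bytes: %8.2f us one-way\n",
                   len, (t1 - t0) / (2.0 * REPS) * 1.0e6);
    }
    MPI_Finalize();
    return 0;
}

Averaging over many round trips and halving the result takes the resolution of MPI_Wtime out of the picture; for the 0-byte case the number printed is essentially the per-message overhead of the interconnect plus the MPI stack, which is what appears to have blown up here.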
When I run the performance test provided with mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 byte message length, while for larger messages everything looks fine (linear dependency of transfer time on message length, everything below 300 us). I have also tried mpich2 which shows exactly the same behavior. Does anyone have any idea? Here are the details of my system: - Suse Linux 9.0 (kernel 2.4.21) - mpich-1.2.5.2 - motherboard ASUS P4P800 - LAN (10/100/1000) on board (3COM 3C940 chipset) - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + 8x88E1111-BAB, AT89C2051-24PI) -- Mit besten Grüßen -- Best regards, Bernhard Wegner _______________________________________________________ ======================================================= Dipl.-Ing. Bernhard Wegner Fachgebiet Energie- und Kraftwerkstechnik Technische Universität Darmstadt Petersenstr. 30 64287 Darmstadt Germany phone: +49-6151-162357 fax: +49-6151-166555 e-mail: bwegner at ekt.tu-darmstadt.de _______________________________________________________ ======================================================= _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moloned at tcd.ie Wed Feb 11 12:44:59 2004 From: moloned at tcd.ie (david moloney) Date: Wed, 11 Feb 2004 17:44:59 +0000 Subject: [Beowulf] Profiling floating-point performance Message-ID: <402A6A1B.2070805@tcd.ie> I have an application written in C++ which compiles under both MSVC++ 6.0 and gcc 2.9.6 that I would like to profile in terms of floating point performance.
Let's try these three: * microbenchmarks * comparative benchmarks * application benchmarks Microbenchmarks measure very specific, highly targeted areas of system functionality. By their very nature they are "simple", not complex -- often the pseudocode is as simple as start_timer(); loop lotsatimes{ do_something_simple = dumb*operation; } stop_timer(); compute_speed(); print_result(); (To compute "how fast a multiply occurs"). Simple can also describe atomicity -- benchmarking "a single operation" where the operation might be complex but is a standard unitary building block of complex code. Microbenchmarks are undeniably not only useful, they are essential to anyone who takes systems/cluster/programming engineering seriously. Examples of microbenchmark suites that are in more or less common use are: lmbench (very full featured suite; one infamous user: Linux Torvalds:-) stream (very commonly cited on the list) cpu_rate (not so common -- wraps e.g. stream and other tests so variations with vector size can be explored) rand_rate (almost unknown, but it DOES benchmark all the gsl rands:-) netpipes (measure network speeds) netperf (ditto, but alas no longer maintained) I (and many others) USE these tools (I wrote two of them SO I could use them) to study systems that we are thinking of buying and using for a cluster, to study the kernel and see if the latest change made some critical operation faster or slower, to figure out if the NIC/switch combo we are using is why PVM code is moving like molasses. They are LESS commonly poached by vendors, fortunately - Larry Macvoy has lmbench bristling with anti-vendor-cooking requirements at the license level. The benchmarks are simple, but because one needs a lot of them to get an accurate picture of overall performance they tend to be too complex for typical mindless consumers... Comparative benchmarks are what I think you're really referring to. They aren't completely useless, but they do often become pissing contests (such as the top 500 list) and there are famous stories of Evil by corporations seeking to cook up good results on one or another (sometimes at the expense of overall system balance and performance!). Most of the Evil in these benchmarks arise because people end up using them as a naive basis for purchase decisions. "Ooo, that system has a linpork of 4 Gigacowflops so it must be better than that one which only gets 2.7 Gcf, so I'll buy 250 of them for my next cluster and be able to brag about my 1 Teracowflop supercomputer and make the top third of the top 500 list, which will impress my granting agencies and tenure board, who are just as ignorant as I am about meaningful measures of systems performance..." Never mind that your application is totally non-linpack-like, that the bus performance on the systems you got sucks, and that the 2.7 Gcf systems you rejected cost 1/2 the 4 Gcf systems you got so you could have had 500 at 2.7 Gcf for a net of 1.35 Tcf and balanced memory and bus performance (and run your application faster per dollar) if you'd bothered to do a cost benefit analysis. The bleed of dollars attracts the vendor sharks, who often can rattle off the aggregate specmarks and so forth for their most expensive boxes. 
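To make the timing-loop pseudocode above concrete, a compilable C version might look like the following. It is only a sketch: the multiply kernel, the repetition count and the gettimeofday()-based timer are arbitrary choices, and a real suite (stream, cpu_rate and friends) takes far more care over cache state, vector size and timer overhead.

/* microbench.c - bare-bones version of the timing-loop pseudocode above.
 * Illustrative sketch only; no attempt is made to control cache state or
 * to separate loop overhead from the cost of the multiply itself.
 */
#include <stdio.h>
#include <sys/time.h>

#define REPS 100000000UL

static double now(void)                   /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(void)
{
    volatile double x = 1.000000001;      /* volatile keeps the loop honest */
    double result = 1.0;
    double t0, t1;
    unsigned long i;

    t0 = now();                           /* start_timer() */
    for (i = 0; i < REPS; i++)            /* loop lotsatimes */
        result = result * x;              /* do_something_simple */
    t1 = now();                           /* stop_timer() */

    /* compute_speed() and print_result() */
    printf("%lu multiplies in %.3f s => %.1f Mmul/s (result %g)\n",
           REPS, t1 - t0, (double)REPS / (t1 - t0) / 1.0e6, result);
    return 0;
}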
However, they CAN be actually useful, if one of the tests in the SPEC suite happens to correspond to your application, if you bother to read all the component results in the SPECmarks, if you bother to check the compiler used and flags and system architecture in some detail to see if they appear cooked (hand tuned or optimized, based on a compiler that is lovely but very expensive and has to be factored into your CBA). Finally, there are application benchmarks. These tend to be "atomic" but at a very high level (an application is generally very complex). These are also subject to the Evil of comparative benchmarks (in fact some of comparative benchmark suites, especially in the WinX world, are a collection of application benchmarks). They also have some evil of their own when the application in question is commercial and not open source -- you have effectively no control over how it was built and tuned for your architecture, for example, and may not even have meaningful version information. However, they are also undeniably useful. Especially when the application being benchmarked is YOUR application and under your complete control. So the answer to your question appears to be: * Microbenchmarks berry berry good. Useful. Essential. Fundamental. * Comparative benchmarks sometimes good. Sometimes a force for Evil. * Application benchmark ideal if it is your application or very similar and under your control. Pissing contests in general are not useful, and even a useful higher level benchmark divorced from an associated CBA is like shopping in a store that has no price tags -- a thing of use only to those so rich that they don't have to ask. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 13:55:01 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 12:55:01 -0600 Subject: [Beowulf] how are people doing this? Message-ID: <20040211185501.GA31590@mikee.ath.cx> I feel that in a proper cluster that the nodes are all (basically) identical. I 'own' a server environment of 20+ servers that are all dedicated to specific applications and this is not a cluster. However, I would like to manage config files (/etc/resolv.conf, etc), user accounts, patches, etc., as I would in a clustered environment. I have read the papers at infrastructures.org and agree with the principles mentioned there. I have looked extensively at cfengine, though I prefer the solution be in PERL as all my servers have PERL already (the manufacturer installs PERL as default on the boxes). How is everyone managing their cluster or what are suggestions on how I can manage my server environment. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:08:41 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed, 11 Feb 2004 13:08:41 -0500 (EST) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > I have an application written in C++ which compiles under both MSVC++ > 6.0 and gcc 2.9.6 that I would like to profile in terms of floating > point performance. > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. I'm not sure how accurate it is overall, but see "man gprof" and compile with the -g -p flag. This will give you at least some useful times and so forth. It will NOT give you (AFAIR) "histogram of actual x86 floats etc". I don't know of anything that will -- to get them you have to instrument your code, probably so horribly that a la heisenberg your measurements would bear little resemblance to actual performance (especially if your code wants to be doing all sorts of smooth vector things in cache and register memory and you keep calling instrumentation subroutines to try to measure times that wreck state). Consider that with my best, on CPU, raw assembler based timing clock (using the onboard cycle counter) I still find the overhead of reading that clock to be in the tens of clock cycles. To microtime a single multiply is thus all but impossible -- the clock itself takes 10-40 times as long to execute as a multiply might take, depending on where the data to be multiplied is when one starts. So timing per-instruction is effectively out. Similarly, to instrument and count floating point operations requires something to "watch the machine instructions" as they stream through the CPU. Unfortunately, the only thing available to watch the instructions is the CPU itself, so you have to damn near write an assembler-interpreter to instrument this. Which in turn would be slow as molasses -- an easy 10x slower than the native code in overhead alone plus it would utterly wreck just about every code optimization known to man. Finally, there is the question of "what's a flop". The answer is, not much that's useful or consistent -- the number of floating point operations that a system does per second varies wildly depending in a complex way on system state, cache locality, whether the variable is general or register, whether the instruction is part of a complex/advanced instruction (e.g. add/multiply) or an instruction that has to be done partly in software (divide), whether or not the instruction is part of a stream of vectorized instructions, and more. That's why microbenchmarks are useful. You may not be able to extract meaningful results from your code with a simple tool (although it isn't terribly difficult to instrument major blocks or subroutines with timers and counters, which is more or less with -p and gprof do) but you can learn at least some things about how your system executes core operations in various contexts to learn how to optimize one's code with a good microbenchmark. Just sweeping stream across vector sizes from 1 to 10^8 or so teaches you a whole lot about the system's performance in different contexts, as does doing a stream-like benchmark but working through the vector in a random order (i.e. deliberately defeating any sort of vector optimization and cache benefit). 
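A bare-bones version of that size sweep, for anyone who wants to try it: the sketch below runs one trivial dot-product kernel at several vector lengths and reports an effective memory bandwidth. The sizes, repetition counts and kernel are arbitrary illustrative choices, and it is nowhere near as careful as stream or cpu_rate about what exactly it is measuring.

/* sweep.c - the same trivial kernel run at several working-set sizes, to
 * show how the apparent "result" changes as the data falls out of cache.
 * Illustrative sketch only; sizes and repetition counts are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now(void)                   /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(void)
{
    size_t sizes[5] = { 1<<10, 1<<13, 1<<16, 1<<19, 1<<22 };  /* in doubles */
    size_t n, i, s;
    long rep, reps;
    double *a, *b, t0, t1, sum;

    for (s = 0; s < 5; s++) {
        n = sizes[s];
        reps = (long)((64UL << 20) / n);  /* keep total work roughly constant */
        a = malloc(n * sizeof(double));
        b = malloc(n * sizeof(double));
        if (!a || !b) { fprintf(stderr, "malloc failed\n"); return 1; }
        for (i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

        sum = 0.0;
        t0 = now();
        for (rep = 0; rep < reps; rep++)
            for (i = 0; i < n; i++)
                sum += a[i] * b[i];       /* trivial dot-product kernel */
        t1 = now();

        printf("n = %8lu doubles: %8.1f MB/s effective (sum %g)\n",
               (unsigned long)n,
               2.0 * n * reps * sizeof(double) / (t1 - t0) / 1.0e6, sum);
        free(a);
        free(b);
    }
    return 0;
}

On most machines the reported rate drops in visible steps as the working set falls out of L1 and then L2, which is exactly the context dependence being described; walking the vectors in a random order instead of sequentially would drop it a good deal further.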
Good luck, rgb > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:58:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 13:58:39 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. Mike, this is nearly a FAQ -- the list archives should have a discussion (one of many) only a few weeks old on this very subject. There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions possible, and more. Oh, and dhcp actually pushes lots of stuff out all by itself these days -- it should handle the stuff in resolv.conf for example, and you should be using dhcp anyway for scalability reasons. rgb > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 11 14:44:02 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 11 Feb 2004 13:44:02 -0600 (CST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: Mike, we use systemimager, systemconfigurator and a custom utility called "cupdate" to maintain our clusters. In our case it works beautifully and easilly. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. 
> all dedicated to specific applications and this is not a cluster.
> However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scyld at jasons.us Wed Feb 11 13:01:21 2004 From: scyld at jasons.us (scyld at jasons.us) Date: Wed, 11 Feb 2004 13:01:21 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> References: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: <20040211125741.A5961@torgo.bigbroncos.org> On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. Have you tried setting the speed and duplex of the gig NICs to 1000/full on both the system side and switch side? I've found that autonegotiate rarely does especially with 3com gear. I'm guessing, based on its size, your switch isn't managed so you may have to stick to locking it on the systems and watching the behavior to see if the switch gets the negotiation right. (if traffic is bursty you have a speed mismatch and if you get loads of errors it's more likely to be duplex problem) FWIW I have the same mobo at home but haven't hooked it to gigabit yet so I'm quite curious to see how this works out. -Jason ----- Jason K. Schechner - check out www.cauce.org and help ban spam-mail. "All HELL would break loose if time got hacked." - Bill Kearney 02-04-03 ---There is no TRUTH. There is no REALITY. There is no CONSISTENCY.--- ---There are no ABSOLUTE STATEMENTS I'm very probably wrong.--- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 15:00:15 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 14:00:15 -0600 Subject: [Beowulf] how are people doing this? In-Reply-To: References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: <20040211200015.GE31590@mikee.ath.cx> On Wed, 11 Feb 2004, Robert G. Brown wrote: > On Wed, 11 Feb 2004, Mike Eggleston wrote: > > > I feel that in a proper cluster that the nodes are all (basically) > > identical. 
I 'own' a server environment of 20+ servers that are > > all dedicated to specific applications and this is not a cluster. > > However, I would like to manage config files (/etc/resolv.conf, etc), > > user accounts, patches, etc., as I would in a clustered environment. > > I have read the papers at infrastructures.org and agree with the > > principles mentioned there. I have looked extensively at cfengine, > > though I prefer the solution be in PERL as all my servers have > > PERL already (the manufacturer installs PERL as default on the boxes). > > > > How is everyone managing their cluster or what are suggestions > > on how I can manage my server environment. > > Mike, this is nearly a FAQ -- the list archives should have a discussion > (one of many) only a few weeks old on this very subject. > > There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions > possible, and more. Oh, and dhcp actually pushes lots of stuff out all > by itself these days -- it should handle the stuff in resolv.conf for > example, and you should be using dhcp anyway for scalability reasons. I know it's been discussed and I apologize for asking it again. I've just not found the way that seems to fit with the picture I'm trying to reach. What I'm thinking of doing is writing a perl script that can be placed into CVS. On each server a cron process checks out the current CVS repository of server (AIX) config data and script. Then the perl script starts to check permissions, update resolv.conf, hosts, login, passwd, etc., and to check that specific packages are installed or that the packages need updating. I like a lot of what cfengine did, but I really want a script that can be maintained in CVS. For installing packages I plan for the script to mount an NFS export for pulling the packages. # mkdir /tmp/nfs.$$ # mount admin:/opt/packages /tmp/nfs.$$ # installp -d /tmp/nfs.$$ package # umount /tmp/nfs.$$ # rmdir /tmp/nfs.$$ For the account management I'm thinking of something on my admin server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating a local file with new users and their passwords. Then this file is checked into CVS for distribution to other nodes/servers. Using another file to list the users that are authorized access to the local node/server keeps my user-space to a minimum. Is that any more clear what I'm trying to do? I don't have a cluster, but I want to manage all nodes as identically as I can. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 14:35:13 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 14:35:13 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. 
I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. You might look into yum. You'd have to learn python, but yum already does most of what you want for rpm packages and could likely be hacked. In fact, yum would do what you want for all the config files if you roll them into an rpm package right now -- it already has precisely what it needs to install and update according to a revision number. You can run yum update as often as you wish. It will run from NFS and can be secured a variety of ways. rgb > > For installing packages I plan for the script to mount an NFS export > for pulling the packages. > > # mkdir /tmp/nfs.$$ > # mount admin:/opt/packages /tmp/nfs.$$ > # installp -d /tmp/nfs.$$ package > # umount /tmp/nfs.$$ > # rmdir /tmp/nfs.$$ > > For the account management I'm thinking of something on my admin > server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating > a local file with new users and their passwords. Then this file > is checked into CVS for distribution to other nodes/servers. Using > another file to list the users that are authorized access to the > local node/server keeps my user-space to a minimum. > > Is that any more clear what I'm trying to do? I don't have a cluster, > but I want to manage all nodes as identically as I can. > > Mike > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Wed Feb 11 17:05:26 2004 From: canon at nersc.gov (canon at nersc.gov) Date: Wed, 11 Feb 2004 14:05:26 -0800 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: Message from david moloney of "Wed, 11 Feb 2004 17:44:59 GMT." <402A6A1B.2070805@tcd.ie> Message-ID: <200402112205.i1BM5QwA011397@pookie.nersc.gov> David, You may want to look into PAPI and perfctr. It allows you query the performance counters built into most processors. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 17:13:32 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 17:13:32 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > > They also have some evil of > > their own when the application in question is commercial and not open > > source -- you have effectively no control over how it was built and > > tuned for your architecture, for example, and may not even have > > meaningful version information. > > Let's be fair here. An ISV application is not the definition of evil. 
I did not mean to imply that they were wholly evil or even evil in intent. > Clearly, "you have effectively no control over how an application was > built and tuned for your architecture" has no direct correspondence to > performance. I would have to respectfully and vehemently disagree. It has all sorts of direct correspondances. Let us make a short tally of ways that a closed source, binary only application used as a benchmark can mislead me with regard to the performance of a system. * I don't control the compiler choice. Your compiler and mine might result in me getting a very different performance even if your application "resembles" mine (AFAICT given that I cannot read the source). * I don't control the libraries. Your application is (probably) static linked in various places and might even use private libraries that are hand-optimized. My application would likely be linked dynamically with completely different libraries. Your libraries might be out of date. My libraries might be out of date. * I don't have any way of knowing whether your "canned" (say) Monte Carlo benchmark is relevant to my Monte Carlo application. Maybe your code is structured to be strictly vectorized and local, but mine requires random site access. Yours might be CPU bound. Mine might be memory bound. Since I can't see the source, I'll never know. * I have to pay money for the application to use as a benchmark before I even look at hardware. If I'm an honest soul, I probably have to buy a separate license for every platform I plan to test even before I buy the test platform OR run afoul of the Dumb Mutha Copyright Act (aka known as the "Intellectual Straightjacket Act"). Or maybe I can rely on vendor reports of the results. This adds costs to the engineering process. * Even leaving side the additional costs, there is the issue of whether the application I'm using is tuned for the hardware I'm running on. strict i386 code will not run as fast as strict i586 code will not run as fast as i686 code will not run optimally on an Athlon will not run optimally on an Opteron. Yet the Opteron will likely RUN i386 code. I just won't know whether the result is at all relevant to how the Opteron runs Opteron code. (These effects are not necessarily small.) * And if I thought about it hard, I could likely come up with a few more negatives...such as the entire raft of reasons that closed source software is a Bad Thing to encourage on general principles. The principles built right into the original beowulf mission statement (which IIRC has a very clear open source requirement for engineering reasons). The point being that while closed source commercial applications don't necessarily make "evil" benchmarks in the sense that there is any intent to hide or alter performance characteristics of a given architecture, they add a number of sources of noise to an already arcane and uncertain process. They are less reliable, more likely to mislead you (quite possibly through nobody's fault or intention), less likely to accurately predict the performance of the architecture on your application suite. And they are ultimately black boxes that you have to pay people to use. I personally am a strong proponent (in case you can't tell:-) of open source (ideally GPL) software and tools, ESPECIALLY for benchmarking. I even tried to talk Larry McVoy into GPL-ing lmbench back when it had a fairly curmudgeonly license, even though it the source itself was open enough. 
Note, BTW, that all of the observations above are irrelevant if the application being used as a benchmark is the application you intend to use in the form you intend to use it, purchased or not. As in: > > However, they are also undeniably useful. Especially when the > > application being benchmarked is YOUR application and under your > > complete control. > > Regardless of ownership or control, they're especially useful when > you're looking at an application being used in the way you intend on > using it. Many industrial users buy systems to run a specific list of > ISV applications. In this instance, the application benchmark can be > the most valid benchmark, as it can model the system in the way it will > be used -- and that's the most important issue. Sure. Absolutely. I'd even say that your application(s) is(are) ALWAYS the best benchmark for many or even most purposes, with the minor caveat that the microbenchmarks have a slightly different purpose and are best for the purpose for which they are intended. I doubt that Linus runs a scripted set of userspace Gnome applications to test the performance of kernel subsystems... > I'm not disagreeing with your message. I too try to make sure that > people use the right benchmarks for the right purpose; I've seen way too > many people jump to absurd conclusions based on a single data point or > completely unrelated information. I'm just trying to sharpen your > message by pointing out some too broad brush strokes... > > Well, maybe I don't put as much faith in micro benchmarks unless in the > hands of a skilled interpreter, such as yourself. My preference is for > whatever benchmarks most closely describe your use of the system. Microbenchmarks are not intended to be predictors of performance in macro-applications, although a suite of results such as lmbench can give an expert a surprisingly accurate idea of what to expect there. They are more to help you understand systems performance in certain atomic operations that are important components of many applications. A networking benchmark can easily reveal problems with your network, for example, that help you understand why this application which ran just peachy keen at one scale as a "benchmark" suddenly turns into a pig at another scale. A good CPU/memory benchmark can do the same thing wrt the memory subsystem. This is yet another major problem with an naive application benchmark or comparative benchmark (and even with microbenchmarks) -- they are OFTEN run at a single scale or with a single set of parameters. On system A, that scale might be one that lets the application remain L2-local. On system B it might not be. You might then conclude that B is much slower. On the scale that you intend to run it, both might be L2-local or both might be running out of memory. B might have a faster processor, or a better overall balance of performance and might actually be faster at that scale. I don't put much faith in benchmarks, period. With the exception of your application(s), of course. Faith isn't the point -- they are just rulers, stopwatches, measuring tools. Some of them measure "leagues per candle", or "furlongs per semester" and aren't terribly useful. Others are just what you need to make sense of a system. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 11 16:26:44 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 11 Feb 2004 22:26:44 +0100 Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> (Mike Eggleston's message of "Wed, 11 Feb 2004 14:00:15 -0600") References: <20040211185501.GA31590@mikee.ath.cx> <20040211200015.GE31590@mikee.ath.cx> Message-ID: Mike Eggleston writes: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. Well, if it comes to that, surely you can place cfengine's configuration files in CVS and let cron run a script that updates the config files from CVS and then launches cfengine? You don't have to run cfd, you know; you can start cfengine any way you want. I'd really think twice before starting to re-implement cfengine's existing functionality. cfengine helped me keep my sanity in an earlier life while single-handedly adminning a heterogenous Unix environment ranging from SunOS 4.1.3_u1 through Solaris 7, diverse Tru64:s and a hodge-podge of Linuxen. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 16:58:48 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 13:58:48 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 8:44 AM [deletia] > > Finally, there are application benchmarks. These tend to be "atomic" > but at a very high level (an application is generally very complex). > These are also subject to the Evil of comparative benchmarks (in fact > some of comparative benchmark suites, especially in the WinX world, are > a collection of application benchmarks). True. I cringe to think how many systems were bought for scientific and technical computations based on UT2003 "benchmarks". > They also have some evil of > their own when the application in question is commercial and not open > source -- you have effectively no control over how it was built and > tuned for your architecture, for example, and may not even have > meaningful version information. Let's be fair here. An ISV application is not the definition of evil. 
Clearly, "you have effectively no control over how an application was built and tuned for your architecture" has no direct correspondence to performance. Having been on the ISV side of the fence, and spent a tremendous amount of energy making sure that each port of the application performed as well as it could, I'm quite confident in saying we generally succeeded in maximizing performance. Realize that we had day after day to spend on performance, usually with the attention of one or more experts from the platform vendor at our beck and call -- and those experts would spend even more time on even more narrow aspects of performance. Having said that, there are some notable ISV applications that simply do not perform as well as they should. This can occur for a host of reasons, such as they, did not care, didn't know how, could/would not to make the effort, didn't have the time, were ignored by the vendor, &etc -- basically the very same reasons that some people who don't work for ISVs fail to make their own applications perform as well as they could. > However, they are also undeniably useful. Especially when the > application being benchmarked is YOUR application and under your > complete control. Regardless of ownership or control, they're especially useful when you're looking at an application being used in the way you intend on using it. Many industrial users buy systems to run a specific list of ISV applications. In this instance, the application benchmark can be the most valid benchmark, as it can model the system in the way it will be used -- and that's the most important issue. I'm not disagreeing with your message. I too try to make sure that people use the right benchmarks for the right purpose; I've seen way too many people jump to absurd conclusions based on a single data point or completely unrelated information. I'm just trying to sharpen your message by pointing out some too broad brush strokes... Well, maybe I don't put as much faith in micro benchmarks unless in the hands of a skilled interpreter, such as yourself. My preference is for whatever benchmarks most closely describe your use of the system. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 18:16:22 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 15:16:22 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 2:14 PM > On Wed, 11 Feb 2004, Lombard, David N wrote: > > > > They also have some evil of > > > their own when the application in question is commercial and not open > > > source -- you have effectively no control over how it was built and > > > tuned for your architecture, for example, and may not even have > > > meaningful version information. > > > > Let's be fair here. An ISV application is not the definition of evil. > > I did not mean to imply that they were wholly evil or even evil in > intent. > > > Clearly, "you have effectively no control over how an application was > > built and tuned for your architecture" has no direct correspondence to > > performance. 
> > I would have to respectfully and vehemently disagree. It has all sorts > of direct correspondances. Let us make a short tally of ways that a > closed source, binary only application used as a benchmark can mislead > me with regard to the performance of a system. > [deletia] > > Note, BTW, that all of the observations above are irrelevant if the > application being used as a benchmark is the application you intend to > use in the form you intend to use it, purchased or not. OK. So there's our difference. I only consider an application benchmark useful in this scenario. I can't imagine using an application benchmark of any sort if it isn't; you enumerated all the reasons for this in the bits I just snipped. We agree completely on this. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 18:07:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 10:07:18 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402121007.30002.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 09:13 am, Robert G. Brown wrote: > * Even leaving side the additional costs, there is the issue of > whether the application I'm using is tuned for the hardware I'm running > on. Such as ISV's including IA32 executables as part of their IA64 version. It wasn't all IA32, just bits. Very odd. We only spotted it when it failed to work on Rocks 3.1.0, which doesn't supply the IA32 compatability libraries (which Rocks 3.0.0 did). No, I'm not going to name names, but the "file" and "ldd" are your friends. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKrWmO2KABBYQAh8RAu9JAJ41djUEj+6zEZYrY9IuPG4E9s9qugCeKhJd 2pf/pnDftPMs0zCLYb7IaRM= =t/c6 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:34:06 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:34:06 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> Message-ID: On Thu, 12 Feb 2004, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ...and with that, I'm going to quit for the day and take my nameless friends out for a beer somewhere... (Sorry, revenge for the lies, damned lies...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 17:23:12 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 09:23:12 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402120923.19328.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 03:44 am, Robert G. Brown wrote: > There are really two kinds of benchmarks. Maybe even > three. Lies, damn lies and statistics ? - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKqtTO2KABBYQAh8RAg+TAJ4uLkrC7zOUDlK8OYVxBuwKY/GXuQCeJFvj vd9nT5nkEuUY/3Myv0IROaU= =8pIh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:32:28 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:32:28 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > OK. So there's our difference. I only consider an application benchmark > useful in this scenario. I can't imagine using an application benchmark > of any sort if it isn't; you enumerated all the reasons for this in the > bits I just snipped. > > We agree completely on this. I figured that we did -- I'm getting verbose on it because I think it is an important issue to be precise on. "What's a FLOP?" is a perfectly reasonable question with a perfectly unintelligible and meaningless answer, in spite of it being cited again and again over decades to sell systems. At the same time, benchmarks are certainly useful. I think the confusion is probably my fault -- my age/history showing again. I can remember fairly clearly when awk was cited as a benchmark. Quake too, and not for people who were USING awk or necessarily going to play quake. This is what I meant by an "application benchmark" -- some sort of application that somebody thinks is a good measure of general systems performance and manage to get people to take seriously. Stuff like this is still fairly commonly used in many WinXX "benchmarks" that you'll see "published" both on the web and in real paper magazine articles. How fast can Excel update a spreadsheet that computes lunar orbital trajectories, that sort of thing. Sometimes they are almost a joke -- applications that do a lot of disk I/O (apparently, who knows) are used as a "disk performance benchmark". I won't even get started on this sort of thing and the number of variables left completely uncontrolled (for example, the disk caching subsystems both hardware and software) compared to, say, bonnie or lmbench. 
I also won't comment on just how much crap there is out there with stuff like this in it, sometimes from supposedly "reputable" testing companies that ought to know better or be more honest. That's why I "trust" GPL/Open microbenchmarks the most, because I can look at their sources, understand just what they are doing and how it compares to what I want to do, maybe even hack them if I need to because it isn't QUITE right, and get numbers with some meaning. Stuff like SPEC and linpack (where linpack should probably be considered micro) isn't horrible but (in the case of SPEC) isn't GPL or terribly straightforward to understand microscopically or macroscopically -- it takes experience to know how the profile it generates compares to features in your own code. Great for sales-speak, though -- "Our system gets 2301.124 specoloids/second, while THEIR system is a laughable 1721.564." Quake isn't a useful benchmark -- it is a game, and one that generally runs as fast as it needs to whereever it runs...but it is a GREAT benchmark for how a system plays quake:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 11 19:31:43 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 11 Feb 2004 19:31:43 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. > > I wasn't able to actually track down the problem, but it seems that there is > a problem with small messages. When I run the performance test provided with > mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > byte message length, while for larger messages everything looks fine (linear > dependancy of transfer time on message length, everything below 300 us). I > have also tried mpich2 which shows exactly the same behavior. > > Does anyone have any idea? First, I assume you were running the 100BT through the same onboard NICs and got reasonable performance. So some possible things: - the switch is a dog or it is broken - your cables may be old or bad (but worked fine for 100BT) - negotiation problem Some things to try: Use a cross over cable (cat5e) and see if you get the same problem. You might try using a lower level benchmark (of the micro variety) like netperf and netpipe. The Beowulf Performance Suite: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 has these tests. Also, the December and January issues of ClusterWorld show how to test a network connection using netpipe. 
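In the same lower-level spirit, a raw TCP ping-pong (one level below MPI) can help show whether the small-message problem lives in the network stack and NICs or in the MPI layer above them. The sketch below is illustrative only: the port number, payload size and repetition count are arbitrary, and error handling is minimal.

/* tcp_pingpong.c - raw TCP round-trip latency sketch, one level below MPI.
 * Server:  ./tcp_pingpong server
 * Client:  ./tcp_pingpong client <server-ip>
 * Illustrative only; port, payload and repetition count are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#define PORT 5001
#define REPS 1000

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(int argc, char **argv)
{
    int s, c, one = 1, i;
    char byte = 'x';
    struct sockaddr_in addr;
    double t0, t1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);

    if (argc >= 2 && strcmp(argv[1], "server") == 0) {
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind"); return 1;
        }
        listen(s, 1);
        c = accept(s, NULL, NULL);
        setsockopt(c, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        for (i = 0; i < REPS; i++) {          /* echo each byte straight back */
            if (recv(c, &byte, 1, 0) != 1) break;
            send(c, &byte, 1, 0);
        }
        close(c); close(s);
    } else if (argc >= 3 && strcmp(argv[1], "client") == 0) {
        addr.sin_addr.s_addr = inet_addr(argv[2]);
        c = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect"); return 1;
        }
        setsockopt(c, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        t0 = now();
        for (i = 0; i < REPS; i++) {          /* one byte out, wait for the echo */
            send(c, &byte, 1, 0);
            if (recv(c, &byte, 1, 0) != 1) break;
        }
        t1 = now();
        printf("%d round trips: %.1f us each, ~%.1f us one-way\n",
               REPS, (t1 - t0) / REPS * 1.0e6, (t1 - t0) / (2.0 * REPS) * 1.0e6);
        close(c);
    } else {
        fprintf(stderr, "usage: %s server | client <server-ip>\n", argv[0]);
        return 1;
    }
    return 0;
}

TCP_NODELAY matters for a test like this: with Nagle's algorithm left on, tiny back-to-back messages can be delayed by the stack itself, which is one classic way for small-message latency to look far worse than the wire.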
At some point this content will be showing up on the web-page. Also, the MPI Link-checker from Microway (www.microway.com) http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 May help. Doug > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Feb 11 20:19:26 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 11 Feb 2004 20:19:26 -0500 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> References: <200402121007.30002.csamuel@vpac.org> Message-ID: <1076548766.3950.91.camel@protein.scalableinformatics.com> On Wed, 2004-02-11 at 18:07, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ... and strace. Amazing how useful that one is. -- Joe Landman Scalable Informatics LLC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Thu Feb 12 03:44:12 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Thu, 12 Feb 2004 09:44:12 +0100 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> References: <402A6A1B.2070805@tcd.ie> Message-ID: <200402120944.12719.joachim@ccrl-nece.de> david moloney: > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. Try PAPI: http://icl.cs.utk.edu/papi/ It offers you all information that the CPU has to offer for this. It depends on you how to gather them. However, for an instruction-level histogramm, a simulator will probaby be more useful. And you should think about if you *really* need this - if the information you get is worth the effort. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:31:21 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:31:21 +0100 (CET) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. 
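PAPI, suggested a little earlier in the thread, will at least provide the aggregate counts, even though it cannot produce a per-instruction histogram. The following is a rough sketch using the high-level counter calls; it assumes a PAPI 3 style interface and that the PAPI_FP_INS and PAPI_TOT_CYC presets are actually implemented on the CPU in question:

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int events[2] = { PAPI_FP_INS, PAPI_TOT_CYC };
        long long counts[2];
        double a = 1.0, b = 1.000001;
        int i;

        if (PAPI_start_counters(events, 2) != PAPI_OK)
            return 1;

        for (i = 0; i < 1000000; i++)      /* stand-in for the real kernel */
            a *= b;

        if (PAPI_stop_counters(counts, 2) != PAPI_OK)
            return 1;

        printf("FP instructions: %lld  cycles: %lld  (a=%g)\n",
               counts[0], counts[1], a);
        return 0;
    }

Link with -lpapi; which presets exist, and how exactly a "floating point instruction" is counted, depends on the processor's counter hardware.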
> Can't help directly, but you could look at Oprofile http://oprofile.sourceforge.net/about/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:17:29 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:17:29 +0100 (CET) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > Does anyone have any idea? > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) You might look at the P4_GLOBMEMSIZE parameter in the MPI job. export P4_GLOBMEMSIZE=20194344 (say) Try stepping through various values for this parameter, and run the Pallas benchmark. Let us know what the results are! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:24:10 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:24:10 +0100 (CET) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). Alternatives you might look at are: LCFG http://www.lcfg.org/ The European Datagrid people have the Quattor project http://quattor.org/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Thu Feb 12 05:37:54 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 12 Feb 2004 18:37:54 +0800 Subject: [Beowulf] Master-Slave Problems Message-ID: <1076582124.5002.20.camel@mikhail> Good day everyone, I've just finished implementing and testing a master-slave prime number finder as a test problem for my thesis on heterogeneous cluster load balancing for parallel applications. Test results show anomalies which may be tied to work chunk size allocations to the slaves, but to test whether it will hold true for other applications and is not directly tied to the parallel prime number finder, I am in need of other problems that may be solved using a master-slave architecture. 
Sure it is easy to come up with just any problem and implement a solution in a master-slave model, but I'm looking for computationally intensive problems wherein the computation necessary for parts of the problem are not equal. What I mean by this is similar to the case of the parallel number finder, seeing whether 11 is prime requires less computation compared to seeing whether 9999991 is prime. Any insights or pointers to documentations or papers that have had similar problems are most welcome. TIA PS: Are there any cluster admins there willing to spare some cycles and cluster time for a cluster needy BS Undergraduate student in the Philippines? :D -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From meetsunil80x86 at yahoo.co.in Thu Feb 12 06:58:41 2004 From: meetsunil80x86 at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=) Date: Thu, 12 Feb 2004 11:58:41 +0000 (GMT) Subject: [Beowulf] Math Coprocessor Message-ID: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Hello everybody, I am a newbie in the Linux world. I would like to know how to... 1) program the 80x87 using C/C++/Fortran95 in linux platform. 2) program the 80x86 using C/C++/Fortran95 in linux platform. 3) link a C function into a fortran95 program or vice versa. Thanks in advance, sunil ________________________________________________________________________ Yahoo! India Education Special: Study in the UK now. Go to http://in.specials.yahoo.com/index1.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From andrewxwang at yahoo.com.tw Thu Feb 12 09:25:30 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 12 Feb 2004 22:25:30 +0800 (CST) Subject: [Beowulf] IA64 & AMD64 binary SPBS and SGE download Message-ID: <20040212142530.32328.qmail@web16807.mail.tpe.yahoo.com> Just FYI only. AMD64 binary from official GridEngine homepage: http://gridengine.sunsource.net/project/gridengine/download.html (IA64 is supported but you need to build from source) IA64 and AMD64 binary rpm for Torque: http://www-user.tu-chemnitz.de/~kapet/torque/ Andrew. ----------------------------------------------------------------- http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Thu Feb 12 09:18:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:18:31 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime.
an easy if hackneyed one is a mandelbrot-family fractal zoomer. depending on what chunk of the space you look at, I'd guess you could find pretty much any distribution of work-per-point. if your master-slave model does smart domain decomp, this might be just the thing. true, some people will roll their eyes when they find out you're doing fractals. I certainly did, when someone here used them. but they do have nice properties, and nice pictures always help ;) > PS: Are ther any cluster admins there willing to spare some cycles and > cluster time for a cluster needy BS Undergraduate student in the > Philippines? :D send me some email. regards, mark hahn. hahn at sharcnet.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 09:35:33 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 09:35:33 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: On 12 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > I've just finished implementing and testing a master-slave prime number > finder as a test problem for my thesis on heterogeneous cluster load > balancing for parallel applications. Test results show anomalies which > may be tied to work chunk size allocations to the slaves, but to test > whether it will hold true for other applications and is not directly > tied to the parallel prime number finder, I am in need of other problems > that may be solved using a master-slave architecture. > > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. > > Any insights or pointers to documentations or papers that have had > similar problems are most welcome. Two remarks. One, lots of problems (e.g. descent into a Mandelbrot set) have widely variable compute times for chunks of work divvied out in a master/slave model with very short and uniform messages distributing the work. Two, why not just simulate work? You're studying something in computer science, not trying to compute prime numbers or random numbers or mandelbrot sets or julia sets. Set up your master to distribute times for slaves to sleep and then reply. Select the times to distribute from the distribution (random or otherwise) of your choice, and scale a return "results" packet accordingly. This yields you complete control over the statistics of the "work" distribution and network load and lets you explore distributions that you might not easily find in the real world. It also lets you CONNECT the results of your simulations with "known" distributions to the results you obtain with real problems, which may help you identify or even categorically classify problems in terms of work-load complexity. This would doubtless make your thesis still more powerful. 
This is what I've been doing in my Cluster World column -- simulating work (or nearly so) with a trivial master-slave computation of random numbers (the return) accompanied by an adjustable "sleep time" that permits me to effectively sweep the granularity of the computation to demonstrate at least simple Amdahlian scaling properties of this sort of computation. In fact, I can likely give you a PVM program to do this that could easily be hacked into precisely what you'd need to implement this with little effort (a few days, INCLUDING learning how to generate distributions with e.g. the GSL). Let me know. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Feb 12 10:25:03 2004 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 12 Feb 2004 10:25:03 -0500 Subject: [Beowulf] Virginia Tech upgrade In-Reply-To: References: Message-ID: <402B9ACF.2040502@lmco.com> In case anyone hasn't read slashdot in the last few hours, http://apple.slashdot.org/apple/04/02/12/0613255.shtml?tid=107&tid=126&tid=181&tid=187 Now, everyone face Doug's house and say, "Doug is always right. Doug is always right" :) Jeff > > The first thought I had was "what will they do with all the old systems?" > > Then it hit me. They put a fancy sticker on each box that says > "This machine was part of the third fastest supercomputer on the planet > Nov. 2003" or something similar. Also put a serial number on the tag and > provide a "certificate of authenticity" from VT. My guess is they can > make > a little bit on the whole deal. I wager they would sell rather quickly. > Alumni eat this kind of thing up. > > For those interested, my old www.cluster-rant.com site has morphed into > the new www.clusterworld.com site. You can check out issue contents, > submit stories, check out the polls, and rant about clusters. > > Doug > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:04:29 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:04:29 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux Why? 
As one of the relatively few humans on the planet to ever actually write 8087 code (back when it was the ONLY way to use the coprocessor with the various compilers available at the time) I can authoritatively say that it isn't horribly difficult -- the x87 is sort of a RPN HPC calculator for your PC with its own stack and internal floating point commands -- but all the compilers available already use it when they can and it is appropriate, and in MANY cases their code will be as or more efficient and robust than what you could hand code. There are doubtless exceptions, but are they worth the considerable amount of work required to realize them? Are you planning to join the GCC project or something? > platform. > 2) program the 80x86 using C/C++/Fortran95 in linux > platform. This is straightfoward. But I'm not going to explain inlining of assembler here (I can give you an example/code fragment of inlined code if you want it, though). Instead... ...Google is your friend. Try e.g. "86 assembler reference gnu" http://linux.maruhn.com/cat/Development/Languages.html http://www.redhat.com/docs/manuals/enterprise/ RHEL-3-Manual/pdf/rhel-as-en.pdf http://www.linuxgazette.com/issue94/ramankutty.html or "gnu assembler manual" http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_toc.html ... > 3) link a C function into a fortran95 program or > vice or "gnu fortran manual" http://gcc.gnu.org/onlinedocs/g77/ (and that's just the beginning!) Try other search strings. Consider buying a book or two if you're unfamiliar with assembler altogether -- I don't think it is taught much anymore in CPS departments unless you are a really serious major and select the right courses. And they still have somebody who can teach them -- one thing about upper level languages is that they make assembler level programming so difficult by comparison that it has become a vanishing and highly arcane art. Well, not really vanishing, but I'll bet that no more than 10% of all programmers have a clue about what registers are and how to manipulate them with assembler commands...maybe more like 1-2%. And mostly Old Guys at that. And the serious, I mean really serious, programmers and hackers. Basically, all of this is throroughly documented ag gnu.org, and much of it is REdocumented, explained, tutorialized, and hashed over many times many other places, all on the web. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:28:37 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:28:37 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux > > platform. > 2) program the 80x86 using C/C++/Fortran95 in linux > platform. > 3) link a C function into a fortran95 program or > vice One last reference: man as86 (it even has a list of the supported x86 and x87 instructions at the bottom, although it does NOT teach you to program in assembler in the first place). rgb > versa. 
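The kind of inlined fragment rgb mentions a little way back usually amounts to the standard gcc idiom for reading the Pentium time stamp counter. The sketch below is a generic illustration (gcc on x86 assumed), not the code he was offering; the x87 loop is only there to give the counter something to measure:

    #include <stdio.h>

    /* gcc inline assembler: read the time stamp counter (x86, Pentium or later). */
    static inline unsigned long long rdtsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long) hi << 32) | lo;
    }

    int main(void)
    {
        unsigned long long t0, t1;
        double x = 0.0;
        int i;

        t0 = rdtsc();
        for (i = 0; i < 1000000; i++)
            x += 1.0 / (i + 1);
        t1 = rdtsc();

        printf("sum = %g, cycles = %llu\n", x, t1 - t0);
        return 0;
    }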
> > Thanks in advance, > sunil > > ________________________________________________________________________ > Yahoo! India Education Special: Study in the UK now. > Go to http://in.specials.yahoo.com/index1.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:12:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:12:31 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <1076548766.3950.91.camel@protein.scalableinformatics.com> Message-ID: > ... and strace. Amazing how useful that one is. true, but I've also fallen in love with ltrace, which does both syscalls and lib calls. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ao8215 at wayne.edu Thu Feb 12 08:13:48 2004 From: ao8215 at wayne.edu (Robson Pablo Sobradiel Peguin) Date: Thu, 12 Feb 2004 08:13:48 -0500 Subject: [Beowulf] Message Error Message-ID: <813143f0.10fc5818.81a9100@mirapointms3.wayne.edu> Hi I would like to know the meanings of these errors during the compilation with MPICH in the cluster: [root at master source]# make beowulf-WSU-INTEL cp /usr/local/mpich/mpich-1.2.5_intel/include/mpif.h mpif.h make FC=/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 FFLAGS="-O3 -tpp7 -xW -axW -c"\ CPFLAGS="-DSTRESS -D'POINTER=integer'" \ make LD="/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 -tpp7 -xW -axW -o" \ FFLAGS="-O3 -tpp7 -xW -axW -c" \ CPFLAGS="-DSTRESS -DMPI -D'POINTER=integer'" \ EX=DLPOLY.MBE BINROOT=../execute 3pt make[1]: Entering directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make[1]: *** No rule to make target `make'. Stop. make[1]: Leaving directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make: *** [beowulf-WSU-INTEL] Error 2 Thank you very much ________________________________________________________ Robson P. S. Peguin, Graduate Student Wayne State University Department of Chemical Engineering and Materials Science 4815 Fourth Street, 2015 MBE,Detroit - MI 48201 phone: (313)577-1416 fax: (313)577-3810 e-mail: robson_peguin at wayne.edu http://chem1.eng.wayne.edu/~sdr/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Feb 11 23:13:12 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 11 Feb 2004 22:13:12 -0600 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: Message-ID: <402AFD58.9060402@tamu.edu> Realize that not all switches are created equal when working with small (and, overall, 0-byte == small) packets. A number of otherwise decent network switches are less than stellar performers with small packets. 
We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test system running under the RFC-2544 testing suite... There are switches that perform well with small packets, but it's been our experience that most switches, especially your lower cost switches (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some others I can't recall right now) didn't perform well with smaller packets but did fine when the packet size was about 1500 bytes. Going with cheap switches is usually not a good way to improve performance. gerry Douglas Eadline, Cluster World Magazine wrote: > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > >>Hello, >> >>I have a really small "cluster" of 4 PC's which are connected by a normal >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board >>I thought I might be able to improve performance by connecting the machines >>via a Gigabit switch (which are really cheap nowadays). >> >>Everything seemed to work fine. The switch indicates 1000Mbit connections to >>the PC's and transfer rate for scp-ing large files is significantly higher >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than >>with the 100 Mbit switch. >> >>I wasn't able to actually track down the problem, but it seems that there is >>a problem with small messages. When I run the performance test provided with >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 >>byte message length, while for larger messages everything looks fine (linear >>dependancy of transfer time on message length, everything below 300 us). I >>have also tried mpich2 which shows exactly the same behavior. >> >>Does anyone have any idea? > > > First, I assume you were running the 100BT through the same > onboard NICs and got reasonable performance. So some possible > things: > > - the switch is a dog or it is broken > - your cables may be old or bad (but worked fine for 100BT) > - negotiation problem > > Some things to try: > > Use a cross over cable (cat5e) and see if you get the same problem. > You might try using a lower level benchmark (of the micro variety) > like netperf and netpipe. > > The Beowulf Performance Suite: > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > has these tests. Also, the December and January issues of ClusterWorld > show how to test a network connection using netpipe. At some point this > content will be showing up on the web-page. > > Also, the MPI Link-checker from Microway (www.microway.com) > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > May help. > > > Doug > > >>Here are the details of my system: >> - Suse Linux 9.0 (kernel 2.4.21) >> - mpich-1.2.5.2 >> - motherboard ASUS P4P800 >> - LAN (10/100/1000) on board (3COM 3C940 chipset) >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > + > >> 8x88E1111-BAB, AT89C2051-24PI) >> >> > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Thu Feb 12 22:22:02 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Thu, 12 Feb 2004 21:22:02 -0600 (CST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> References: <402AFD58.9060402@tamu.edu> Message-ID: The best switch that we have found both in price and speed are the GigE Switches from Dell. We use them in a few of our test clusters and smaller clusters. They are actually pretty good performers and top even some of the cisco switches. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. > >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. 
> > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 12 22:35:51 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 13 Feb 2004 14:35:51 +1100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: <402AFD58.9060402@tamu.edu> Message-ID: <200402131435.54453.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged Cisco switches. Even had to do the usual "PortFast" routine in IOS to get PXE booting to work. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFALEYXO2KABBYQAh8RAm81AJoDHOfMZ+hrIyLVoBIr1lsESi70KACfcnYu C1JcJ3iYX22Tm99gTvKlfOs= =XWYZ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:17:26 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:17:26 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: on varius revs of their code I've had regular (once a week) managment stack crash on our dell switches which doesn't make it easy to collect statistics, but they continue to forward packets just fine... the switches are actually made by accton and they are also sold by smc... depending one who has better deals the dell 5212/5224 or smc 8612t/8624t may be cheaper at any given time... the cisco cat-ios style cli and ssh support are a plus. On Thu, 12 Feb 2004, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. 
> > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > Realize that not all switches are created equal when working with small > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > network switches are less than stellar performers with small packets. > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > system running under the RFC-2544 testing suite... > > > > There are switches that perform well with small packets, but it's been > > our experience that most switches, especially your lower cost switches > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > others I can't recall right now) didn't perform well with smaller > > packets but did fine when the packet size was about 1500 bytes. > > > > Going with cheap switches is usually not a good way to improve performance. > > > > gerry > > > > Douglas Eadline, Cluster World Magazine wrote: > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > >>Hello, > > >> > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > >>I thought I might be able to improve performance by connecting the machines > > >>via a Gigabit switch (which are really cheap nowadays). > > >> > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > >>with the 100 Mbit switch. > > >> > > >>I wasn't able to actually track down the problem, but it seems that there is > > >>a problem with small messages. When I run the performance test provided with > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > >>byte message length, while for larger messages everything looks fine (linear > > >>dependancy of transfer time on message length, everything below 300 us). I > > >>have also tried mpich2 which shows exactly the same behavior. > > >> > > >>Does anyone have any idea? > > > > > > > > > First, I assume you were running the 100BT through the same > > > onboard NICs and got reasonable performance. So some possible > > > things: > > > > > > - the switch is a dog or it is broken > > > - your cables may be old or bad (but worked fine for 100BT) > > > - negotiation problem > > > > > > Some things to try: > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > You might try using a lower level benchmark (of the micro variety) > > > like netperf and netpipe. > > > > > > The Beowulf Performance Suite: > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > show how to test a network connection using netpipe. At some point this > > > content will be showing up on the web-page. > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > May help. 
> > > > > > > > > Doug > > > > > > > > >>Here are the details of my system: > > >> - Suse Linux 9.0 (kernel 2.4.21) > > >> - mpich-1.2.5.2 > > >> - motherboard ASUS P4P800 > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > + > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > >> > > >> > > > > > > > > > > -- > > Gerry Creager -- gerry.creager at tamu.edu > > Network Engineering -- AATLT, Texas A&M University > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > Page: 979.228.0173 > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:19:16 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:19:16 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: Also they support jumbo (9k) frames which is a plus for us since we do nfs over them. joelja On Thu, 12 Feb 2004, Joel Jaeggli wrote: > on varius revs of their code I've had regular (once a week) managment > stack crash on our dell switches which doesn't make it easy to collect > statistics, but they continue to forward packets just fine... the switches > are actually made by accton and they are also sold by smc... depending > one who has better deals the dell 5212/5224 or smc 8612t/8624t may be > cheaper at any given time... the cisco cat-ios style cli and ssh support > are a plus. > > On Thu, 12 Feb 2004, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Realize that not all switches are created equal when working with small > > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > > network switches are less than stellar performers with small packets. > > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > > system running under the RFC-2544 testing suite... 
> > > > > > There are switches that perform well with small packets, but it's been > > > our experience that most switches, especially your lower cost switches > > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > > others I can't recall right now) didn't perform well with smaller > > > packets but did fine when the packet size was about 1500 bytes. > > > > > > Going with cheap switches is usually not a good way to improve performance. > > > > > > gerry > > > > > > Douglas Eadline, Cluster World Magazine wrote: > > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > > > > >>Hello, > > > >> > > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > > >>I thought I might be able to improve performance by connecting the machines > > > >>via a Gigabit switch (which are really cheap nowadays). > > > >> > > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > > >>with the 100 Mbit switch. > > > >> > > > >>I wasn't able to actually track down the problem, but it seems that there is > > > >>a problem with small messages. When I run the performance test provided with > > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > > >>byte message length, while for larger messages everything looks fine (linear > > > >>dependancy of transfer time on message length, everything below 300 us). I > > > >>have also tried mpich2 which shows exactly the same behavior. > > > >> > > > >>Does anyone have any idea? > > > > > > > > > > > > First, I assume you were running the 100BT through the same > > > > onboard NICs and got reasonable performance. So some possible > > > > things: > > > > > > > > - the switch is a dog or it is broken > > > > - your cables may be old or bad (but worked fine for 100BT) > > > > - negotiation problem > > > > > > > > Some things to try: > > > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > > You might try using a lower level benchmark (of the micro variety) > > > > like netperf and netpipe. > > > > > > > > The Beowulf Performance Suite: > > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > > show how to test a network connection using netpipe. At some point this > > > > content will be showing up on the web-page. > > > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > > > May help. 
> > > > > > > > > > > > Doug > > > > > > > > > > > >>Here are the details of my system: > > > >> - Suse Linux 9.0 (kernel 2.4.21) > > > >> - mpich-1.2.5.2 > > > >> - motherboard ASUS P4P800 > > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > > > + > > > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > > >> > > > >> > > > > > > > > > > > > > > -- > > > Gerry Creager -- gerry.creager at tamu.edu > > > Network Engineering -- AATLT, Texas A&M University > > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > > Page: 979.228.0173 > > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 13 03:40:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 13 Feb 2004 09:40:09 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Thu, 12 Feb 2004, Robert G. Brown wrote: > not really vanishing, but I'll bet that no more than 10% of all > programmers have a clue about what registers are and how to manipulate > them with assembler commands...maybe more like 1-2%. And mostly Old > Guys at that. And the serious, I mean really serious, programmers and > hackers. > Sigh. I was first taught assembler in the physics department (being as you in the States would say a physics major). The lab had Motorola 68000 trainer boards. I still have a copy of "68000 Assembly Language" by Kane, Hawkins, Leventhal kicking around. Such a nice architecture. But then again I may be the only person to own "Fortran 77: A Structured Approach". Such perversity originating from being taught Pascal by computer scientists then learning Fortran. I also remember being taught about self-modifying code by the then professor of computing science. Do they still teach that? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Fri Feb 13 09:24:11 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Fri, 13 Feb 2004 09:24:11 -0500 Subject: [Beowulf] problmes with MPICH References: Message-ID: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Hi, i have problems with mpich, i have installed OSCAR with mpich 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, i want to use rsh and i dont want reinstall OSCAR. then i change the line in mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes but mpich not use rsh. 
Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help in this point. I have mpich-1.2.5.2 and fortran pgi and rsh. Thanks R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 09:53:38 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 14:53:38 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Fri, 13 Feb 2004, John Hearns wrote: > But then again I may be the only person to own "Fortran 77: > A Structured Approach". Wow! Bleeding edge stuff. On the subject of pure perversity, my Fortran notes stop with a roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? Punched card paradise? I believe some guy in France has got one back together and working, but I don't remember where.) [Weeps sadly into Wincarnis as the memories flood back.] -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joshh at cs.earlham.edu Fri Feb 13 10:25:31 2004 From: joshh at cs.earlham.edu (joshh at cs.earlham.edu) Date: Fri, 13 Feb 2004 10:25:31 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment Message-ID: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Here is an irregular question. I am profiling a software package that runs over LAM-MPI on 16 node clusters [Details Below]. I would like to measure the effect of increased latency on the run time of the program. It would be nice if I could quantify the added latency in the process to create some statistics. If possible, I do not want to alter the code line of the program, or buy new hardware. I am looking for a software solution/idea. Bazaar Cluster: 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM 1 100Mbps NIC card in each machine 2 100Mbps Full-Duplex switches Cairo Cluster: 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM 2 1Gbps NIC cards in each machine (only one in use) 2 1Gbps Full-Duplex switches For more details on these clusters follow the link below: http://cluster.earlham.edu/html/ Thank you, Josh Hursey Earlham College Cluster Computing Group _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:30:25 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:30:25 -0800 Subject: [Beowulf] problmes with MPICH Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5563@orsmsx402.jf.intel.com> I'm forwarding this to the OSCAR-users list, a more appropriate venue for this question. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
> -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Richard Miguel > Sent: Friday, February 13, 2004 6:24 AM > Cc: beowulf at beowulf.org > Subject: [Beowulf] problmes with MPICH > > Hi, i have problems with mpich, i have installed OSCAR with mpich > 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, > i > want to use rsh and i dont want reinstall OSCAR. then i change the line in > mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes > but > mpich not use rsh. > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need > help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > > Thanks > > R. Miguel > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:57:00 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:57:00 -0800 Subject: [Beowulf] Math Coprocessor Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Martin WHEELER > Sent: Friday, February 13, 2004 6:54 AM > To: John Hearns > Cc: Robert G. Brown; sunil kumar; beowulf at beowulf.org > Subject: Re: [Beowulf] Math Coprocessor > > On Fri, 13 Feb 2004, John Hearns wrote: > > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". > > Wow! Bleeding edge stuff. > On the subject of pure perversity, my Fortran notes stop with a > roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating > from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? > Punched card paradise? I believe some guy in France has got one back > together and working, but I don't remember where.) > > [Weeps sadly into Wincarnis as the memories flood back.] Ah, another 1130 veteran! Group hug! There's an active 1130 group, and you too can run R2V12 on your very own 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. IIRC, APL may even be available. http://ibm1130.org One of my hobby tasks is to port the simulator GUI to Tcl/Tk or Perl/Tk... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 13:15:28 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 12:15:28 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. 
> > It would be nice if I could quantify the added latency in the process to > create some statistics. If possible, I do not want to alter the code line > of the program, or buy new hardware. I am looking for a software > solution/idea. > > Bazaar Cluster: > 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM > 1 100Mbps NIC card in each machine > 2 100Mbps Full-Duplex switches > > Cairo Cluster: > 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM > 2 1Gbps NIC cards in each machine (only one in use) > 2 1Gbps Full-Duplex switches > > For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ > > Thank you, > > Josh Hursey > Earlham College Cluster Computing Group > Not an irregular question at all. I tried something like this a couple of years ago to investigate the bandwidth and latency sensitivity of an application which was using MPICH over Myrinet. One of D.K.Panda's students from Ohio State University had a modified version of the "mcp" for Myrinet which added quality of service features, tunable per connection. The "mcp" is the code which runs on the LANai microprocessor on the Myrinet interface card. The modifications on top of the OSU modifications to gm used a hardware timer on the interface card to add a fixed delay per packet for bandwidth tuning, and a fixed delay per message (i.e., a delay added to only the first packet of a new connection) for latency tuning. Via netpipe, I verified that I could independently tune the bandwidth and latency. Lots of fun to play with - for example, by plotting the difference in message times for two different latency setting, the eager-rendezvous threshold was easily identified. All in all a very useful experiment which told us a lot about our application. Clearly, you want to delay the sending of a message, or the processing of a received communication, without otherwise interfering with what the system is doing. Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send call is going to perturb your results because the processor won't be doing useful work during that time. That's obviously not the same as running on a network with a switch that adds the same 50 microseconds latency; in that case, the processor could be doing useful work during the delay, happily overlapping computations with communications. Nevertheless, adding busy loops might still give you useful results. You might want to look into using a LD_PRELOAD library to intercept MPI calls of interest, assuming you're using a shared library for MPI. In your version, do the busy loop, then fall into the normal call. A quick google search on "LD_PRELOAD" or "library interposers" will return a lot of examples, such as: http://uberhip.com/godber/interception/index.html http://developers.sun.com/solaris/articles/lib_interposers.html The advantage of this approach is that no modifications to your source code or compiled binaries are necessary. You'll have to think carefully about whether the added latency is slowing your application simply because the processor is not doing work during the busy loop. If I were you, I'd modify your source code and time your syncronizations (eg, MPI_Wait). If your code is cpu-bound, these will return right away, and adding latency via a busy loop is going to give you the wrong answer. If your code is communications bound, these will have a variable delay depending upon the latency and bandwidth of the network. You are likely interested in delays of 10's of microseconds. 
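A bare-bones version of the LD_PRELOAD idea might look like the sketch below. It assumes the MPI-1 era prototype for MPI_Send (non-const buffer, as in the MPICH and LAM of this period) and uses a crude gettimeofday spin; the fixed 50 microsecond figure is a placeholder, not a recommendation:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/time.h>
    #include <mpi.h>

    typedef int (*send_fn)(void *, int, MPI_Datatype, int, int, MPI_Comm);

    /* Busy-wait for roughly "us" microseconds (gettimeofday resolution). */
    static void spin_usec(long us)
    {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        do {
            gettimeofday(&t1, NULL);
        } while ((t1.tv_sec - t0.tv_sec) * 1000000L
                 + (t1.tv_usec - t0.tv_usec) < us);
    }

    /* Interposed MPI_Send: inject a delay, then call the real routine. */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        static send_fn real_send;
        if (!real_send)
            real_send = (send_fn) dlsym(RTLD_NEXT, "MPI_Send");
        spin_usec(50);               /* placeholder: 50 us of extra "latency" */
        return real_send(buf, count, datatype, dest, tag, comm);
    }

Built as a shared object (for example with mpicc -shared -fPIC -o libdelay.so delay.c -ldl, if your mpicc passes those flags through) and activated with LD_PRELOAD, it adds the delay without relinking the application, with the caveat already made above: the processor burns cycles in the spin, which a genuinely slower network would not cost you.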
The most accurate busy loops for this sort of thing use the processor
hardware timers, which tick every clock on x86.  On a G5 PPC running OS-X,
the hardware timer ticks every 60 cpu cycles.  I'm not sure what a PPC does
under Linux.

On x86, you can read the cycle timer via:

    #include <asm/msr.h>

    unsigned long long timerVal;
    rdtscll(timerVal);

A crude delay loop example:

    rdtscll(timeStart);
    do {
        rdtscll(timeEnd);
    } while ((timeEnd - timeStart) < latency * usecPerTick);

where latency is in microseconds, and usecPerTick is your calibration.
There have been other recent postings to this mailing list about using
inline assembler macros to read the time stamp counter.

Injecting small latencies w/out busy loops and without disturbing your
source code is going to be very difficult (though I'd love to be
contradicted on that statement!).  A couple of far-fetched ideas in kernel
land:

 - some ethernet interfaces have very sophisticated processors aboard.
   IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu.
   Perhaps the firmware can be modified similarly to the modified mcp for
   gm discussed above.  Obviously this has the huge disadvantage of being
   specific to particular network chips.

 - the local APIC on x86 processors has a programmable interval timer with
   better than microsecond granularity which can be used to generate an
   interrupt.  Perhaps in the communications stack, or in the network
   device driver, a wait_queue could be used to postpone processing until
   after an interrupt from this timer.  I would worry about considerable
   jitter, though.  For a sample driver using this feature, see
      http://www.oberle.org/apic_timer-timers.html
   The various realtime Linux folks talk about this as well:
      http://www.linuxdevices.com/articles/AT6105045931.html
   Unfortunately, IIRC this timer is now used (since 2.4 kernel) for
   interprocessor interrupts on SMP systems.  On uniprocessor systems it
   may still be available.

I hope there's something useful for you in this response.  I'm hoping even
more that there are other responses to your question - I would love a
facility which would allow me to "turn the dial" on latency and/or
bandwidth.  There's a substantial cost difference between a gigE cluster
and a Myrinet/Infiniband/Quadrics/SCI cluster, and it would be great to
simulate performance of different network architectures on specific
applications.

Don Holmgren
Fermilab

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lusk at mcs.anl.gov  Fri Feb 13 13:31:08 2004
From: lusk at mcs.anl.gov (Rusty Lusk)
Date: Fri, 13 Feb 2004 12:31:08 -0600 (CST)
Subject: [Beowulf] Adding Latency to a Cluster Environment
In-Reply-To: 
References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu>
Message-ID: <20040213.123108.12267444.lusk@localhost>

> Suggestions:
> - modify the routines that make MPI calls to call instead some wrapper
> routines that do some thumb twiddling before making the MPI call; this
> requires modification of the program source
> - modify the MPI routines (well, if you use an open-source MPI
> implementation) to insert some delay, then relink your binary if static

With any standard-conforming MPI implementation, open-source or not, you
can use the MPI "profiling" interface to provide any kind of wrapper at
all.  Basically, you write your own MPI_Send, etc., which does whatever
you want and also calls PMPI_Send (required to be there) to do the real
work.
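For instance, a sketch of what such a wrapper might look like (an
illustration, not from the original mail; the 50 microsecond delay and the
1500-ticks-per-microsecond calibration, i.e. an assumed 1.5 GHz x86 CPU,
must be adjusted for the machine at hand):

    #include <mpi.h>

    /* Read the x86 time stamp counter with an inline asm helper. */
    static inline unsigned long long rdtsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)hi << 32) | lo;
    }

    static const unsigned long long delay_us     = 50;    /* added latency */
    static const unsigned long long ticks_per_us = 1500;  /* calibration   */

    /* Profiling-interface wrapper: spin for the extra latency, then
     * let the implementation's PMPI_Send do the real work.          */
    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        unsigned long long t0 = rdtsc();
        while (rdtsc() - t0 < delay_us * ticks_per_us)
            ;                                   /* busy wait */
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }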
Then you link your routines in front of the MPI library, and voila! Cheers, Rusty Lusk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From modus at pr.es.to Thu Feb 12 23:53:04 2004 From: modus at pr.es.to (Patrick Michael Kane) Date: Thu, 12 Feb 2004 20:53:04 -0800 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <200402131435.54453.csamuel@vpac.org>; from csamuel@vpac.org on Fri, Feb 13, 2004 at 02:35:51PM +1100 References: <402AFD58.9060402@tamu.edu> <200402131435.54453.csamuel@vpac.org> Message-ID: <20040212205304.A16115@pr.es.to> * Chris Samuel (csamuel at vpac.org) [040212 20:42]: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged > Cisco switches. Even had to do the usual "PortFast" routine in IOS to get > PXE booting to work. They used to be, I believe. Now they appear to be something else (for their latest 24 port layer-2 model). I've had good luck with them with the latest firmware, before that they were fairly flakey. Check the dell forums for all the yammering and howling on the PowerEdge 5224. Best, -- Patrick Michael Kane _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 12:44:33 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 18:44:33 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. It appears that in your setup MPI uses TCP/IP as underlying protocol. Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a fuzzy result. So there :-) Suggestions: - modify the routines that make MPI calls to call instead some wrapper routines that do some thumb twiddling before making the MPI call; this requires modification of the program source - modify the MPI routines (well, if you use an open-source MPI implementation) to insert some delay, then relink your binary if static - modify the kernel source to insert some delays in the TCP path - pretty hard as TCP is very complex - modify the network driver to insert some delays in the Tx or Rx packet path; not very difficult, but might be leveled by the delays of TCP. The kernel modifications have the disadvantage that they also require some way to change the delay value, so adding a /proc entry, an ioctl, etc. unless you want to recompile the kernel and reboot after each delay change. 
> For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ Please tell to whoever coded that page that Opera doesn't display it properly. And I use Opera all the time ;-) The page also doesn't specify an important detail: the network cards/chips used in the clusters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Fri Feb 13 14:01:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Fri, 13 Feb 2004 13:01:25 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <6.0.0.22.2.20040213125745.0266bbc0@localhost> At 11:44 AM 2/13/2004, Bogdan Costescu wrote: >On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > > > Here is an irregular question. I am profiling a software package that runs > > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > > the effect of increased latency on the run time of the program. > >It appears that in your setup MPI uses TCP/IP as underlying protocol. >Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a >fuzzy result. So there :-) > >Suggestions: >- modify the routines that make MPI calls to call instead some wrapper >routines that do some thumb twiddling before making the MPI call; this >requires modification of the program source Actually, this is not necessary, as long as you have the object files, not just the executable. The MPI profiling interface could be used to add latency to every send and receive operation; adding latency to collectives will require some care, as the exact set of communication operations that an MPI implementation uses is up to the implementation. Simply write your own MPI routine and call the PMPI version (e.g., for MPI_Send, call PMPI_Send) after adding some latency. Note also that MPI may use any communication mechanism. Even on small clusters, it may use something besides TCP (e.g., when the network is Infiniband). MPI on SMPs often uses a collection of communication approaches. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 15:39:14 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 20:39:14 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > There's an active 1130 group, and you too can run R2V12 on your very own > 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. > IIRC, APL may even be available. http://ibm1130.org Thanks for the link -- didn't know about that. As arts faculty post-grads (applied linguistics) we were only allowed to play with Fortran (and even then were regarded with deep suspicion by the physics wallahs). Now -- where did I put that stack of cards...? Off to the attic to dig out more stuff. 
-- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 13 16:54:28 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 13 Feb 2004 16:54:28 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> Message-ID: I wondered about your low cost switch statement. I had done this test before, but I thought I would redo it anyway. I have an SMC 8 port GigE EasySwitch 8508T (PriceGrabber $140 to my door). I should say that the switch is not loaded, so it may fall down if the load were higher. This is just two nodes running netpipe through the switch. Latency: 0.000034 Now starting main loop 0: 1 bytes 7287 times --> 0.22 Mbps in 0.000034 sec 1: 2 bytes 7338 times --> 0.46 Mbps in 0.000033 sec 2: 3 bytes 7469 times --> 0.68 Mbps in 0.000034 sec 3: 4 bytes 4923 times --> 0.90 Mbps in 0.000034 sec 4: 6 bytes 5545 times --> 1.36 Mbps in 0.000034 sec 5: 8 bytes 3711 times --> 1.81 Mbps in 0.000034 sec 6: 12 bytes 4637 times --> 2.67 Mbps in 0.000034 sec My opinion: If you get a switch that can not "switch" then it is broken by design. The original poster noted that his results seem to go from OK to "really bad" for basic MPI tests. If a switch does this it is "really broken". Of course it may not be the switch. BTW, the results were for a $30 NIC (netgear GA302T) running in a 66MHz slot. Top throughput was 800 Mbits/sec. Doug On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. 
> >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. > > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 17:46:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 23:46:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Fri, 13 Feb 2004, Don Holmgren wrote: > I tried something like this a couple of years ago to investigate the > bandwidth and latency sensitivity of an application which was using > MPICH over Myrinet. ... which is pretty different from the setup of the original poster :-) But I'd like to see it discussed in general, so let's go on. > a modified version of the "mcp" for Myrinet which added ... Is this publicly available ? I'd like to give it a try. > The modifications on top of the OSU modifications to gm Well, that's a very important point: using GM, which doesn't try to make too many things like TCP does. I haven't used GM directly nor looked at its code, but I think that it doesn't introduce delays, like TCP does in some cases. Moreover, based on the description in the GM docs, GM is not needed to be optimized by the compiler as it's not in the fast path. Obviously, in such conditions, the results can be relied upon. 
> Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > call is going to perturb your results because the processor won't be > doing useful work during that time. In the case of TCP, the processor doesn't appear to be doing anything useful for "long" times, as it spends time in kernel space. So, a 50 microseconds busy loop might not make a difference. And given the somehow non-deterministic behaviour of TCP in this respect, it might be that adding the delay before the PMPI_* or after PMPI_* calls might make a difference. The delays don't have to be busy-loops. Busy-loops are probably precise, but might have some side-effects; for example, reading some hardware counter (even more as it is on a PCI device, which is "far" from the CPU and might be even "farther" if it has any PCI bridge(s) in between) repeatedly will generate lots of "in*" operations during which the CPU is stalled waiting for data. Especially with today's CPU speeds, I/O operations are expensive in terms of CPU cycles... > You are likely interested in delays of 10's of microseconds. Well, it depends :-) The latencies for today's HW+SW seem to be in a range of about 2 orders of magnitude, so giving absolute figures doesn't make much sense IMHO. Apart from this I would rather suggest an exponential increase in the delay value. > - some ethernet interfaces have very sophisticated processors aboard. > IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Well, if the company releases enough documentation about the chip, then yes ;-) 3Com has the 990 line which is still FastE but has a programmable processor, so it's not only GigE. > Obviously this has the huge disadvantage of being specific to > particular network chips. But there aren't so many programmable network chips these days. Those Ethernet chips might even be in wider use than Myrinet[1] and more people might benefit from such development. If I'd have to choose for the next cluster purchase the GigE network cards and I'd know that one offers such capabilities while not having significant flaws compared to the others, I'd certainly buy it. Another hardware approach: the modern 3Com cards driven by 3c59x, Cyclone and Tornado, have the means to delay a packet in their (hardware) Tx queue. There is however a catch: there is not guarantee that the packet will be sent at the exact time specified, it can be delayed; the only guarantee is that the packet is not sent before that time. However, I somehow think that this is true for most other approaches, so it's not so bad as it sounds :-) The operation is pretty simple, as the packet is "stamped" with the time when it should be transmitted, expressed as some internal clock ticks. Only one "in" operation to read the current clock is needed per packet, so this is certainly much less intrusive as the busy-loop. [ I'm too busy (but not busy-looping :-)) to try this at the moment. If somebody feels the urge, I can provide some guidance :-) ] However, anything that still uses TCP (as both your Broadcom approach and my 3Com one do) will likely generate unreliable results... > it would be great to simulate performance of different network > architectures on specific applications. Certainly ! 
Especially as this would provide means to justify spending money on fast interconnect ;-) [1] I don't want this to look like I'm saying "compared with Myrinet as it's the most widely used high-performance interconnect" and neglect Infiniband, SCI, etc; I have no idea about "market share" of the different interconnects. I compare with Myrinet because the original message talked about it and because I'm ignorant WRT programmable processors on other interconnect NICs. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 19:49:05 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 18:49:05 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: Message-ID: On Fri, 13 Feb 2004, Bogdan Costescu wrote: > On Fri, 13 Feb 2004, Don Holmgren wrote: > > > I tried something like this a couple of years ago to investigate the > > bandwidth and latency sensitivity of an application which was using > > MPICH over Myrinet. > > ... which is pretty different from the setup of the original poster :-) > But I'd like to see it discussed in general, so let's go on. > > > a modified version of the "mcp" for Myrinet which added ... > > Is this publicly available ? I'd like to give it a try. I'm afraid not, sorry, since the modified code base from OSU isn't publically available. IIRC it was part of a project for a masters degree; if it's OK with them, it's OK with me (we can take this offline). The modified MCP had a bug I never fixed which required me to reset the card and reload the driver when some counter overflowed, at something like a gigabyte of messages. Long enough to get very good statistics, though. > > > The modifications on top of the OSU modifications to gm > > Well, that's a very important point: using GM, which doesn't try to make > too many things like TCP does. I haven't used GM directly nor looked at > its code, but I think that it doesn't introduce delays, like TCP does in > some cases. Moreover, based on the description in the GM docs, GM is not > needed to be optimized by the compiler as it's not in the fast path. > Obviously, in such conditions, the results can be relied upon. I miswrote a bit; to be precise, this was a modification to the MCP, which is the NIC firmware, rather than to GM, which is the user space code that interacts with the NIC hardware. The modification caused the NIC itself to introduce interpacket delays of a configurable value. To the application (well, to MPICH and to GM) it simply looked like the external Myrinet network had a different bandwidth and/or latency. There were tiny code changes to MPICH and to GM to allow modification of the interpacket delay values in the MCP; otherwise I would have had to recompile or patch the firmware image and reload that image for each new value. You are absolutely correct that GM, like all good OS-bypass software, doesn't introduce the delays that you'd encounter with communications protocols like TCP that have to pass through the kernel/user space boundary. Much more deterministic. 
> > > Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > > call is going to perturb your results because the processor won't be > > doing useful work during that time. > > In the case of TCP, the processor doesn't appear to be doing anything > useful for "long" times, as it spends time in kernel space. So, a 50 > microseconds busy loop might not make a difference. And given the somehow > non-deterministic behaviour of TCP in this respect, it might be that > adding the delay before the PMPI_* or after PMPI_* calls might make a > difference. TCP processing is likely a significant component of the natural latency, and, as you point out, during that time the CPU is busy in kernel space and isn't doing useful work. But the goal here is to add additional artificial latency in a manner that mimics a slower physical network, i.e., so that during this artificial delay the application can still be crunching numbers. In user space I don't see how to accomplish this goal (adding latency, yes; adding latency during which the cpu can do calculations, no). If delay code is added correctly in kernel space, say in the TCP/IP stack (sounds like a nasty bit of careful work!), then during that 50 usec period the CPU could certainly be doing useful work in user space. Small delays, relative to the timer tick, are very difficult to do accurately in non-realtime kernels unless you have a handy source of interrupts, like the local APIC. Assuming that LAM MPI isn't multithreaded (I have no idea), then adding a delay in the user space code in the MPI call, whether it's a sleep or a busy loop, guarantees that no useful application work can done during the delay. I'm confess to be totally ignorant of the PMPI_* calls (time for homework!) and defer humbly to the MPI masters from ANL. I'm definitely curious as to how these added latencies are implemented. > > The delays don't have to be busy-loops. Busy-loops are probably precise, > but might have some side-effects; for example, reading some hardware > counter (even more as it is on a PCI device, which is "far" from the CPU > and might be even "farther" if it has any PCI bridge(s) in between) > repeatedly will generate lots of "in*" operations during which the CPU is > stalled waiting for data. Especially with today's CPU speeds, I/O > operations are expensive in terms of CPU cycles... Agreed, though I'd hope on x86 that reading the time stamp counter is very quick and with minimal impact - it's got to be more like a register-to-register move than an I/O access. Hopefully on a modern superscalar processor this doesn't interfere with the other execution units. [As I write this, I just ran a program that reads the time stamp counter back to back to different registers, multiple times. The difference in values was a consistent 84 counts or 56 nsec on this 1.5 GHz Xeon - so, definitely minimal impact.] Without busy loops, achieving accurate delays of the order of 10's to 100's of microseconds with little jitter is a real trick in user space, (and kernel space as well!). nanosleep() won't work, delivering order 10 or 20 msec (i.e., the next timer tick) instead of the 50 usec request. > > > You are likely interested in delays of 10's of microseconds. > > Well, it depends :-) The latencies for today's HW+SW seem to be in a range > of about 2 orders of magnitude, so giving absolute figures doesn't make > much sense IMHO. Apart from this I would rather suggest an exponential > increase in the delay value. True. 
I was really thinking of my specific problem, not his!  The relevant
latency range for deciding between Infiniband and switched ethernet is
~ 6 usec to ~ 100+ usec, and the bandwidth range is ~ 100 MB/sec (gigE) to
~ 700 MB/sec (I.B.).  It would be really useful to be able to inject
latencies in that latency range with a precision of 5 usec or so, and to
dial the bandwidth with a precision of ~ 50 MB/sec.  Of course, if latency
really matters, one would drop TCP/IP and use an OS-bypass, like GAMMA or
MVIA.

> ...
>
> > it would be great to simulate performance of different network
> > architectures on specific applications.
>
> Certainly !  Especially as this would provide means to justify spending
> money on fast interconnect ;-)

What we need is some kind corporate soul to put up a large public cluster
with the lowest latency, highest bandwidth network fabric available.
Then, we can add our adjustable firmware and degrade that fabric to mimic
less expensive networks, and figure out what we should really buy.  Works
for me!

Don Holmgren
Fermilab

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com  Sat Feb 14 04:47:30 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sat, 14 Feb 2004 10:47:30 +0100 (CET)
Subject: [Beowulf] Math Coprocessor
In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com>
Message-ID: 

On Fri, 13 Feb 2004, Lombard, David N wrote:
>
> Ah, another 1130 veteran! Group hug!
>
Talking about 'mature' computer systems, I was at the ATLAS centre at RAL
yesterday, where they display the console of the IBM 360 in the front
hall.  Plenty of blinkenlights and switches to toggle.  The notice beside
it said it was a 15 MIPS machine.  Seems impressive for a machine of this
vintage.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com  Sat Feb 14 04:43:37 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sat, 14 Feb 2004 10:43:37 +0100 (CET)
Subject: [Beowulf] problmes with MPICH
In-Reply-To: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe>
Message-ID: 

On Fri, 13 Feb 2004, Richard Miguel wrote:

> Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help
> in this point.
>
> I have mpich-1.2.5.2 and fortran pgi and rsh.
>

./configure -rsh=RSHCOMMAND

From the configure.in:

"The environment variable 'RSHCOMMAND' allows you to select an alternative
remote shell command (by default, configure will use 'rsh' or 'remsh' from
your 'PATH').  If your remote shell command does not support the '-l'
option (some AFS versions of 'rsh' have this bug), also give the option
'-rshnol'.  These options are useful only when building a network version
of MPICH (e.g., '--with-device=ch_p4').  The configure option '-rsh' is
supported for backward compatibility."

So rsh is the default behaviour.  You can compile with the rsh command set
to the rsh under $SGE_HOME/mpi also.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 14 11:31:51 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 14 Feb 2004 11:31:51 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: given the difficulty of accurately adding a small amount of latency to a message passing interface, how about this: hack the driver to artificially pre/append a constant number of bytes to each message. they will appear to take longer to process, giving high-resolution added delays. course, this will also saturate earlier, but that's only the upper knee of the curve: you can still learn what you want... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From konstantin_kudin at yahoo.com Sat Feb 14 14:28:22 2004 From: konstantin_kudin at yahoo.com (Konstantin Kudin) Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) Subject: [Beowulf] S.M.A.R.T usage in big clusters Message-ID: <20040214192822.35170.qmail@web21203.mail.yahoo.com> I am curious if anyone is using SMART monitoring of ide drives in a big cluster. Basically, the question is in what percentage of the situations when a drive fails SMART is able to give some kind of a reasonable warning beforehand, let's say more than 24 hours. And how often it does not predict failure at all? The reason I am asking is that recently I had a drive that started getting bunch of I/O errors on certain sectors, yet SMART seemed to indicate that things were fine. Thanks! Konstantin __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Sat Feb 14 18:12:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sun, 15 Feb 2004 00:12:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Sat, 14 Feb 2004, Mark Hahn wrote: > hack the driver to artificially pre/append a constant number of > bytes to each message. I thought of this as well, but I dsmissed it because: - if the higher level protocol uses fragmentation and checksums, I think that it's pretty hard for the driver to mess with the messages. - a side effect might be faster filling up of some FIFO buffers on the receiver side, which might influence in unexpected ways the latency that we want to measure. Another side effect might be on the switch (assuming a network that uses switches) where data might be kept longer in buffers or peak bandwidth might be reached for short times, but enough to make a difference... - for networks that offer a very low latency, simulating a large latency might require adding a big lot of junk data, many times larger than the original message. 
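To put a rough number on that last point (back-of-the-envelope figures,
assuming the padding moves at full wire speed): on gigabit Ethernet at
roughly 125 MB/s, each additional 50 us of apparent latency needs about
125e6 * 50e-6 = ~6 KB of padding per message; making a ~250 MB/s Myrinet
link look like a 100 us switched-Ethernet path would take on the order of
25 KB of junk per message - far larger than the small messages whose
latency one usually cares about.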
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 16 09:08:54 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 16 Feb 2004 08:08:54 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <20040214192822.35170.qmail@web21203.mail.yahoo.com> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: We are using the SMART monitoring on our cluster. It depends on the drive model how much predictive power you will get. On the drives where we have had the most failures we've kept track of how well SMART predicted it pretty well.. it finds an error in advance about half the time. Steve Timm ------------------------------------------------------------------ Steven C. Timm, Ph.D (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Sat, 14 Feb 2004, Konstantin Kudin wrote: > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. > > Basically, the question is in what percentage of the > situations when a drive fails SMART is able to give > some kind of a reasonable warning beforehand, let's > say more than 24 hours. And how often it does not > predict failure at all? > > The reason I am asking is that recently I had a drive > that started getting bunch of I/O errors on certain > sectors, yet SMART seemed to indicate that things were > fine. > > Thanks! > > Konstantin > > > > __________________________________ > Do you Yahoo!? > Yahoo! Finance: Get your refund fast by filing online. > http://taxes.yahoo.com/filing.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Mon Feb 16 10:47:01 2004 From: camm at enhanced.com (Camm Maguire) Date: 16 Feb 2004 10:47:01 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <54brnzrpqi.fsf@intech19.enhanced.com> Greetings! The subject line says it all -- where can one get the most bang per watt among systems currently available? Take care, -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Mon Feb 16 11:19:53 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Mon, 16 Feb 2004 11:19:53 -0500 (EST) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <54brnzrpqi.fsf@intech19.enhanced.com> References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: On Mon, 16 Feb 2004 at 10:47am, Camm Maguire wrote > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? I have no numbers or benchmarks, but my search for a quiet but powerful set of nodes led me to buy Dell Optiplex SX270s. They've got the Intel 865G chipset (800MHz FSB, 400MHz dual channel memory), P4 HT up to 3.2GHz, onboard e1000, laptop-style HDD, a 150W power supply, and little else. They're sweet little systems. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Mon Feb 16 12:10:43 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Mon, 16 Feb 2004 17:10:43 +0000 (GMT) Subject: Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> References: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> Message-ID: > Message: 1 > Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) > From: Konstantin Kudin > To: beowulf at beowulf.org > Subject: [Beowulf] S.M.A.R.T usage in big clusters > > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. Yes. We use smartmon tools http://smartmontools.sourceforge.net/ Hard drive failures are by far the most common hardware failure we see on our systems. We've hooked smartmontools into the batch queueing system we use, so that if drives are flagged as failing, the host gets closed to new jobs. (You could extend this to do checkpoint/migration if your code supports it, ours doesn't.) Our cluster typically runs fairly short jobs (less than 1 hour or so) so jobs usually finish before the drive finally fails. I haven't collected any hard statistics on how many failures we catch before it impacts on a user's work, but my gut feeling is that it catches over 80% of the cases, and certainly enough for it to be worthwhile implementing. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 16:00:34 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 13:00:34 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> This is an exceedingly sophisticated question.. Do you count: Wall plug watts to flops? or CPU watts to flops? does the interconnect count? (just the power in the line drivers and terminations is a big power consumer for spaceflight hardware... 
why LVDS is overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. I'll bet that gigabit backplane in the switch burns a fair amount of power... does the memory count? This would drive more vs less cache decisions, which affect algorithm partitioning and data locality of reference. Is there a constraint on a "total minimum speed" or "maximum number of nodes"? The interesting tradeoff in speed of nodes vs number of nodes manifests itself in many ways: more interconnects, bigger switches, etc. More nodes means Larger physical size means longer cables means more cable capacitance to charge and discharge on each bit means more power in the line drivers. What's your message latency requirement? Can you do store and forward through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but adding some power in the CPU to shuffle messages around) Can free space optical interconnects be used? (power hungry Tx and Rx, but no cable length issues) Anyway.. this is an issue that is very near and dear to my heart (since I'm designing power constrained systems). One problem you'll find is that reliable and comparable (across processors/architectures) numbers are very hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than a 200 MIPS PowerPC 750 running at 133 MHz. Jim Lux Spacecraft Telecommunications Section Jet Propulsion Lab ----- Original Message ----- From: "Camm Maguire" To: Sent: Monday, February 16, 2004 7:47 AM Subject: [Beowulf] Max flops to watts hardware for a cluster > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? > > Take care, > -- > Camm Maguire camm at enhanced.com > ========================================================================== > "The earth is but one country, and mankind its citizens." -- Baha'u'llah > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Mon Feb 16 18:11:50 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Mon, 16 Feb 2004 23:11:50 +0000 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> Message-ID: <20040216231150.GA3060@galactic.demon.co.uk> On Mon, Feb 16, 2004 at 01:00:34PM -0800, Jim Lux wrote: > This is an exceedingly sophisticated question.. > > Do you count: > Wall plug watts to flops? or CPU watts to flops? Via Eden / Nehemiah chips at 1GHz for 7W or Acorn ARM e.g. Simtec evaluation boards ? > does the interconnect count? (just the power in the line drivers and > terminations is a big power consumer for spaceflight hardware... why LVDS is > overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into > 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. > Cheap slow ASICs and serial port type speeds? Low power Bluetooth devices? 
> I'll bet that gigabit backplane in the switch burns a fair amount of > power... > > does the memory count? This would drive more vs less cache decisions, which > affect algorithm partitioning and data locality of reference. > The early Seymour Cray model - minimum numbers of standard parts that are ultra fast? > Is there a constraint on a "total minimum speed" or "maximum number of > nodes"? The interesting tradeoff in speed of nodes vs number of nodes > manifests itself in many ways: more interconnects, bigger switches, etc. > Buckyball of PDA's anyone ? :) > More nodes means Larger physical size means longer cables means more cable > capacitance to charge and discharge on each bit means more power in the line > drivers. > Xilinx FPGA type architecture? Inmos transputer-style? Node on chip? AVR Atmel-type chips? > What's your message latency requirement? Can you do store and forward > through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but > adding some power in the CPU to shuffle messages around) > > Can free space optical interconnects be used? (power hungry Tx and Rx, but > no cable length issues) > ThinkGeek do an _ultra cool_ looking green pumped laser pointer which will reach low cloudbases :) > > Anyway.. this is an issue that is very near and dear to my heart (since I'm > designing power constrained systems). One problem you'll find is that > reliable and comparable (across processors/architectures) numbers are very > hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs > in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than > a 200 MIPS PowerPC 750 running at 133 MHz. > If 5W of power goes to/from Mars - then the JPL are the ones to beat on this [makes QRP radio hams look positively profligate] :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 16 20:45:49 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 17 Feb 2004 12:45:49 +1100 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> Message-ID: <200402171245.51746.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 12:22 pm, Jim Lux wrote: > For those interested, all the deep space comm stuff is documented in CCSDS > specs at http://www.ccsds.org/ Cool. http://www1.ietf.org/mail-archive/ietf-announce/Current/msg27294.html This document describes how to encapsulate Internet Protocol version 4 and version 6 packets can be encapsulated in Consultative Committee for Space Data Systems (CCSDS) Space Data Link Protocols. That's going to be one hell of a round trip time for pings.. What about distributed processing between spacecraft ? OK, maybe interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMXJNO2KABBYQAh8RAkTUAKCDfbAaswt3oWYDrEzXecdrqPfIPACff5cS UUAVTMwPAR3XA3lHjjf9lYc= =+LJH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 20:22:51 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 17:22:51 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> <20040216231150.GA3060@galactic.demon.co.uk> Message-ID: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> > > > If 5W of power goes to/from Mars - then the JPL are the ones to beat on > this [makes QRP radio hams look positively profligate] :) that 15W from Mars, on the omni antenna, only gets you 7-8 bits/second, working into a 70 meter diameter dish and a cryogenically cooled receiver front end. A bit beyond the typical ham's rig or budget. Going the other way, it's hundreds of kW into the dish. Beyond QRO. More realistically, they get a hundred kbps or so on the UHF link to the orbiter from a basically omni antenna on the rover. I can't recall what the max rate on the "direct to earth" X-band high gain antenna (which is about 20 cm in diameter) is, but it's probably in the same ballpark. That's the actual signalling rate, also... there's some coding going on as well, so the "data rate" is lower, after you take out framing, error correction etc. For those interested, all the deep space comm stuff is documented in CCSDS specs at http://www.ccsds.org/ --- Actually, the low power per function (or more accurately, low energy per function) champs are probably the cellphone folks.. Battery life is a real selling point. The little GPS receivers for cellphones are actually spec'd in milliJoules/fix, for instance. That said, I don't see anyone building a big crunching cluster out of cellphones... It's all those other issues you have to deal with.. interconnects, cluster management, memory, etc. They all require energy. Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 00:34:30 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 21:34:30 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> <200402171245.51746.csamuel@vpac.org> Message-ID: <001a01c3f517$b7bbffb0$36a8a8c0@LAPTOP152422> > > This document describes how to encapsulate Internet Protocol > version 4 and version 6 packets can be encapsulated in > Consultative Committee for Space Data Systems (CCSDS) Space > Data Link Protocols. > > That's going to be one hell of a round trip time for pings.. > > > > What about distributed processing between spacecraft ? 
OK, maybe > interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? > > > Such ideas are being contemplated, and not only by me. There are distributed computing/ cooperative robotics sorts of things, and also "formation flying" sorts of things, not to mention "sensor webs". Probably the biggest problem is not a technology one but a philosophical one. Spacecraft and mission design is exceedingly conservative, and you'd have to show that it would enable something that's needed, that can't be done by conventional approaches. It's sufficiently unusual that it doesn't fit well with the usual analysis models for spacecraft; which tend to push towards "one big X" supplied by power from "one big Y" using "one big Z" to talk to home, etc. The costing spreadsheets used in speculative mission planning don't have cells for "number of processors in cluster" and "power per node" You need a fairly straightforward model that says, in effect, you can process "x" amount of data with "y" mass and "z" watts/joules. That model must be backed up by credible analysis and experience ("heritage" in space speak). In general, the perception is that "more parts = more potential failure points = higher risk" so it's gotta be a "this is the ONLY way to make the measurement" or it's not going to fly. You're going to spend years and years getting ready to go, and you can't go fix it if it breaks. Spaceflight is a very, very, very different conceptual and planning model. (we won't even get into what you have to do if it's connected to human space flight in any way...). The time from "great idea" to "mission launch" is probably in the area of 5-7 years. The CPU flying on the Mars Rovers is a Rad6000, which is based on an old MIPS processor. Current missions in planning and development use things like PowerPC750's (derated) and Sparc7s and 8's (aka ERC32 and/or LEON) and ADSP21020 clones. Nobody is thinking about flying ARMs or Transmetas or even Pentiums. The popular scheme these days is various and sundry microcores (6502, 8051, PPC604s) in Xilinx megagate FPGAs. Actually, though, the fact that only these relatively low powered (computationally) processors are what are flying is what makes clusters attractive. If you need hundreds of megaflops to do your measurement, you're only going to get it with multiple processors. Jim Lux JPL _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Tue Feb 17 06:56:34 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 17 Feb 2004 19:56:34 +0800 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <1077018992.18450.21.camel@mikhail> Good day everyone, I have just a 5 node cluster networked together with a 100 Mbps Ethernet hub (well, not the best setup). The master acts as a NAT host for the internal hosts, and only the master node has 2 nics, one facing the internet and another facing the internal net. The master node is accessible from the internet, and I login to it to run jobs in the background (using screen). I've been reading a lot about OpenPBS and the Maui scheduler, but as mentioned in the list and also evident in the website, the OpenPBS system is not readily downloadable/distributable. Are there any alternatives to OpenPBS which does most of the same thing (batch scheduling of jobs for clusters)? 
Interfaceability using a GUI frontend (without having to make one of my own) is definitely a plus. TIA -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:47:41 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:47:41 +0100 (CET) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: On 17 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Are there any > alternatives to OpenPBS which does most of the same thing (batch > scheduling of jobs for clusters)? Interfaceability using a GUI frontend > (without having to make one of my own) is definitely a plus. Gridengine is probably a good bet for you. http://gridengine.sunsource.net The GUI is called qmon (I don't use it much) There are binaries available, and clear instructions on how to install it. If you have problems, join the Gridengine list where we'll help. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:40:46 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:40:46 +0100 (CET) Subject: [Beowulf] Linux-HA conference and tutorial, UK Message-ID: If anyone is interested in Linux-HA, the UKUUG are having a tutorial and conference in Bournemouth. The people leading the tutorial are Alan Robertson and Lars Markowsky-Bree, who head up the Linux-HA project. http://www.ukuug.org/events/winter2004/ (ps. I won't be there) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Tue Feb 17 11:41:19 2004 From: camm at enhanced.com (Camm Maguire) Date: 17 Feb 2004 11:41:19 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: References: Message-ID: <547jylk6a8.fsf@intech19.enhanced.com> Greetings, and thanks for the fascinating discussion! I'm mostly interested in dram flops, and also not the absolute maximum, mars-rover level technology, but say within 10% of the best available options on a more or less commodity basis. Take care, Mark Hahn writes: > > Greetings! The subject line says it all -- where can one get the most > > bang per watt among systems currently available? > > depends on which kind of flops: cache-friendly or dram-oriented? > > > > -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Tue Feb 17 14:59:52 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Tue, 17 Feb 2004 14:59:52 -0500 Subject: [Beowulf] ECC RAM or not? Message-ID: <20040217195952.GA50999@piskorski.com> For a low-cost cluster, would you insist on ECC RAM or not, and why? My inclination would be to always use ECC for anything, but it looks as if there is no such thing as an inexpensive motherboard which also supports ECC RAM. Either you can have a cheap motherboard (well under $100) with no ECC, or a pricey (well over $100) motherboard with ECC. Am I mistaken about this, or are there really no exceptions to this seeming "ECC motherboards are always expensive" rule? Also, at least some large production clusters out there (KASY0, for example) do not use ECC RAM - I wonder why: http://aggregate.org/KASY0/cost.html -- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 17 18:20:12 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 17 Feb 2004 18:20:12 -0500 (EST) Subject: [Beowulf] ECC RAM or not? In-Reply-To: <20040217195952.GA50999@piskorski.com> Message-ID: > For a low-cost cluster, would you insist on ECC RAM or not, and why? how low-cost, and what kind of code? technically, the chances of seeing dram corruption depend on how much ram you have, and how much you use it (as well as environmental factors, such as altitude, of course!) for a sufficiently low-cost cluster, you'd expect to have relatively little ram, and little CPU power to churn it, and therefore a low rate of bit-flips. otoh, you can bet that the recent ECC upgrade of the VT cluster had a significant real cost (probably eaten by vendors for PR reasons...) some kinds of codes are "rad hard", in the sense that if a failure gives you a possibly-wrong answer, you can just check the answer. that definition pretty much excludes traditional supercomputing, and certainly all physics-based simulations. searching/optimization stuff might work well in that mode, though rechecking only catches false positives, doesn't recover from false negatives. I suspect that doing ECC is cheaper than messing around with this kind of uncertainty, even for these specialized codes. > My inclination would be to always use ECC for anything, but it looks > as if there is no such thing as an inexpensive motherboard which also > supports ECC RAM. Either you can have a cheap motherboard (well under > $100) with no ECC, or a pricey (well over $100) motherboard with ECC. well, you're really pointing out the difference between desktop and workstation/server markets. for instance, there's not much physical difference between the i875 and i865 chipsets, but the former shows up in $200 boards that need a video card, and the latter in $100 ones that have integrated video. > Am I mistaken about this, or are there really no exceptions to this > seeming "ECC motherboards are always expensive" rule? it's a marketing/market-driven phenomenon. it's useful to work out the risks when you make this kind of decision. 
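Editorial aside: the "work out the risks" arithmetic is short enough to write down. A minimal Python sketch follows; it assumes independent parts with a constant failure rate of 1/MTBF, and the MTBF figures are simply the round numbers used in the examples that follow, not measurements.

    # expected failures per month for n parts with a given datasheet MTBF;
    # note that field failure rates are usually worse than datasheet MTBFs
    def failures_per_month(n_parts, mtbf_hours, hours_per_month=730.0):
        return n_parts * hours_per_month / mtbf_hours

    print(failures_per_month(32, 20e3))    # 32 nodes, 20K-hour PSUs   -> ~1.2 per month
    print(failures_per_month(1100, 1e6))   # 1100 nodes, 1M-hour disks -> ~0.8 per month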
if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll need to think about doing a replacement per month. if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked to get a couple failures a week. if 1100 nodes with 4G but no ECC see two undetected corruptions a day, then 32 nodes with 1G will go a couple months between events... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Tue Feb 17 18:18:20 2004 From: dieter at engr.uky.edu (William Dieter) Date: Tue, 17 Feb 2004 18:18:20 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <200402171701.i1HH13h07766@NewBlue.scyld.com> Message-ID: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Try the cluster design tool at . You can enter your basic memory, memory bandwidth, etc requirements, then set the metric weighting to choose designs with the least power consumption first. For example, for the default requirements (minimal memory, disk, and network requirements, at least 50 GFLOPS, and a $10,000 budget), and weighting power consumption first then memory bandwidth, followed by GFLOPS I get the following as the best design:

  23  Generic Fast Ethernet NIC               $8.00     $184.00
  23  Cat5 Cable for Fast Ethernet            $2.00      $46.00
   1  Generic 24 Port Fast Ethernet Switch    $76.00     $76.00
  23  Pentium 4 2.4GHz                        $166.00    $3818.00
  23  Generic Socket 478                      $56.00     $1288.00
  69  Generic PC3200 256MB DDR                $44.00     $3036.00
  23  Generic Mid-Tower Case                  $50.00     $1150.00
   3  Generic 2x2 Shelving Unit with Wheels   $50.00     $150.00
      Total                                              $9748.00

The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 Amps (you get to convert Amps to Watts.) Everything else in the design is pretty minimal, but you can adjust the requirements on the form to get what you need (or if you can't, let me know why not :-) The CGI tries all designs with the parts in its database to find the ones that meet your requirements and metric weighting. The model includes current consumption for switches and compute nodes based on the power supply. The parts database is a bit out of date right now... let me know what you think. Bill Dieter. dieter at engr.uky.edu On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > Greetings, and thanks for the fascinating discussion! > > I'm mostly interested in dram flops, and also not the absolute > maximum, mars-rover level technology, but say within 10% of the best > available options on a more or less commodity basis. > > Take care, > > Mark Hahn writes: > >>> Greetings! The subject line says it all -- where can one get the >>> most >>> bang per watt among systems currently available? >> >> depends on which kind of flops: cache-friendly or dram-oriented? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 17 21:38:39 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 18 Feb 2004 13:38:39 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> References: <1077018992.18450.21.camel@mikhail> Message-ID: <200402181338.50678.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 10:56 pm, Dean Michael C. 
Berris wrote: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. There is a forked version of OpenPBS called 'Torque' (it was called ScalablePBS, but Altair requested it changed its name) which includes a whole host of bug fixes and enhancements (including massive scalability) and is freely downloadable under an earlier, more free, OpenPBS license. It's under active development and has an active user community, though the mailing list is moderated for some bizzare reason, which means posts can take a little while to get through. The website is at: http://www.supercluster.org/projects/torque/ Good luck! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMtAvO2KABBYQAh8RAp8cAJsHNJuoCmIxYMNUWguwpoueopKUxACdHJiq p0nGW3X3ATurlzaV+Iw5jtg= =xwcU -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:20:37 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:20:37 +0800 (CST) Subject: [Beowulf] SLURM - newest (and greatest?) batch system Message-ID: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> One of the new features of SGE 6.0 is the parallelized job container (qmaster). Another batch system called SLURM (Simple Linux Utility for Resource Management) will be releasing soon. http://www.llnl.gov/linux/slurm/slurm.html - Like SGE 6.0, it also uses threads to parallelize the job container. - licensed under the GPL!! - developed by the US gov - uses Maui - designed to be simple :) - supports lots of interconnect switches. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:02:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:02:54 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> You can choose between SGE and SPBS. SGE has more features, better fault tolerance, better documentation, and better user support. http://gridengine.sunsource.net SPBS is closer to what you have now, so you and your users (BTW, are you the only one?) don't need to learn something new. http://www.supercluster.org/ Andrew. --- "Dean Michael C. Berris" ????> Good day everyone, > > I have just a 5 node cluster networked together with > a 100 Mbps Ethernet > hub (well, not the best setup). The master acts as a > NAT host for the > internal hosts, and only the master node has 2 nics, > one facing the > internet and another facing the internal net. The > master node is > accessible from the internet, and I login to it to > run jobs in the > background (using screen). 
> > I've been reading a lot about OpenPBS and the Maui > scheduler, but as > mentioned in the list and also evident in the > website, the OpenPBS > system is not readily downloadable/distributable. > Are there any > alternatives to OpenPBS which does most of the same > thing (batch > scheduling of jobs for clusters)? Interfaceability > using a GUI frontend > (without having to make one of my own) is definitely > a plus. > > TIA > > -- > Dean Michael C. Berris > http://mikhailberis.blogspot.com > mikhailberis at free.net.ph > +63 919 8720686 > GPG 08AE6EAC > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:37:00 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:37:00 -0800 Subject: [Beowulf] ECC RAM or not? References: Message-ID: <003101c3f5d8$d9b03250$36a8a8c0@LAPTOP152422> > some kinds of codes are "rad hard", in the sense that if a failure gives > you a possibly-wrong answer, you can just check the answer. My practical experience with DRAM designs has been that bit errors are more likely due to noise/design issues than radiation-induced single event upsets. Back in the 80's I worked on a Multibus system where we used to get double bit errors in 11/8 ecc several times a week. Everyone just said "well, that's why we have ECC" until I did some quick statistics on what the ratio between single bit (corrected but counted) and double bit errors should have been. Such high rates defied belief, and it turned out to be a bus drive problem. > that definition > pretty much excludes traditional supercomputing, and certainly all > physics-based simulations. searching/optimization stuff might work well > in that mode, though rechecking only catches false positives, doesn't > recover from false negatives. I suspect that doing ECC is cheaper than > messing around with this kind of uncertainty, even for these specialized codes. There are a number of algorithms which have inherent self checking built in. In the accounting business, this is why there's double entry, and/or checksums. In the signal processing world, there are checks you can do on things like FFTs, where total power in should equal total power out. > > > if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll > need to think about doing a replacement per month. > > if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked > to get a couple failures a week. Shades of replacing tubes in ENIAC or the Q-7A. MIL-HDBK-217A is the "bible" on these sorts of computations. 
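Editorial aside: the FFT check Jim mentions is essentially Parseval's theorem - total signal power should match total spectral power. A minimal sketch, assuming NumPy is available (an assumption, not something from the original post):

    import numpy as np

    def fft_selfcheck(x, rel_tol=1e-9):
        # Parseval: sum |x|^2 == (1/N) * sum |X|^2 for the unnormalized DFT,
        # so gross corruption of the transform shows up as a power mismatch
        X = np.fft.fft(x)
        power_in = np.sum(np.abs(x) ** 2)
        power_out = np.sum(np.abs(X) ** 2) / len(x)
        return abs(power_in - power_out) <= rel_tol * power_in

    x = np.random.randn(4096)
    print(fft_selfcheck(x))   # expect True on healthy hardware

Like the rechecking discussed earlier in the thread, such a test only flags a bad answer; it does not recover the lost work.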
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:28:53 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:28:53 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> This kind of thing is way cool.. Have you published the algorithm behind the page in a concise form somewhere? It would be handy to be able to point mission/system planners to it. ----- Original Message ----- From: "William Dieter" To: Sent: Tuesday, February 17, 2004 3:18 PM Subject: Re: [Beowulf] Max flops to watts hardware for a cluster > Try the cluster design tool at > . You can enter your basic > memory, memory bandwidth, etc requirements, then set the metric > weighting to choose designs with the least power consumption first. > > For example, for the default requirements (minimal memory, disk, and > network requirements, at least 50 GFLOPS, and a $10,000 budget), and > weighting power consumption first then memory bandwidth, followed by > GFLOPS I get the following as the best design: > > 23 Generic Fast Ethernet NIC $8.00 $184.00 > 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 > 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 > 23 Pentium 4 2.4GHz $166.00 $3818.00 > 23 Generic Socket 478 $56.00 $1288.00 > 69 Generic PC3200 256MB DDR $44.00 $3036.00 > 23 Generic Mid-Tower Case $50.00 $1150.00 > 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 > Total $9748.00 > > The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 > Amps (you get to convert Amps to Watts.) Everything else in the design > is pretty minimal, but you can adjust the requirements on the form to > get what you need (or if you can't let me know why not :-) > > The CGI tries all designs with the parts in its database to find the > ones that meet your requirements and metric weighting. The model > includes current consumption for switches and compute nodes based on > the power supply. The parts database is a bit out of date right now... > > let me know what you think. > > Bill Dieter. > dieter at engr.uky.edu > > On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > > Greetings, and thanks for the fascinating discussion! > > > > I'm mostly interested in dram flops, and also not the absolute > > maximum, mars-rover level technology, but say within 10% of the best > > available options on a more or less commodity basis. > > > > Take care, > > > > Mark Hahn writes: > > > >>> Greetings! The subject line says it all -- where can one get the > >>> most > >>> bang per watt among systems currently available? > >> > >> depends on which kind of flops: cache-friendly or dram-oriented? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Feb 18 01:26:38 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 22:26:38 -0800 Subject: [Beowulf] ECC RAM or not? 
References: Message-ID: <000601c3f5e8$2a8430f0$36a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Mark Hahn" To: "Jim Lux" Sent: Tuesday, February 17, 2004 9:36 PM Subject: Re: [Beowulf] ECC RAM or not? > > > some kinds of codes are "rad hard", in the sense that if a failure gives > > > you a possibly-wront answer, you can just check the answer. > > > > My practical experience with DRAM designs has been that bit errors are more > > likely due to noise/design issues than radiation induced single event > > upsets. > > understood. then again, you're using deliberately selected rad-hard-ware, no? Nope... that was off the shelf DRAMs in a commercial environment (in 1980ish time frame, so they were none too dense DRAMs, either.. 256kB on a board I think, many, many, pieces.. probably 64kbit parts..) > I was mostly thinking about a talk I saw by the folks who care for ASCI-Q, > which is in Los Alamos. they say that the altitude alone is worth a 14x > increase in particle flux, and that this caused big problems for them with > a particular register on the ES40 data path that was not ecc'ed. Indeed.. ECC on memory is only part of the problem.. you really need ECC on address and data lines for full coverage (or, more properly EDAC).. The classic paper on altitude effects was done by folks at IBM, where they ran boards in NY and in Denver and, underground in Denver. Good experimental technique, etc. > > > Back in the 80's I worked on a Multibus system where we used to get > > double bit errors in 11/8 ecc several times a week. Everyone just said > > "well, that's why we have ECC" until I did some quick statistics on what the > > ratio between single bit (corrected but counted) and double bit errors > > should have been. Such high rates defied belief, and it turned out to be a > > bus drive problem. > > makes sense. to be honest, I don't see many single-bit errors even, > but today we've only < 200 GB ram online. inside a year, it'll probably > be more like 2TB, so maybe things will get more exciting ;) It's a very mixed bag, depending on what's causing the errors. If it's radiation, smaller feature sizes mean that there's a smaller target to hit, and the amount of energy transferred is less (of course, less energy is stored in the memory cell, too) > we're also pretty much at sealevel, with lots of building over us. > reactor next door, though ;) Type of particle, and it's energy, has a huge effect on the SEU effects. I would maintain, though, that run of the mill timing margin effects, particularly over temperature; and EMI/EMC effects are probably a more important source of bit hits in modern computers. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Wed Feb 18 05:25:22 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 18 Feb 2004 18:25:22 +0800 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <1077099918.4818.15.camel@mikhail> Thanks sir, and to everyone else that responded. I'm currently reading on SGE, and am going to be choosing as soon as I get the full picture. Currently my preference is still towards SPBS (Torque) mainly because it doesn't seem as complicated to set up. 
However, as a Debian user, I did an apt-cache search on batch system and a couple of packages were Queue and DQS (Distributed Queueing System). I went over to the DQS website, and I'm reading on it right now. What I'd like to know would be how different DQS (and/or Queue) is with regards to SPBS and SGE? It would seem like from what I've been reading, SGE and SPBS are really for clusters (and grids), and DQS is for a collection of computers that really don't work as a cluster (or as a parallel computer). How accurate is this assessment of mine? Are there any articles written by people in the group regarding comparisons between SGE and SPBS with regards to effectivity and reliability? Scalability is also a factor because the cluster may grow as more funding and problems get into the cluster project. I hope I never cease to get enlightened from posts in the group, and insights would be most appreciated. Thanks very much and have a nice day! :) On Wed, 2004-02-18 at 12:02, Andrew Wang wrote: > You can choose between SGE and SPBS. > > SGE has more features, better fault tolerance, better > documentation, and better user support. > > http://gridengine.sunsource.net > > SPBS is closer to what you have now, so you and your > users (BTW, are you the only one?) don't need to learn > something new. > > http://www.supercluster.org/ > > Andrew. > > > --- "Dean Michael C. Berris" > ????> Good day > everyone, > > > > I have just a 5 node cluster networked together with > > a 100 Mbps Ethernet > > hub (well, not the best setup). The master acts as a > > NAT host for the > > internal hosts, and only the master node has 2 nics, > > one facing the > > internet and another facing the internal net. The > > master node is > > accessible from the internet, and I login to it to > > run jobs in the > > background (using screen). > > > > I've been reading a lot about OpenPBS and the Maui > > scheduler, but as > > mentioned in the list and also evident in the > > website, the OpenPBS > > system is not readily downloadable/distributable. > > Are there any > > alternatives to OpenPBS which does most of the same > > thing (batch > > scheduling of jobs for clusters)? Interfaceability > > using a GUI frontend > > (without having to make one of my own) is definitely > > a plus. > > > > TIA > > > > -- > > Dean Michael C. Berris > > http://mikhailberis.blogspot.com > > mikhailberis at free.net.ph > > +63 919 8720686 > > GPG 08AE6EAC > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > ----------------------------------------------------------------- > ??? Yahoo!?? > ?????????????????????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Dean Michael C. 
Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Feb 18 05:47:27 2004 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Wed, 18 Feb 2004 10:47:27 +0000 Subject: [Beowulf] Howto setup jobs using MPI In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <1077018992.18450.21.camel@mikhail> Message-ID: <5.1.1.6.0.20040218104616.02a89e00@imap.hermes.cam.ac.uk> Hi How best to setup jobs using MPI? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mack.joseph at epa.gov Wed Feb 18 07:22:07 2004 From: mack.joseph at epa.gov (Joseph Mack) Date: Wed, 18 Feb 2004 07:22:07 -0500 Subject: [Beowulf] S.M.A.R.T usage in big clusters References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: <403358EF.7F0BDE75@epa.gov> Steven Timm wrote: > > On the drives where we have had the most failures we've kept track > of how well SMART predicted it pretty well.. it finds an error > in advance about half the time. How do you get your information out of smartd? I've found output in syslog - presumably I can grep for this. I can get e-mail if I want (from the docs). To look at the output of the long and short tests it appears that I have to interactively use smartctl. Is there anyway to have a flag that can be looked at periodically to say "this disk is about to fail"? Thanks Joe -- Joseph Mack PhD, High Performance Computing & Scientific Visualization SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Wed Feb 18 09:16:48 2004 From: timm at fnal.gov (Steven Timm) Date: Wed, 18 Feb 2004 08:16:48 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > Steven Timm wrote: > > > > > On the drives where we have had the most failures we've kept track > > of how well SMART predicted it pretty well.. it finds an error > > in advance about half the time. > > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. At the moment we are not using smartd. I was running an older version that didn't have it as part of the package. I wrote some cron scripts that do a short test every night and capture the output to a file. But we are going to transition and use smartd and use an agent we already have that is grepping /var/log/messages for other purposes. Steve Timm > > I can get e-mail if I want (from the docs). > > To look at the output of the long and short tests it appears that > I have to interactively use smartctl. > > Is there anyway to have a flag that can be looked at periodically to > say "this disk is about to fail"? 
> > Thanks Joe > -- > Joseph Mack PhD, High Performance Computing & Scientific Visualization > SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 > Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 09:35:55 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 09:35:55 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> Message-ID: On Tuesday, February 17, 2004, at 11:28 PM, Jim Lux wrote: > This kind of thing is way cool.. > Have you published the algorithm behind the page in a concise form > somewhere? It would be handy to be able to point mission/system > planners to > it. We just submitted the paper to IEEE Computer for review last week. If you want to look at the source code, it is available through . I haven't made an official tarball release yet, but you can get the latest code through CVS. If you want to make your own parts database on our website you can do that, too. It copies one of the existing databases into a new one, so if you just want to update a few prices, or add a few new parts, it doesn't take too much effort. Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hanzl at noel.feld.cvut.cz Wed Feb 18 11:28:25 2004 From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz) Date: Wed, 18 Feb 2004 17:28:25 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> <1077099918.4818.15.camel@mikhail> Message-ID: <20040218172825E.hanzl@unknown-domain> > However, as a Debian user, I did an apt-cache search on batch system and > a couple of packages were Queue and DQS (Distributed Queueing System). I > went over to the DQS website, and I'm reading on it right now. What I'd > like to know would be how different DQS (and/or Queue) is with regards > to SPBS and SGE? DQS is SGE's grandfather, the genealogy goes somehow like this: DQS(Florida State Univ.) -> CODINE(Genias) -> SGE(Sun) so you can expect DQS to be much simpler but also you can expect SGE to be much improoved. (My personal choice is SGE and I am quite happy with it.) Regards Vaclav Hanzl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Feb 18 09:35:23 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 18 Feb 2004 15:35:23 +0100 (CET) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: On Tue, 17 Feb 2004, William Dieter wrote: > 23 Generic Fast Ethernet NIC $8.00 $184.00 How much in terms of power have you assigned to this item ? If you really buy a cheap low-end FE NIC, you'll most probably end up with a RTL8139 based card. This chip by design puts quite a load on the main CPU especially if you use it in a cluster context (=lots of network activity). 
This might increase significantly the power consumption or reduce the available flops... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 10:24:31 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 10:24:31 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <8CD39EDD-6226-11D8-B4A2-000393BF25C6@engr.uky.edu> On Wednesday, February 18, 2004, at 09:35 AM, Bogdan Costescu wrote: > On Tue, 17 Feb 2004, William Dieter wrote: > >> 23 Generic Fast Ethernet NIC $8.00 $184.00 > > How much in terms of power have you assigned to this item ? The tool is not perfect. We have not broken down the power to that level of detail. There is a tradeoff between how much work you have to do for each component and how much detail the model has. > If you really buy a cheap low-end FE NIC, you'll most probably end up > with a RTL8139 based card. This chip by design puts quite a load on > the main CPU especially if you use it in a cluster context (=lots of > network activity). This might increase significantly the power > consumption or reduce the available flops... To get really accurate power consumption numbers we would have to measure for many different CPU/Motherboard/NIC combinations. OTOH, there are some really cheap cards based on the Davicom 9102 chipset, (newegg.com has at least two different brands for $4.00 to $6.00). The Davicom 9102 is enough of a tulip clone that the Ethernet HOWTO recommends trying the tulip driver before the manufacturer supplied driver... Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 18 13:27:55 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 18 Feb 2004 12:27:55 -0600 (CST) Subject: [Beowulf] Best or standard hpc kernel sysctl settings. Message-ID: As part of our standards documentation, I'd like to set a good starting point for tuning various kernel parameters for clusters on Rice's campus. We have a few sysctl settings that we do based on the requirements of certain codes, but I'd like to know how everyone else is tuning their linux systems in their clusters. Can I get from you guys the sysctl parameter, it's value, and the reason why you set it that way? Thanks, Brent Clements _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Wed Feb 18 15:54:58 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 18 Feb 2004 15:54:58 -0500 Subject: [Beowulf] 2nd call for speakers -- Bioclusters 2004 Workshop -- March 30 Boston, MA Message-ID: <4033D122.4080008@sonsorol.org> { Apologies for the cross-posting } Enclosed is a meeting announcement for a 1 day workshop we are organizing alongside the much larger 'BioITWorld Expo' in Boston, Ma. 
The goals are two-fold -- recreating the vibe from the OReilly Bioinformatics Technology conference series that was recently cancelled as well as providing a forum where folks involved at the intersection of life science research and high performance IT can come together to talk shop. Feel free to pass along the enclosed announcement as appropriate. We are actively seeking technical talks and presentations focusing on how challenging problems were solved or overcome. Regards, Chris {on behalf of the organizing committee} Email: bioclusters04 at open-bio.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bioclusters-workshop.txt URL: From nixon at nsc.liu.se Tue Feb 17 09:16:39 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 17 Feb 2004 15:16:39 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> (Dean Michael C. Berris's message of "17 Feb 2004 19:56:34 +0800") References: <1077018992.18450.21.camel@mikhail> Message-ID: "Dean Michael C. Berris" writes: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Torque (a.k.a. Storm, a.k.a. Scalable PBS) is a fork of the OpenPBS source tree, with active maintenance and a reasonable license. http://www.supercluster.org/projects/torque It plays nicely with Maui. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.giesen at kodak.com Tue Feb 17 15:58:57 2004 From: david.giesen at kodak.com (David J Giesen) Date: Tue, 17 Feb 2004 15:58:57 -0500 Subject: [Beowulf] Cluster questions for Quantum Chemistry Message-ID: <40328091.A0200730@kodak.com> Hello- (Apologies to those who have seen a similar question on the CCL mailing list) We may be in the market for a new Linux cluster these days. Unfortunately, I haven't kept up on all the latest issues, and I'd appreciate any answers you all have for any of these questions. We want to run mainly QM codes such as Gaussian 98/Gaussian 03, Jaguar and PQS on these machines with linux. We'd likely be running in parallel, typically across 3-4 dual-processor nodes. 1) Xeon vs P4: [a] At the same GHz and front-side bus speed is there a difference in performance between these chips? [b] Is there a difference in reliability? 2) AMD Opteron vs Athlon: [a] Does any QM code actually take advantage of Opteron's 64-bit technology? [b] Have people moved away from Athlon boxes because of heat problems? 3) AMD vs Intel: How to compare speeds between these two different types of processors for QM codes? Does an Athlon 2800 (2.08 GHz) run more like a 2.0 GHz P4 or a 2.8 GHz P4? 4) How important is front-side bus speed these days for quantum chemistry problems? 5) How important are 100 Mbit/s Ethernet versus 1 Gbit/s Ethernet connections between the nodes for quantum chemistry problems? Thanks in advance! Dave Any questions which highlight my extreme stupidity are a result of exactly that (my own stupidity) rather than a reflection on the positions of the Eastman Kodak Company. -- Dr. David J. 
Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 18 22:09:53 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 19 Feb 2004 11:09:53 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> Message-ID: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> --- "Dean Michael C. Berris" wrote: > I'm currently reading > on SGE, and am going to be choosing as soon as I get > the full picture. > Currently my preference is still towards SPBS > (Torque) mainly because it > doesn't seem as complicated to set up. To install SGE, you don't even need to compile the source, just download the pre-compiled binary package, or grab the rpm. And also, SGE doesn't require root access: you can untar the package in your home directory, run the install scripts, and start playing with it. > What I'd > like to know would be how different DQS (and/or > Queue) is with regards > to SPBS and SGE? Debian is planning to replace DQS with SGE, but the maintainer of DQS is gone (he left the university). DQS and SGE are very similar. And PBS and SPBS are very similar too. > It would seem like from what I've been reading, SGE > and SPBS are really > for clusters (and grids), and DQS is for a > collection of computers that > really don't work as a cluster (or as a parallel > computer). How accurate > is this assessment of mine? Are you talking about compute farms? SGE is used in compute farms as well, where people run EDA simulations, graphic rendering jobs, BLAST jobs, etc. SGE has quite a lot of resource management features. SPBS/PBS are used in HPC clusters, since before SGE was opensource, PBS was free/opensource, so more people used it in those environments. > Are there any articles written by people in the > group regarding > comparisons between SGE and SPBS with regards to > effectivity and > reliability? SGE vs PBS on the rocks cluster mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-September/002980.html SPBS has lots of patches integrated, but still if your SPBS master node crashes, your cluster is gone. In SGE, the admin can configure 1 or more shadow masters, so in theory as long as any one machine in the cluster is running, your cluster is not dead. > Scalability is also a factor because > the cluster may grow > as more funding and problems get into the cluster > project. Both SGE and SPBS can scale to thousands of nodes, the question is, do you have the funding? :-) (SGE 6.0 will scale even further) > I hope I never cease to get enlightened from posts > in the group, and > insights would be most appreciated. I think you should try to install both, it is better to feel it than to just listen to other people. Andrew. ----------------------------------------------------------------- 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 18 22:38:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 19 Feb 2004 14:38:18 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> References: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402191438.19333.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > SPBS has lots of patches integrated, but still if your > SPBS master node crashes, your cluster is gone. Well, depends on your definition of "gone" really. People can't queue new jobs, jobs waiting to run won't be started, but as long as your filestore is elsewhere then running jobs won't be interrupted. However, if your filestore server disappears then you're stuffed. :-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q jHWTp4HmlzO8CnmObbFarWA= =PrTq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Thu Feb 19 10:41:26 2004 From: raysonlogin at yahoo.com (Rayson Ho) Date: Thu, 19 Feb 2004 07:41:26 -0800 (PST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402191438.19333.csamuel@vpac.org> Message-ID: <20040219154126.56423.qmail@web11411.mail.yahoo.com> I think it is one of the biggest problems with *PBS, especially in the compute farm environment. The more advanced batch systems (SGE and LSF) have this feature for years, not sure why *PBS still don't have it. (AFAIK, PBSPro 5.4 will include it, but isn't it late??) Rayson --- Chris Samuel wrote: > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be started, > but as long > as your filestore is elsewhere then running jobs won't be > interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.brookes at quadrics.com Thu Feb 19 10:50:43 2004 From: john.brookes at quadrics.com (john.brookes at quadrics.com) Date: Thu, 19 Feb 2004 15:50:43 -0000 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <30062B7EA51A9045B9F605FAAC1B4F6234EB15@tardis0.quadrics.com> If you keep the db on a separate filestore then - if your pbs server goes down - you can just have a failover server that 'becomes' (takes over the ipaddr and hostname - the other nodes won't even notice the difference) the original server if the original gets screwed. 
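Editorial aside: the polling half of the failover scheme John describes can be quite small. A minimal Python sketch follows; the hostname, service address, interface, and the bare "ip addr add" takeover are illustrative assumptions, and 15001 is simply the stock pbs_server port. Real deployments also need the out-of-band checks mentioned below, so that a merely slow master does not end up with a twin.

    import socket, subprocess, time

    PBS_HOST, PBS_PORT = "pbs-master", 15001   # assumed hostname; 15001 is the usual pbs_server port
    SERVICE_ADDR = "192.168.1.10/24"           # assumed service IP the compute nodes talk to
    MISSES_BEFORE_TAKEOVER = 5

    def server_alive():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(5)
        try:
            s.connect((PBS_HOST, PBS_PORT))
            s.close()
            return True
        except socket.error:
            return False

    misses = 0
    while True:
        misses = 0 if server_alive() else misses + 1
        if misses >= MISSES_BEFORE_TAKEOVER:
            # claim the service address; taking over the hostname and starting
            # pbs_server against the shared filestore are deliberately left out
            subprocess.call(["ip", "addr", "add", SERVICE_ADDR, "dev", "eth0"])
            break
        time.sleep(10)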
We've got a couple of customers that do this, but YMMV as they use: a) a somewhat non-standard PBS; b) out-of-band management to ensure that the node isn't just temporarily unresponsive. Cheers, John Brookes Quadrics > -----Original Message----- > From: Chris Samuel [mailto:csamuel at vpac.org] > Sent: 19 February 2004 03:38 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Best Setup for Batch Systems > > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > > > SPBS has lots of patches integrated, but still if your > > SPBS master node crashes, your cluster is gone. > > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be > started, but as long > as your filestore is elsewhere then running jobs won't be interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q > jHWTp4HmlzO8CnmObbFarWA= > =PrTq > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From radams at csail.mit.edu Thu Feb 19 14:03:38 2004 From: radams at csail.mit.edu (Ryan Adams) Date: Thu, 19 Feb 2004 14:03:38 -0500 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC Message-ID: <1077217418.4982.35.camel@localhost> Please forgive the length of this email, as I'm going to try to be comprehensive: I have a problem that divides nicely (embarrassingly?) into parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to complete and requires no communication during that time. Essentially there is a piece of data, around 500KB that must be processed and a result returned. I'd like to process as many of these pieces of data as possible. I am considering building a small heterogeneous cluster to do this (at home, basically), and am trying to decide exactly how to architect the task distribution. The network will probably be Fast Ethernet. Initially there will be four machines processing the data, but I could imagine as many as ten in the near term. My current back-of-the-envelope math puts an aggregate load (assuming 2.0s per job, 500KB transferred each, with ten nodes) of 2.5MB/s on the network, so it would seem that 100BT can get the job done without introducing much delay compared to the 2.0s execution time. Perhaps I am doing this math wrong, but I was also thinking that since the download of the data is such an I/O-intensive task that it would be reasonable to place that in a separate thread from the floating point calculations. This way, I could hope to work on data while my socket read is blocking. My question is basically this: is 2-5 seconds too small of a job to justify a batching system like *PBS or Gridengine? 
It would seem that the overhead for a job that requires a few hours would be very insignificant, but what about a few seconds? Certainly, one option would be to bundle sets of these chunks together for a larger effective job. Am I wasting my time thinking about this? I've been considering rolling my own scheduling system using some kind of RPC, but I've been around software development long enough to know that it is better to use something off-the-shelf if at all possible. Thanks in advance... Ryan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 19 14:20:04 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 19 Feb 2004 14:20:04 -0500 (EST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? It would seem that > the overhead for a job that requires a few hours would be very > insignificant, but what about a few seconds? Certainly, one option > would be to bundle sets of these chunks together for a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. > > Thanks in advance... I personally think that it is too small a task to use a batching system, especially since you're likely not going to architect it as a true batching system. I think you have three primary options for ways to develop your code. Well, four if you count NFS. The SIMPLEST way is to put your data blocks in files on an NFS crossmounted filesystem, and start jobs inside e.g. a simple perl script loop that grabs "the next data file" and runs on it and writes out its results back to the NFS file system for dealing with or accruing later. You're basically using NFS as your transport mechanism. Now, NFS isn't horribly efficient relative to raw peak network speed, but neither is it completely horrible -- at 100 BT (say 9-10 MB/sec peak BW) you ought to be able to get at least half of that on an NFS read of a big file. At 5 MB/sec, your 1/2 MB file should take a 0.1 seconds to be read (plus a latency hit) which is "small" (as you note) compared to a run time of 2-5 seconds so you should be able to get nice parallel speedup on four or five hosts. You can test your combined latency and bandwidth with a simple perl script or binary that opens a dozen (different!) files inside a loop. Beware caching, which will give you insane numbers if you aren't careful (as in don't run the test twice on the files without modifying them on the server). The other three ways do it "properly" and permit you both finer control (with the NFS method you'll have to work out file locking and work distribution to make sure two nodes don't try to work on the same file at the same time) and higher BW, close to the full bandwidth of the network. They'll ALSO require more programming. a) PVM b) MPI c) raw networking. PVM is a message passing library. 
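Editorial aside, before rgb's notes on the message-passing options: the NFS-as-transport scheme he sketches above, including the locking and work distribution he says you must work out, can look something like the following Python fragment. The directory names, the claim-by-rename trick, and the process() stand-in are illustrative assumptions rather than code from the thread.

    import os, socket, time

    TODO = "/mnt/shared/todo"   # assumed NFS-mounted work directory
    DONE = "/mnt/shared/done"   # assumed NFS-mounted results directory

    def process(path):
        # stand-in for the real 2-5 second computation on one ~500KB data file
        return "result for %s\n" % os.path.basename(path)

    def claim_next():
        # crude lock: rename the file to a per-node name before working on it,
        # so two nodes should not end up processing the same chunk
        for name in sorted(os.listdir(TODO)):
            if ".claimed-" in name:
                continue                      # already taken by some node
            src = os.path.join(TODO, name)
            claimed = src + ".claimed-" + socket.gethostname()
            try:
                os.rename(src, claimed)
                return claimed
            except OSError:
                continue                      # lost the race; try the next file
        return None

    while True:
        job = claim_next()
        if job is None:
            time.sleep(5)                     # queue empty; poll again
            continue
        out = os.path.join(DONE, os.path.basename(job) + ".result")
        open(out, "w").write(process(job))

This has exactly the weaknesses rgb warns about (NFS caching, no recovery if a node dies mid-chunk), which is part of why the message-passing routes below are the "proper" ways to do it.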
There is a PVM program template on my personal GPL software website: http://www.phy.duke.edu/~rgb/General/general.php that might suffice to get you started -- it should just compile and run a simple master/slave program, and you should be able to modify it fairly simply to have the master distribute the next block of work to the first worker/slave to finish. If your CPUs are well balanced the I/O transactions will antibunch and communications will be very efficient. MPI is another message passing library. I don't have an MPI template, but there are example programs in the MPI distributions and on many websites, and there are books (on both PVM and MPI) from e.g. MIT press that are quite excellent. There is also a regular MPI column in Cluster World Magazine that has been working through intro level MPI applications, and old columns by Forrest Hoffman in Linux Magazine ditto. At least -- google is your friend. Both PVM and MPI are likely to be similar in ease of programming, hassle of setting up a parallel environment, and speed, and both of them should give you a very healthy fraction of wirespeed while shielding you from having to directly manipulate the network. Finally there are raw sockets (which it sounds like you are inclined towards). Now, I have nothing against raw socket programming (he says having spent the day on xmlsysd/wulflogger/libwulf, a raw socket-based monitoring program:-). However, it is NOT trivial -- you have to invent all sorts of wheels that are already invented for you and wrapped up in simple library calls with PVM or MPI. Its advantages are maximal speed -- you can't get faster than a point to point network connection -- the ability to thread the connection/I/O component and MAYBE take advantage of letting the NIC do some of the work via DMA while the CPU is doing other work, and complete control. The disadvantages are that you'll be responsible for determining e.g. message length, dealing with a dropped connection without crashing everything, debugging your server daemon and worker clients (or worker daemons and master client) in parallel when they are running on different machines, and so forth. I >>might<< be able to provide you with some applications that aren't exactly templates but that illustrate how to get started on this approach (and refer you to some key books) but if you really are a networking novice you'll need to want to do this as an excuse to stop being a novice by writing your own application or it isn't worth it. You'll need to be a much better and more skilled programmer altogether in order to debug everything and check for the myriad of error conditions that can occur and deal with them robustly. There are really a few other approaches -- perl now supports threads so you CAN use a perl script and ssh as a master/work distribution system -- but raw sockets aren't much easier to manage in perl than they are in C and using ssh as a transport layer adds overhead at least equal to or in excess to NFS, so you'd probably want to use NFS as transport and the perl script to just manage task distribution (for which it is ideally suited in this simple a context). I have a nice example threaded perl task distribution script (which I wrote for MY Cluster Magazine column some months ago:-) which I can put somewhere if this interests you. HTH, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dcs at et.byu.edu Thu Feb 19 15:19:56 2004 From: dcs at et.byu.edu (Dave Stirling) Date: Thu, 19 Feb 2004 13:19:56 -0700 (MST) Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Message-ID: Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 19 17:22:38 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 20 Feb 2004 09:22:38 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219154126.56423.qmail@web11411.mail.yahoo.com> References: <20040219154126.56423.qmail@web11411.mail.yahoo.com> Message-ID: <200402200922.39632.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 20 Feb 2004 02:41 am, Rayson Ho wrote: [No failover support in the pbs_server] > I think it is one of the biggest problems with *PBS, especially in the > compute farm environment. Torque (formerly SPBS) is very stable, especially since we helped the SuperCluster folks clobber the various memory leaks in the server. Our pbs_server has been running for almost a month now since I last restarted it (because I was doing a bit of system maintenance, not because of PBS problems, I think it'd been running for about 2 months before that) and it's only VSZ 3148 and RSS 2136. :-) NB: I'm still running an SPBS release from early November as that's when we fixed the last memory leak and it's worked like a dream since then. > The more advanced batch systems (SGE and LSF) have this feature for > years, not sure why *PBS still don't have it. I believe it's on the SuperCluster folks list of things to do, but they've been busy working on the stability front (as well as MAUI and Silver). CC'd to the SuperCluster folks so they can respond. > (AFAIK, PBSPro 5.4 will include it, but isn't it late??) No idea, don't use it. 
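Editorial aside: the VSZ/RSS numbers Chris quotes are easy to track automatically. A minimal sketch that reads them from /proc for a given pid; finding the pbs_server pid, and deciding when growth counts as a leak, are left as assumptions.

    def vm_usage(pid):
        # returns (VmSize, VmRSS) in kB as reported by /proc/<pid>/status
        size = rss = None
        for line in open("/proc/%d/status" % pid):
            if line.startswith("VmSize:"):
                size = int(line.split()[1])
            elif line.startswith("VmRSS:"):
                rss = int(line.split()[1])
        return size, rss

    # e.g. run from cron and log the result; a steadily climbing RSS for
    # pbs_server is the sort of leak the SuperCluster patches addressed
    print(vm_usage(1))   # pid 1 used here only as a harmless example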
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANTcuO2KABBYQAh8RAk8AAJ0ZGx3+qLPHWMjFkG7PGD8pPzwBWwCeKnUQ u1aXnixvHrknKTqtNVDRVhM= =28y0 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Thu Feb 19 18:13:20 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Thu, 19 Feb 2004 23:13:20 +0000 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: References: Message-ID: <200402192313.20932.daniel.kidger@quadrics.com> Dave, > While performance (latency, bandwidth) usually comes to the fore in > discussions about high performance interconnects for MPI clusters, I'm > curious as to what your experiences are from the standpoint of > manageability -- NIC's and spines and switches all fail at one time or > another, but I'd like input as to how individual products (Myrinet, > Quadrics, Infiniband, etc) handle this. In your clusters does the > hardware replacement involve simple steps (swap out the NIC, rerun some > config utilities) or something more complex (such as bringing down the > entire high speed network to reconfigure it so all the nodes can talk to > the new hardware); i.e., How painful is it to replace a single failed NIC? > > I'd imagine that most cluster admins are reluctant to interrupt running > jobs in order to re-initialize the equipment after hardware replacement. > Any information about how your clusters running high-speed interconnects > handle interconnect hardware failure/replacement would be very helpful. AFAIK all interconnects would allow the swap of a NIC without bringing down the whole network - but in all cases any parallel job running on that node would need to be aborted since in general high-speed interconect PCI cards are not hot-swappable - that node woudl need to be power-cycled. As for the cables and switches, I can't speak for other vendors - but for example a line card in a Quadrics Switch can be hot-swapped even while there are running MPI jobs that are sending data through that line card at the time - the jobs simply pause until the cables are reconnected. I would expect that other interconnects are the same in this respect? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 19 18:07:43 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 00:07:43 +0100 (CET) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > Please forgive the length of this email, as I'm going to try to be > comprehensive: > There was a discussion on the Gridengine user list recently, regarding submitting lots and lots of short jobs in a bank in London. It developed into quite an interesting discussion, and I learned lots. Sorry - I tried to find the thread, but can't quite get the correct keywords. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:04:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:04:32 +0800 (CST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: <20040220010432.88699.qmail@web16802.mail.tpe.yahoo.com> --- Ryan Adams ???? > My question is basically this: is 2-5 seconds too > small of a job to > justify a batching system like *PBS or Gridengine? Yes, 10 minutes or greater sound more reasonable. May be you can chunk 100 or more of those tasks into a job and submit it into a batch system. Also, from the "Tuning guide" HOWTO on the GridEngine website, SGE has a feature called "scheduling-on-demand" -- seems like it will help a lot since the scheduler is activated whenever a job arrives or a machine becomes available. Andrew. > Certainly, one option > would be to bundle sets of these chunks together for > a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling > system using some kind > of RPC, but I've been around software development > long enough to know > that it is better to use something off-the-shelf if > at all possible. > > Thanks in advance... > > Ryan > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:13:18 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:13:18 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402200922.39632.csamuel@vpac.org> Message-ID: <20040220011318.63921.qmail@web16804.mail.tpe.yahoo.com> --- Chris Samuel ????> ----- > Torque (formerly SPBS) is very stable, especially > since we helped the > SuperCluster folks clobber the various memory leaks > in the server. It's not whether PBS itself is stable or not. There are human errors, machine problems, network problems, etc... And besides, the master machine also needed to be taken offline for OS upgrade. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tobeveryhonest at hotmail.com Fri Feb 20 03:55:22 2004 From: tobeveryhonest at hotmail.com (Salman Guy) Date: Fri, 20 Feb 2004 08:55:22 +0000 Subject: [Beowulf] want to implement a Beowulf cluster Message-ID: hi all, I want to learn Beowulf cluster implementation practically and for this purpose i need some help from u ppl.....I need reading material and ebooks so if anyone of u has done some practical work on Beowulf clusters then plz guide me or send me information regarding this, help will be appreciated ...thanx in advance _________________________________________________________________ MSN 8 helps eliminate e-mail viruses. Get 2 months FREE*. http://join.msn.com/?page=features/virus&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 20 06:13:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 12:13:02 +0100 (CET) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, Salman Guy wrote: > hi all, > I want to learn Beowulf cluster implementation practically and for this > purpose i need some help from u ppl.....I need reading material and ebooks > so if anyone of u has done some practical work on Beowulf clusters then plz > guide me or send me information regarding this, > I think we need a FAQ here :-) Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. SO I always say: Look at Robert Browns webpages at Duke The books 'Linux Clustering' by Charles Bookman and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Feb 20 05:10:38 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 20 Feb 2004 11:10:38 +0100 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: <200402192313.20932.daniel.kidger@quadrics.com> References: <200402192313.20932.daniel.kidger@quadrics.com> Message-ID: <200402201110.38850.joachim@ccrl-nece.de> Dan Kidger: > AFAIK all interconnects would allow the swap of a NIC without bringing down > the whole network - but in all cases any parallel job running on that node > would need to be aborted since in general high-speed interconect PCI cards > are not hot-swappable - that node woudl need to be power-cycled. AFAIK, this is the same for SCI, but I would need to check this to be sure. Anyway, the application using the adapter to be swapped would have to be restarted anyway as its resources are gone. Avoiding this would be very hard, if at all possible. > As for the cables and switches, I can't speak for other vendors - but for > example a line card in a Quadrics Switch can be hot-swapped even while > there are running MPI jobs that are sending data through that line card at > the time - the jobs simply pause until the cables are reconnected. I would > expect that other interconnects are the same in this respect? SCI typically uses no external switches, and concerning the exchange of adapters or cables, there are two strategies: the application(s) has/have to wait until transfers are again successful, or the driver recognizes the problem and changes the routing. Of course, this can be combined into a two-phase strategy. I guess this is the way Scali is doing it. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Fri Feb 20 08:41:11 2004 From: lathama at yahoo.com (Andrew Latham) Date: Fri, 20 Feb 2004 05:41:11 -0800 (PST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: <20040220134111.27571.qmail@web60305.mail.yahoo.com> or download the mailing list archive for the last year! thats an ebook all to its self --- John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. > > SO I always say: > Look at Robert Browns webpages at Duke > > The books 'Linux Clustering' by Charles Bookman > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== *----------------------------------------------------------* Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM LATHAMA at LATHAMA.COM - LATHAMA at YAHOO.COM If yahoo.com is down we have bigger problems than my email! *----------------------------------------------------------* _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Fri Feb 20 13:39:49 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Fri, 20 Feb 2004 18:39:49 +0000 (GMT) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <200402201109.i1KB99h12383@NewBlue.scyld.com> References: <200402201109.i1KB99h12383@NewBlue.scyld.com> Message-ID: > > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? That workload is do-able with the right queuing system. LSF (don't know about gridengine off hand) has a concept of "job chunking" for dealing with short running jobs. The queuing system batches up a number of jobs (eg 10 or 20) and then submits them all on one go to the work host where they run sequentially. This cuts down on the scheduling overhead. We've just had a user push 250,000 short running jobs though our cluster this-afternoon using this approach. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 20 15:12:09 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 20 Feb 2004 15:12:09 -0500 (EST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) There are the old FAQ and HOWTO's (still some relevant background information): http://www.canonical.org/~kragen/beowulf-faq.txt http://yara.ecn.purdue.edu/~pplinux/PPHOWTO/pphowto.html#toc1 http://www.tldp.org/HOWTO/Beowulf-HOWTO.html There are other links at ClusterWorld.com (on the right side, scroll down) that may be useful. Now is a good time to announce my effort to update the FAQ (and possibly the HOWTO). Starting next week, I plan on updating the FAQ by using the ClusterWorld.com site as a place to collect questions and answers. Stay tuned. Of course ClusterWorld magazine is designed to provide this type of information as well. > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. 
> > SO I always say: > Look at Robert Browns webpages at Duke > and book: http://www.phy.duke.edu/brahma/Resources/beowulf_book.php > The books 'Linux Clustering' by Charles Bookman IMO, this is not a good book for HPC clusters. > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. New edition: http://www.amazon.com/exec/obidos/tg/detail/-/0262692929/102-0957058-4520116?v=glance Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Sat Feb 21 19:18:55 2004 From: mg_india at sancharnet.in (Sawan Gupta) Date: Sun, 22 Feb 2004 05:48:55 +0530 Subject: [Beowulf] Movie Editing Requirements Message-ID: <000001c3f8d9$79436f00$8bd2003d@myserver> Hello, My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT system with 512 DDRAM and a 128 MB Graphic Card. But when he perform some rendering operations, it takes nearly 10-15 minutes to complete. He wishes to upgrade his system to dual XEON with more RAM to minimize this time delay. I want to know whether this will suit his requirments or a cluster is just what he needs. Please tell me which cluster can suit his requirements i.e. Windows/Linux. I mean which cluster can best suit these requirements. Also are the softwares used by him also available for Linux or not. (If the solution suggested is in Linux) Regards, Sawan Gupta || Mg_India at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 21 20:50:54 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 21 Feb 2004 20:50:54 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. I would guess that none of this work is done by the graphics card, so that his performance is strictly dependent on the P4 and the fairly modest amount of ram he has. I would guess that most of these applications are fairly memory-intensive, and not particularly cache-friendly. I doubt HT would matter in this case, except that IIRC all HT CPUs are the 'c' model, and thus run with 6.4 GB/s of dram bandwidth. I'm sure you already know that 512M probably too low. > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. if this was linux, I'd advise you to use tools like oprofile, vmstat, etc to find out where it's spending its time. since it's only windows, you'll probably have to resort to watching the disk light, and running that nasty little windows accessory that tells you about cpu/memory usage. > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. sure. though he'd almost certainly run faster with a dual-opteron, since such systems deliver noticably more memory bandwidth and lower latency. a dual-xeon can actually be slower than a uni P4c system. it would probably make sense to talk to him about how his machine and apps are configured first. 
for instance, is he actually using HT, and does he notice any performance difference if he turns it off? is his ram dual-bank-PC3200? any sense of how much time is spent on disk IO? > I want to know whether this will suit his requirments or a cluster is > just what he needs. clusters are clearly more scalable, and are widely used in the render/effects industry. comparing a pair of P4c's to a single dual-opteron, though, I have no idea. I think it would depend on his applications, mainly. there's no clear answer to price/performance when it comes to clusters of duals vs unis. unis tend to be too large, and in most cases wind up replicating too many components, especially moving parts, to compete. I believe most clusters, in any industry, are not unis. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. windows is the right choice in exactly one situation: when the exact configuration you need is available off-the-shelf, and you already know how to use it. linux (unix in general) is far more robust, easy-to-manage, flexible, scalable, cheap, etc. all those TCO studies sponsored by msft consist of the following astonishing conclusion: if you have windows-only users and a supply of cheap msce's and are comfortable with the crappy level of support that the ms world provides, then indeed windows is cheaper. > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) only he can decide that. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sat Feb 21 23:30:13 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sat, 21 Feb 2004 22:30:13 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Actually a beowulf cluster can also run windows. There is a port of maya to clusters...There are also many other movie editing software distributions that work very well on clusters..It also doesn't matter what os a beowulf cluster runs. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Sat, 21 Feb 2004, Joel Jaeggli wrote: > Given that it sounds like you're on windows, a beowulf cluster is not > appropriate from your application... > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > Hello, > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > But when he perform some rendering operations, it takes nearly 10-15 > > minutes to complete. > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > this time delay. > > > > I want to know whether this will suit his requirments or a cluster is > > just what he needs. > > Please tell me which cluster can suit his requirements i.e. > > Windows/Linux. > > I mean which cluster can best suit these requirements. > > > > Also are the softwares used by him also available for Linux or not. 
(If > > the solution suggested is in Linux) > > > > > > Regards, > > > > Sawan Gupta > > || Mg_India at sancharnet.in || > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 21 23:18:09 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 21 Feb 2004 20:18:09 -0800 (PST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: Given that it sounds like you're on windows, a beowulf cluster is not appropriate from your application... On Sun, 22 Feb 2004, Sawan Gupta wrote: > > Hello, > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. > > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. > > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. > > I want to know whether this will suit his requirments or a cluster is > just what he needs. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. > I mean which cluster can best suit these requirements. > > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) > > > Regards, > > Sawan Gupta > || Mg_India at sancharnet.in || > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From c00jsh00 at nchc.org.tw Sun Feb 22 04:32:41 2004 From: c00jsh00 at nchc.org.tw (Jyh-Shyong Ho) Date: Sun, 22 Feb 2004 17:32:41 +0800 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <40387739.3891D96C@nchc.org.tw> Hi, We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI Workstation 5.1.3 compiler and 64-bit GOTO library. We ran all the test cases included in Gaussian 03 source code and compared the results against the reference results ran on SGI. All tests cases are successfully completed except test602 and test605 with error at the last stage when l9999 tries to close files. 
There are several files in directory bsd need some modification: machine.c (add one section to return "x86_64" as machine identification) mdutil.c (add one section for x86_64) mdutil.f (add one section for x86_64) bldg03 (modify the file so it can pick up x86_64.make as g03.make) and create a make file x86_64.make (use i386.make as a template) The compiler used is pgf90, but l906 and l609 has to be compiled with pgf77, in order to pass all the test cases. We are running more tests and comparing the performance of 64-bit version abd 32-bit version. Regards Jyh-Shyong Ho, Ph.D. Research Scientist National Center for High-Performance Computing Hsinchu, Taiwan, ROC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mirk at vsnl.com Sat Feb 21 10:31:35 2004 From: mirk at vsnl.com (Mohd Irfan R Khan) Date: Sat, 21 Feb 2004 21:01:35 +0530 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: Message-ID: hi I am one using SCI (Dolphin) cards and I think in dolphin u don't have to stop the whole cluster in case of failure. In this there is a matrix where it always has redundancy if one machine fails and the software provided by it (SCALI) will route the data to other machine and will reroute it back once it finds the line working properly. Regards. -----Original Message-----. From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Dave Stirling Sent: Friday, February 20, 2004 1:50 AM To: beowulf at beowulf.org Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Sun Feb 22 09:57:05 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sun, 22 Feb 2004 14:57:05 +0000 (UTC) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > It also doesn't matter what > os a beowulf cluster runs. ..as long as that OS conforms to the definition of free software, that is.. 
Or am I just an old fuddy-duddy, with out-of-date concepts? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Sun Feb 22 11:04:17 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Sun, 22 Feb 2004 17:04:17 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. I've done this for a while to get temperature information from a server in our small group server room (together with MRTG we have a nice history of temperature to show to the facilities people when the temperature was too high again...). The problem with greping for smartd information in the syslog file is that there is no current information after a log rotation. That's why I changed our cron jobs. Now I use a small setuid-root program which starts "smartctl -a /dev/sdX" and then greps for the temperature. - Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Feb 22 10:20:20 2004 From: agrajag at dragaera.net (Jag) Date: 22 Feb 2004 10:20:20 -0500 Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: <1077463220.2561.4.camel@loiosh> On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. By definition, a beowulf cluster uses a free/open OS. So, a beowulf cluster can't run windows. However, an HPC (High Performance Computing) cluster doesn't have that requirement. I know its kinda nitpicking to try to distinguish between Beowulf cluster and HPC cluster, but in some ways it is important. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sun Feb 22 11:30:03 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sun, 22 Feb 2004 10:30:03 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <1077463220.2561.4.camel@loiosh> References: <1077463220.2561.4.camel@loiosh> Message-ID: Please don't start a flame war guys, I just had my terms mixed up...it was 1 am in the morning when I replied. -Brent On Sun, 22 Feb 2004, Jag wrote: > On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > > Actually a beowulf cluster can also run windows. 
There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > By definition, a beowulf cluster uses a free/open OS. So, a beowulf > cluster can't run windows. However, an HPC (High Performance Computing) > cluster doesn't have that requirement. > > I know its kinda nitpicking to try to distinguish between Beowulf > cluster and HPC cluster, but in some ways it is important. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Feb 23 06:37:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 23 Feb 2004 12:37:25 +0100 (CET) Subject: [Beowulf] Flashmobcomputing Message-ID: I hesitate a bit to send things seen on Slashdot to the list, but this is probably relevant: http://www.flashmobcomputing.org/ It might be worth a bit of a debate though. Given that this cluster will be composed of differing CPUs, and conneced together by 100Mbps links will it really have chance of getting into the Top 500? The bootable CF they are using is a Knoppix variant. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Mon Feb 23 07:27:20 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Mon, 23 Feb 2004 07:27:20 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. >From time to time, I think it is a important to recall the original definition of Beowulf. In the book "How to Build Beowulf", Sterling, Salmon, Becker, Savarese define Beowulf as: "A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology running one of several open-source Unix like operating systems." There is often confusion as to "what is a Beowulf?" because the definition is more of a framework for building clusters and less of a recipe. I suppose, one could come up with definition of an HPC cluster which would read something like" "An HPC cluster is collection of commodity processors interconnected by widely available networking technology running a widely available OS." Rather broad. I think the keyword in all this is "commodity", which to me means choice and implies low cost. Doug > > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > Given that it sounds like you're on windows, a beowulf cluster is not > > appropriate from your application... 
> > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > Hello, > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > minutes to complete. > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > this time delay. > > > > > > I want to know whether this will suit his requirments or a cluster is > > > just what he needs. > > > Please tell me which cluster can suit his requirements i.e. > > > Windows/Linux. > > > I mean which cluster can best suit these requirements. > > > > > > Also are the softwares used by him also available for Linux or not. (If > > > the solution suggested is in Linux) > > > > > > > > > Regards, > > > > > > Sawan Gupta > > > || Mg_India at sancharnet.in || > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > > -------------------------------------------------------------------------- > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 23 09:11:00 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 23 Feb 2004 09:11:00 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sun, 22 Feb 2004, Martin WHEELER wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > It also doesn't matter what > > os a beowulf cluster runs. > > ..as long as that OS conforms to the definition of free software, that > is.. > > Or am I just an old fuddy-duddy, with out-of-date concepts? No, you're absolutely right. It's right in there in the original beowulf documents and description, IIRC. There are some excellent reasons for this, BTW, as you'll discover the first time something doesn't just work for you "out of the box". rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 23 08:36:05 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Mon, 23 Feb 2004 07:36:05 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Again, I go back to my last email concerning this..I didn't want to start people flaming me(which has now happened), I wrote my original response at 1am in the morning and was sloppy with my terms. For that I apologize. This tangent of explanations from now over 50 people can be gotten off of and people can go about there business..Nothing to see here, move along. -Brent On Mon, 23 Feb 2004, Douglas Eadline, Cluster World Magazine wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > Actually a beowulf cluster can also run windows. There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > >From time to time, I think it is a important to recall the original > definition of Beowulf. In the book "How to Build Beowulf", Sterling, > Salmon, Becker, Savarese define Beowulf as: > > "A Beowulf is a collection of personal computers (PCs) interconnected by > widely available networking technology running one of several open-source > Unix like operating systems." > > There is often confusion as to "what is a Beowulf?" because the definition > is more of a framework for building clusters and less of a recipe. > > I suppose, one could come up with definition of an HPC cluster which > would read something like" > > "An HPC cluster is collection of commodity processors interconnected by > widely available networking technology running a widely available OS." > > Rather broad. I think the keyword in all this is "commodity", which to me > means choice and implies low cost. > > Doug > > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > > > Given that it sounds like you're on windows, a beowulf cluster is not > > > appropriate from your application... > > > > > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > > > > Hello, > > > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > > minutes to complete. > > > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > > this time delay. > > > > > > > > I want to know whether this will suit his requirments or a cluster is > > > > just what he needs. > > > > Please tell me which cluster can suit his requirements i.e. > > > > Windows/Linux. > > > > I mean which cluster can best suit these requirements. > > > > > > > > Also are the softwares used by him also available for Linux or not. 
(If > > > > the solution suggested is in Linux) > > > > > > > > > > > > Regards, > > > > > > > > Sawan Gupta > > > > || Mg_India at sancharnet.in || > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf at beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > -- > > > -------------------------------------------------------------------------- > > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Cell: 610.390.7765 Redefining High Performance Computing > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Mon Feb 23 10:18:34 2004 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Mon, 23 Feb 2004 16:18:34 +0100 Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: References: Message-ID: <1077549514.31096.0.camel@qeldroma.cttc.org> El dom, 22-02-2004 a las 17:04, Felix Rauch escribi?: > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). > > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). 
> > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > On the other hand, is possible to deviate smartd log to a specific file and check it regularly when it's updated, adding this parameter to smartd: -l facility Of course some syslog.conf modifying will be needed to instruct syslogd to log on a specific file from the "facility" specified. facility.* /var/log/smartd.log Also, the '-M' coupled with the 'exec' directive should work, a script could be run to update some flags for example: -M exec /usr/bin/smartd_alert.sh -- Daniel Fernandez Centre tecnol?gic de transfer?ncia de calor - CTTC www.cttc.upc.edu c/ Colom n?11 UPC Campus Industrials Terrassa , Edifici TR4 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Mon Feb 23 11:24:23 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Mon, 23 Feb 2004 11:24:23 -0500 Subject: [Beowulf] Anyone use MOSIX? Message-ID: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Has anyone on this list used MOSIX before? I'm particularly interested in how it compares to other clustering software such as PVM and MPI. Any information regarding what you're using MOSIX for, recommendations about setting it up, comparisons to other software, etc, would be welcomed. Thanks! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kpodesta at redbrick.dcu.ie Mon Feb 23 13:45:44 2004 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 23 Feb 2004 18:45:44 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040223184544.GB30983@carbon.redbrick.dcu.ie> On Mon, Feb 23, 2004 at 12:37:25PM +0100, John Hearns wrote: > http://www.flashmobcomputing.org/ > > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > > The bootable CF they are using is a Knoppix variant. It seems a bit loose or unfair to suggest a project like this 'registers' for the top500 list? It's a once-off, temporary system, dedicated (seemingly) to nothing but qualification to the list. They say in the FAQ that if the system proves itself, it could potentially be used for bigger problems, which is a noble idea - but they obviously don't read the beowulf list often ("it all depends on the application", etc.) :-) Additionally, a flashmob system would have a limited shelf-life, before the owners want to take their computers home. Distributed projects like SETI at home and Folding at home etc. have been running for years... I'm not familiar with the entry rules to the top500, but to be fair to existing, dedicated installations - they would have a certain 'reliability' in terms of their existence. 
If you needed to perform a serious calculation to a scale of 36 TFLOPS etc., then you know that there is a system that can do it. They might want to be critical of how sustainable the result from the Flashmob is, if they wanted to 'call' on it's power at any particular time in the future. (whoops, it was raining today, that's 10 TFLOPS down the drain...). Pardon the pun. Kp -- Karl Podesta Dublin City University, Ireland _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 23 14:11:48 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 23 Feb 2004 14:11:48 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Mon, 23 Feb 2004, John Hearns wrote: > I hesitate a bit to send things seen on Slashdot to the list, > but this is probably relevant: > > http://www.flashmobcomputing.org/ >> A Flash Mob computer, unlike an ordinary cluster, is temporary and >> organized on-the-fly for the purpose of working on a single >> problem. Flash Mob I is the first of it's kind. A bit of hype here. Flash Mob is a fun demo, but not a new system architecture. All of the software is on a live CD, which Yggdrasil pioneered back in 1993, and it's far from being the first on-the-fly cluster. One of first public demo of Scyld Beowulf was temporarily converting the email-reading machines at the ALS conference into a cluster. We did that in a few minutes, taking only a few second beyond the amount of time it took to boot the machines from floppy. Today there is the opportunity to use PXE boot, which makes configuration even easier. A key was the innovative approach of making most of the systems specialized compute slaves, with only the environment needed to support the fully-cached running application. (Note that NFS root sounds like a likely alternative, but doesn't scale and has a run-time performance impact.) > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > The bootable CF they are using is a Knoppix variant. The differing CPUs and full workstation-oriented distribution will likely pose more a problem than the switched Fast Ethernet. Unless they make significant modifications, they will run into the scalability problem that every full-installation system encounters: at every timestep a few of the machines will be paging, running cron, or doing something else that slows the machine. That would be barely noticed in a workstation environment, but is a major problem with most cluster jobs. Still, it sounds like a fun, demystifying demo that introduces people to scalable computing. 
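To make Donald Becker's point above concrete: in a tightly coupled run that synchronizes every timestep, each step goes at the pace of the slowest node on that step, so a cron run or a paging episode on any single box stalls the whole job. The toy below fakes the hiccup with usleep() -- it is not taken from the Flash Mob software, it just shows the shape of the problem.

----------------------------------------------------------------
/*
 * Toy timestep loop: ~1 ms of "work" per step, a barrier every step,
 * and about one step in fifty where a rank loses an extra 20 ms
 * (standing in for cron, paging, etc.).  With N ranks the chance that
 * *somebody* hiccups on a given step grows quickly, and the barrier
 * makes everyone wait for them.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, step, nsteps = 200;
    double t0, t1, local, worst;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(rank + 1);

    t0 = MPI_Wtime();
    for (step = 0; step < nsteps; step++) {
        usleep(1000);                     /* nominal per-step work: 1 ms */
        if (rand() % 50 == 0)
            usleep(20000);                /* occasional 20 ms hiccup */
        MPI_Barrier(MPI_COMM_WORLD);      /* the per-timestep sync point */
    }
    t1 = MPI_Wtime();

    local = t1 - t0;
    MPI_Reduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d steps on %d ranks took %.2f s (about %.2f s of it is work)\n",
               nsteps, size, worst, nsteps * 0.001);
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

Run it on one rank and then on eight or sixteen and compare the wall times; the gap is the jitter tax a full workstation-style install pays before any network effects enter at all.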
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster systems Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bernd-schubert at web.de Mon Feb 23 17:05:36 2004 From: bernd-schubert at web.de (Bernd Schubert) Date: Mon, 23 Feb 2004 23:05:36 +0100 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE In-Reply-To: <40387739.3891D96C@nchc.org.tw> References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> <40387739.3891D96C@nchc.org.tw> Message-ID: <200402232305.36711.bernd-schubert@web.de> On Sunday 22 February 2004 10:32, Jyh-Shyong Ho wrote: > Hi, > > We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 > on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI > Workstation > 5.1.3 compiler and 64-bit GOTO library. > Hello, thanks for this great information! I've forwarded it to the CCL list, since I guess on this list many people are interested in this topic. Cheers, Bernd _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 17:47:27 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 17:47:27 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: > Still, it sounds like a fun, demystifying demo that introduces people to > scalable computing. demystification is always good. IMO, the best part of this is that it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. partly the reason is hetrogeneity and other "practical" downers. but mainly, a super-computer needs a super-network. of course, in the grid nirvana, all computers would have multiple ports of infiniband, and the word would be 5 us across ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 15:52:58 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 15:52:58 -0500 (EST) Subject: [Beowulf] Anyone use MOSIX? In-Reply-To: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Message-ID: > Has anyone on this list used MOSIX before? I expect many have given it a try. > I'm particularly interested in how it compares to other clustering software > such as PVM and MPI. apples and oranges, I believe. mosix more or less tries to virtualize a cluster by making multiple machines share things like a single pid space, with forwarding of signals, etc. the idea is that the OS takes care of migrating jobs across nodes, including using proxies for resources that can't be directly moved (pages can, for instance). from the PVM/MPI perspective, the most important resource would be sockets. as far as I know, MPI-on-Mosix would use proxied sockets, and would therefore have performance problems for anything closely-coupled or high-bandwidth. in principle, Mosix could provide some sort of clusterized group-comm mechanism that wouldn't require proxies, but that would be a large effort. 
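One way to put a number on the proxied-socket overhead Mark describes (or on the small-message cost of any interconnect) is the usual ping-pong test. The bare-bones sketch below times round trips of an 8-byte message between two ranks and reports the average one-way latency; there is no warmup pass and only one message size, so treat the result as a rough figure rather than a benchmark.

----------------------------------------------------------------
/*
 * Bare-bones MPI ping-pong between ranks 0 and 1.  Half the average
 * round-trip time of a small message is the usual "latency" figure.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i, reps = 10000;
    char buf[8] = "ping";
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run this on exactly two ranks\n");
        MPI_Finalize();
        return 1;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f usec\n",
               (t1 - t0) / (2.0 * reps) * 1e6);
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

If the same test run over a migrated/proxied path comes back much worse than over a direct socket, that is the penalty being discussed for closely-coupled codes.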
in a way, it's a shame that MPI is such a fat interface, since there's a lot of really good work that could be done in this direction, but is simply too large for a typical thesis project :( regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Feb 23 20:53:18 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 23 Feb 2004 17:53:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > of course, in the grid nirvana, > all computers would have multiple ports of infiniband, > and the word would be 5 us across ;) In grid nirvana, the speed of light would rise with Moore's Law. 5 usec is a long time now, and much longer a year from now. That's not nirvana. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 24 03:22:16 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 24 Feb 2004 09:22:16 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. > An odd fact I always remember is that light travels at a foot per nanosecond. (Useful to know if you are plugging coax delay lines into trigger circuits) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 04:35:48 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 10:35:48 +0100 Subject: [Beowulf] C vs C++ challenge In-Reply-To: References: <1075512676.4915.207.camel@protein.scalableinformatics.com> Message-ID: <20040224093548.GA29776@unthought.net> On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. 
It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Tue Feb 24 04:35:54 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Tue, 24 Feb 2004 09:35:54 +0000 Subject: [Beowulf] Math Coprocessor In-Reply-To: References: Message-ID: <200402240935.54623.daniel.kidger@quadrics.com> > On Fri, 13 Feb 2004, John Hearns wrote: > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". I don't have that but I do have on my bookshelf "A Fortran Primer" by Elliot Organick, Addison-Wiley (1963) - so go on: does anyone own any even older Fortran texts ? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 05:56:45 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 11:56:45 +0100 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> References: <1077217418.4982.35.camel@localhost> Message-ID: <20040224105645.GC29776@unthought.net> On Thu, Feb 19, 2004 at 02:03:38PM -0500, Ryan Adams wrote: ... > I have a problem that divides nicely (embarrassingly?) into > parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to > complete and requires no communication during that time. Essentially > there is a piece of data, around 500KB that must be processed and a > result returned. I'd like to process as many of these pieces of data as > possible. I am considering building a small heterogeneous cluster to do > this (at home, basically), and am trying to decide exactly how to > architect the task distribution. I had the following problem; lots and lots of compile jobs. They take from a few seconds to a few minutes each. No batch scheduling system that I tried, was up to the task (simply waaay too long latency in the scheduling). ... > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. Maybe you would want to take a quick look at ANTS http://unthought.net/antsd/ ANTS was the solution I developed for the problem I had, and from the sound of it, I think your problem may be a good fit for ANTS as well. I've been updating it as of lately, but haven't put new releases on the web site. If you're interested, I can provide you with the new releases (featuring krellm2 applet! ;) - but the basic functionality is unchanged from the old release on the web site. ANTS specifically schedules jobs very quickly - but it lacks the advanced features of "real" batch systems (like accounting, gang scheduling, job restart, etc. etc.). / jakob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 08:34:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 24 Feb 2004 08:34:16 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. I'll have to think about that one. Exponential growth of the speed of light. Hmmm. Some sort of inflationary model? Space flattening towards non-relativistic classical? The physics of Nirvana would be veeeery interesting... :-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajt at rri.sari.ac.uk Tue Feb 24 08:56:31 2004 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Tue, 24 Feb 2004 13:56:31 +0000 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B580F.1020009@rri.sari.ac.uk> Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctibirna at giref.ulaval.ca Tue Feb 24 09:15:20 2004 From: ctibirna at giref.ulaval.ca (Cristian Tibirna) Date: Tue, 24 Feb 2004 09:15:20 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <200402240915.20284.ctibirna@giref.ulaval.ca> On Tuesday, 24 February 2004 08:13, Matt Valerio wrote: > > Like anything on the internet, it should be taken with a grain of salt. > Can anyone vouch for its validity, or is it a hoax to get us to all hate > C++ and stick with C? Of course it's a hoax ;o) http://www.research.att.com/~bs/bs_faq.html#IEEE And in fact all the FAQ deserve a reading, no matter which language one preaches as being the Holy Grail. -- Cristian Tibirna (418) 656-2131 / 4340 Laval University - Qu?bec, CAN ... http://www.giref.ulaval.ca/~ctibirna Research professional - GIREF ... ctibirna at giref.ulaval.ca Chemical Engineering PhD Student ... tibirna at gch.ulaval.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 08:13:12 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 08:13:12 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <20040224093548.GA29776@unthought.net> Message-ID: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Hello, I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 languages. 
That being said, I think it would be interesting to see what the creator of both C and C++ has said about the two. I ran across this interview with Bjorn Stroustrup at http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. Like anything on the internet, it should be taken with a grain of salt. Can anyone vouch for its validity, or is it a hoax to get us to all hate C++ and stick with C? -Matt -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of Jakob Oestergaard Sent: Tuesday, February 24, 2004 4:36 AM To: Beowulf Subject: Re: [Beowulf] C vs C++ challenge On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. 
Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 09:12:49 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 09:12:49 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <403B580F.1020009@rri.sari.ac.uk> Message-ID: <200402241414.i1OEErBf096719@postoffice.onu.edu> Wow, I guess I didn't do my homework! Apologizes to everyone for the misinformation! As Tony pointed out, the real interview may be found at http://www.research.att.com/~bs/ieee_interview.html. -Matt -----Original Message----- From: Tony Travis [mailto:ajt at rri.sari.ac.uk] Sent: Tuesday, February 24, 2004 8:57 AM To: Matt Valerio Cc: beowulf at beowulf.org Subject: Re: [Beowulf] C vs C++ challenge Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 09:33:37 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 24 Feb 2004 09:33:37 -0500 (EST) Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: On Tue, 24 Feb 2004, Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? I posted that up there when I found it because it is hilarious. I assume that it is a satire (not exactly the same thing as a "hoax":-). However, as is the case with much satire, it contains a lot of little nuggets that (should) make you think... about "good practice" ways of coding in C++ if nothing else. r-still-a-C-guy-at-heart-gb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Feb 24 10:37:12 2004 From: jcownie at etnus.com (James Cownie) Date: Tue, 24 Feb 2004 15:37:12 +0000 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message from joshh@cs.earlham.edu of "Fri, 13 Feb 2004 10:25:31 EST." <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <1AvecS-6wh-00@etnus.com> > I am profiling a software package that runs over LAM-MPI on 16 node > cluster s [Details Below]. I would like to measure the effect of > increased latency on the run time of the program. > Look for "dimemas" on Google. It's a simulator from Cepba for parallel architectures which is intended to allow you to adjust exactly this kind of parameter. At one point they had it coupled up with Pallas' Vampir so that it could read Vampir trace files and then simulate the same execution with modified communication properties, or modified CPU properties. -- -- Jim -- James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Tue Feb 24 10:40:29 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Tue, 24 Feb 2004 07:40:29 -0800 Subject: [Beowulf] C vs C++ challenge Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F55AB@orsmsx402.jf.intel.com> From: Matt Valerio; Tuesday, February 24, 2004 6:13 AM > > Wow, I guess I didn't do my homework! Apologizes to everyone for the > misinformation! > > As Tony pointed out, the real interview may be found at > http://www.research.att.com/~bs/ieee_interview.html. For a Stroustrup statement that C proponents (as am I) will also agree with, see http://www.research.att.com/~bs/bs_faq.html#really-say-that FYI, the top of the FAQ has a .wav file with the proper pronunciation of his name... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Feb 24 12:15:01 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 24 Feb 2004 18:15:01 +0100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <200402241815.01355.joachim@ccrl-nece.de> Donald Becker: > On Mon, 23 Feb 2004, John Hearns wrote: > > I hesitate a bit to send things seen on Slashdot to the list, > > but this is probably relevant: > > > > > > http://www.flashmobcomputing.org/ > > A bit of hype here. [...] Exactly. It's a nice idea (although the wrong approach, as Donald elaborated - maybe they will find out), but they shouldn't seriously clame to be the first with this "revolutionary idea" (sic!). In addition to Donald's references to earlier "on-the-fly clusters", here's another one from Germany (December 1998): http://www.heise.de/ix/artikel/E/1999/01/010/ I don't know if they actually submitted results to TOP500 - I could not find a matching entry for 1999. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:15:45 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:15:45 -0800 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <20040224093548.GA29776@unthought.net> Message-ID: <5.2.0.9.2.20040224091430.017cb1d8@mailhost4.jpl.nasa.gov> At 08:13 AM 2/24/2004 -0500, Matt Valerio wrote: >Hello, > >I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 >languages. > >That being said, I think it would be interesting to see what the creator of >both C and C++ has said about the two. I ran across this interview with >Bjorn Stroustrup at >http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > >Like anything on the internet, it should be taken with a grain of salt. Can >anyone vouch for its validity, or is it a hoax to get us to all hate C++ and >stick with C? > >-Matt That's a classic hoax interview (and I think identified as such by RGB), and remarkably funny. Almost as good as Dijkstra's apocryphal comment that more brains have been ruined by BASIC than .... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:21:18 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:21:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: <5.2.0.9.2.20040224091607.0350aa38@mailhost4.jpl.nasa.gov> At 08:34 AM 2/24/2004 -0500, Robert G. 
Brown wrote: >On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > >I'll have to think about that one. > >Exponential growth of the speed of light. Hmmm. Some sort of >inflationary model? Space flattening towards non-relativistic >classical? The physics of Nirvana would be veeeery interesting... 5 usec gives you a "grid diameter" of a mile or so... (if you don't worry about pesky things like wires or fibers to carry the signals). You could fit a LOT of processors in a sphere a mile in diameter. Does bring up some interesting questions about optimum interconnection strategies. Even if you put nodes on the surface of that sphere (so you can use free space optical interconnects across the middle of the sphere, you'd have about 7.2 million square meters to fool with. Say you can fit a 100 nodes in a square meter. That's almost a billion nodes. If you need bigger, one could always use fancy stuff like quantum entanglement, about which I don't know much, but which might provide a solution to communicating across large distances very quickly (at least in one frame of reference) James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From orion at cora.nwra.com Tue Feb 24 18:17:31 2004 From: orion at cora.nwra.com (Orion Poplawski) Date: Tue, 24 Feb 2004 16:17:31 -0700 Subject: [Beowulf] G5 cluster for testing Message-ID: <403BDB8B.7060904@cora.nwra.com> Anyone (vendors?) out there have a G5 cluster available for some testing? I've been charged with putting together a small cluster and have been asked to look into G5 systems as well (I guess 64 bit powerPC really....) Thanks -- Orion Poplawski System Administrator 303-415-9701 x222 Colorado Research Associates/NWRA FAX: 303-415-9702 3380 Mitchell Lane, Boulder CO 80301 http://www.co-ra.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Tue Feb 24 17:57:07 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Tue, 24 Feb 2004 22:57:07 +0000 (UTC) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Tue, 24 Feb 2004, Robert G. Brown wrote: > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. Then you'll have to think *very* (exponentially?) fast. Just to keep up with where you were when you started... Shades of the Red Queen. :) Maybe Lewis Carroll already described the physics? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. 
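The back-of-the-envelope numbers a few messages up are easy to reproduce; a throwaway check, assuming the 5 usec is read as one-way light time across the diameter and 100 nodes per square metre as above:

----------------------------------------------------------------
#include <stdio.h>

int main(void)
{
    const double PI = 3.141592653589793;
    double c = 2.998e8;           /* speed of light, m/s             */
    double t = 5e-6;              /* 5 usec, taken as one-way        */
    double d = c * t;             /* diameter of the sphere, metres  */
    double r = d / 2.0;
    double area  = 4.0 * PI * r * r;
    double nodes = area * 100.0;  /* 100 nodes per square metre      */

    printf("diameter: %.0f m (%.2f miles)\n", d, d / 1609.34);
    printf("surface : %.2e m^2\n", area);
    printf("nodes   : %.2e\n", nodes);
    return 0;
}
----------------------------------------------------------------

With those assumptions it prints roughly 1500 m (0.93 miles), 7.1e+06 m^2 and 7.1e+08 nodes -- the same ballpark as the figures quoted above.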
- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Feb 24 09:10:31 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 24 Feb 2004 08:10:31 -0600 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B5B57.8080403@tamu.edu> Since he's now faculty here, I guess I'll walk down the hall and ask him. gerry Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? > > -Matt > > > > > > > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of > Jakob Oestergaard > Sent: Tuesday, February 24, 2004 4:36 AM > To: Beowulf > Subject: Re: [Beowulf] C vs C++ challenge > > On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > >>>I could easily optimize it more (do the work on a larger buffer at a >>>once), but I think enough waste heat has been created here. This is a >>>simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. >> >>Enough time wasted on finding different solutions to a simple problem? > > Surely > >>not. Let me toss my hat into the ring: > > ... > > Hi guys! > > Guess who's back - yes, it's your friendly neighborhood language > evangelist :) > > I said I'd be gone one week - well, I put instant coffe in the > microwave, and *wooosh* went three weeks ahead in time. > > What a fantastic thread this turned into - awk, perl, more C, java and > God knows what. I'm almost surprised I didn't see a Fortran > implementation. > > See, I was trying to follow up on the challenge, then things got > complicated (mainly by me not being able to get the performance I wanted > out of my code) - so instead of flooding your inboxes, I wrote a little > "article" on my findings. > > It's up at: > http://unthought.net/c++/c_vs_c++.html > > Highlights: > *) Benchmarks - real numbers. > *) A C++'ification of the fast C implementation (that turns out to be > negligibly faster than the C implementation although the same > algorithm and the same system calls are used), which is generalized > and generally made usable as a template library routine (for > convenient re-use in other projects - yes, this requires all that > boring non-sexy stuff like freeing up memory etc.) > *) Two new C++ implementations - another 15 liner that's "only" twice > as slow as the C code, and another longer different-algorithm C++ > implementation that is significantly faster than the fastest C > implementation (so far). > > Now, I did not include all the extra implementations presented here. I > would like to update the document with those, but I will need a little > feedback from various people. > > First; how do I compile the java implementation? 
GCC-3.3.2 gives me > ---------------------------------------------------------------- > [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java > wordcount.java: In class `wordcount': > wordcount.java: In method `wordcount.main(java.lang.String[])': > wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' > in type `java.util.regex.Pattern'. > words = p.split(s); > ^ > 1 error > ---------------------------------------------------------------- > > Second; another much faster C implementation was posted - I'd like to > test against that one as well. I'm curious as to how it was done, and > I'd like to use it as an example in the document if it turns out that it > makes sense to write a generic C++ implementation of whatever algorithm > is used there. Well, if the code is not a government secret ;) > > So, well, clearly my document isn't completely updated with all the > great things from this thread - but at least I think it is a decent > reply to the mail where the 'programming pearl' C implementation was > presented. > > I guess this could turn into a nice little reference/FAQ/fact type of > document - the oppinions stated there are biased of course, but not > completely unreasonable in my own (biased) oppinion - besides, there's > real-world numbers for solving a real-world problem, that's a pretty > good start I would say :) > > I'd love to hear what people think - if you have the time to give it a > look. > > Let me know, flame away, give me Fortran code that is faster than my > 'ego-booster' implementation at the bottom of the document! ;) > > Cheers all :) > > / jakob > > BTW: Yes, I had a great vacation; > http://unthought.net/avoriaz/p1010050.jpg ;) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Tue Feb 24 09:10:09 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Tue, 24 Feb 2004 14:10:09 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <1077631809.646.83.camel@ashley> On Mon, 2004-02-23 at 22:47, Mark Hahn wrote: > > Still, it sounds like a fun, demystifying demo that introduces people to > > scalable computing. > > demystification is always good. IMO, the best part of this is that > it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. > partly the reason is hetrogeneity and other "practical" downers. It will be interesting to see, I don't expect they are going to get much time to benchmark but it would be nice to have a plot of achieved performance against CPU count in this kind of configuration. Anybody care to predict how many CPU's you will need before wall clock performance starts dropping? > but mainly, a super-computer needs a super-network. 
That I won't dispute but does a single linpack run require a super-computer? Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raiders at phreaker.net Tue Feb 24 10:39:19 2004 From: raiders at phreaker.net (raiders at phreaker.net) Date: Tue, 24 Feb 2004 23:39:19 +0800 Subject: [Beowulf] Subclusters... Message-ID: <200402242339.19310.raiders@phreaker.net> We are on a project as described below: - IA32 linux cluster for general parallel programming - five head nodes, each head node will have about 15 compute nodes and dedicated storage - groups of cluster-users will be restricted to their own clusters normally (some exclusions may apply) - SGE/PBS, GbE etc are standard choices But the people in power want one single software or admin console (cluster toolkit?) to manage the entire cluster from one adm station (which may or may not be one of the head nodes). I looked around and could not find any suitable solution (ROCKS, oscar, etc). ROCKS, oscar etc can manage only one cluster at a time and cannot handle subclusters. (I might be wrong) I believe that only custom programming can help. Appreciate any expert opinion Thanks, Shawn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Tue Feb 24 23:38:54 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Tue, 24 Feb 2004 20:38:54 -0800 (PST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> Message-ID: I'd suggest asking your friendly IBM sales guy about ppc970 blades... joelja On Tue, 24 Feb 2004, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) > > Thanks > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 02:10:39 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Tue, 24 Feb 2004 23:10:39 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> References: <403BDB8B.7060904@cora.nwra.com> Message-ID: <20040225071039.GA29125@cse.ucdavis.edu> On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some For the most part I'm finding that cluster performance is mostly predictable by single node performance, and the scaling of the interconnect. At least as an approximation, I'm going to use to find a good place to start for my next couple cluster designs. I'm current benchmarking: Dual G5 Opteron duals (1.4, 1.8, and 2.2) Opteron quad 1.4 Itanium dual 1.4 GHz Dual P4-3.0 GHz+HT Single P4-3.0 GHz+HT Alas, my single node performance testing on the G5 has been foiled by my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem working. 
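On the single-node half of that argument: a lot of the per-node spread between machines like the ones listed above already shows up in a memory-bandwidth toy loop. A rough STREAM-triad-style sketch (not the real STREAM benchmark; array sizes, compiler flags and timer resolution all matter):

----------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1000 * 1000)            /* three ~32 MB double arrays */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double t, best = 1e9;
    int i, k;

    if (!a || !b || !c) return 1;
    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (k = 0; k < 10; k++) {                /* best of 10 passes */
        t = now();
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];         /* ~24 bytes of traffic per element */
        t = now() - t;
        if (t < best) best = t;
    }

    /* print an element of a so the loop cannot be optimised away */
    printf("triad: %.0f MB/s (a[%d] = %g)\n",
           3.0 * N * sizeof(double) / best / 1e6, N / 2, a[N / 2]);
    return 0;
}
----------------------------------------------------------------

Built with the same compiler and flags on each box, it gives a quick first-order ranking before any MPI is involved.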
Anyone else have MPICH and shared memory working on OSX? Or maybe a dual g5 linux account for an evening of benchmarking? Normally using ch_p4 and localhost wouldn't be too big a deal, but ping localhost on OSX is something like 40 times worse than linux, and mpich with ch_p4 on OSX is around 20 times worse than linux with shared memory. > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) Assuming all the applications and tools work under all environments you're considering, I'd figure out what interconnect you want to get first. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 25 03:34:38 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 25 Feb 2004 09:34:38 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <1077542843.26492.12.camel@qeldroma.cttc.org> Message-ID: On Mon, 23 Feb 2004, Daniel Fernandez wrote: [...] > On the other hand, is possible to deviate smartd log to a specific file > and check it regularly when it's updated, adding this parameter to > smartd: > > -l facility > > Of course some syslog.conf modifying will be needed to instruct syslogd > to log on a specific file from the "facility" specified. Thanks for the hints, I was not yet aware of the -l and -M flags. Still, I think directly calling "smartctl" from a cron job is the better solution. With just smartd and the flags above, you still won't get any updates if smartd simply dies and you won't even notice, because grep simply finds the last entry in the log.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 07:01:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 13:01:02 +0100 (CET) Subject: [Beowulf] FOSDEM talk Message-ID: There is a current thread on SMART usage. There was also a thread about six months ago on lm_sensors, about the output format of sensors, and how one has to parse it. Sorry if this message is a bit of a ramble. At FOSDEM over the weekend I went to a talk by Robert Love on his work on Linux kernel and destop integration, on HAL and DBUS. One slide made me sit up and take notice, as he had an example of a kernel message saying 'overheating'. The message format was something like an SNMP OID, as I remember org.kernel.processor.overheating (or something like that). One could then think of a process listening on the netlink socket, generating (for example) an SNMP trap on receiving events of this category. A better way of doing things than running sensors periodically then parsing the output. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:55:22 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:55:22 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > We are on a project as described below: > > - IA32 linux cluster for general parallel programming > - five head nodes, each head node will have about 15 compute nodes and > dedicated storage > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or admin console (cluster > toolkit?) to manage the entire cluster from one adm station (which may or may > not be one of the head nodes). Thinking about this, the way I would architect things is to stop thinking of subclusters - yet of course give the users their allocation of resources. So, choose your cluster install method of choice. Have one admin/master node and install all 75 nodes. Have 5 public facing machines, and have logins go through a load-balancer or round robin. When a user logs in they get directed to the least loaded machine. Why? If one machine goes down (fault or upgrade) the users still have four machines. They don't "see" this as you have entries in the DNS for e.g. necromancy.hogwarts defence-darkarts.hogwarts potions.hogwarts spells.hogwarts magical-creatures.hogwarts all pointing the same way. It would be better to have 5 separate storage nodes, but the login machines in your scenario will have to do that job also. Just allocate storage per group. The 75 compute nodes are installed within the cluster. Now, at a first pass you want to 'saw things up' into 15 node lumps. This can be done easily - just put a queue or queues on each and allow only certain groups access. But I will contend this is a bad idea. Batch queueing systems have facilities to look after fair shares of resources between groups. Say you have the 5 separate groups scenario. 
Say today Professor Snape isn't doing any potions work. The 15 potions machines will lie idel, while there are plenty of jobs in necromancy just dying to run. Use the fairshare in SGE or LSF. Each group will get their allocated share of CPU. You'll also have redundancy - so that you can take machines out for maintenance/repairs without impacting any one group, ie. the load is shared across 75 machines not 5. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patricka at its.uct.ac.za Wed Feb 25 09:01:49 2004 From: patricka at its.uct.ac.za (Patrick) Date: Wed, 25 Feb 2004 16:01:49 +0200 Subject: [Beowulf] G5 cluster for testing References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <026001c3fba7$e9af5d00$a61b9e89@nawty> Has anyone here actually tried out Xgrid ? Apples grid stuff. It seems to be not so fussy in regards to the type of macs you attach and suchlike ? as well as them being configurable via Zeroconf. P ----- Original Message ----- From: "Bill Broadley" To: "Orion Poplawski" Cc: Sent: Wednesday, February 25, 2004 9:10 AM Subject: Re: [Beowulf] G5 cluster for testing > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:22:15 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:22:15 +0800 (CST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <20040225132215.21497.qmail@web16813.mail.tpe.yahoo.com> I've heard that LAM works better with OSX. Andrew. 
--- Bill Broadley ????> On > Alas, my single node performance testing on the G5 > has been foiled by > my inability to get MPICH, OSX, and ./configure > --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on > OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to > big a deal, but > ping localhost on OSX is something like 40 times > than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux > with shared memory. > > > testing? I've been charged with putting together > a small cluster and > > have been asked to look into G5 systems as well (I > guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under > all environments your > considering I'd figure out what interconnect you > want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:27:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:27:54 +0800 (CST) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: <20040225132754.14108.qmail@web16809.mail.tpe.yahoo.com> GridEngine has the concept of a CELL. It is not well documented, but it works like pointing to a different cell gives you a different configuration, ie. different subcluster. When you setup SGE, it will ask you for the name of the cell, so on the same head node, each time you run the sge install script, use a different cell name. This way you will get 5 different SGE clusters controlled by the same headnode. Better ask on the SGE mailing list since I've never played around with this too much. http://gridengine.sunsource.net/project/gridengine/maillist.html Andrew. --- raiders at phreaker.net ????> We are on a project as described below: > > - IA32 linux cluster for general parallel > programming > - five head nodes, each head node will have about 15 > compute nodes and > dedicated storage > - groups of cluster-users will be restricted to > their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or > admin console (cluster > toolkit?) to manage the entire cluster from one adm > station (which may or may > not be one of the head nodes). > > I looked around and could not find any suitable > solution (ROCKS, oscar, etc). > ROCKS, oscar etc can manage only one cluster at a > time and cannot handle > subclusters. (I might be wrong) > > I believe that only custom programming can help. 
> Appreciate any expert > opinion > > Thanks, > Shawn > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Wed Feb 25 08:03:02 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Wed, 25 Feb 2004 13:03:02 +0000 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <1077714182.656.235.camel@ashley> On Wed, 2004-02-25 at 07:10, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. There is a third issue here which you've missed which is that interconnect performance can depends on the PCI bridge that it's plugged into. It would be more correct to say that performance is predictable by dual-node performance and scaling of the interconnect. Of course this may not make a difference for Ethernet or even gig-e but it does matter at the high end. Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Feb 25 14:32:12 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 25 Feb 2004 11:32:12 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040225193212.GA14558@greglaptop.internal.keyresearch.com> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. Doesn't the G5 have exactly one chipset implementation available? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Thu Feb 26 05:38:09 2004 From: pesch at attglobal.net (pesch at attglobal.net) Date: Thu, 26 Feb 2004 02:38:09 -0800 Subject: [Beowulf] Flashmobcomputing References: Message-ID: <403DCC91.5A10A77B@attglobal.net> Nothing moves faster than the speed of light - with the exception of bad news (according to the late Douglas Adams); therefore, at the grid nirvana, bad news must get increasingly more bad. Which leads me to the hypothesis that nirvana is that locus at the irs which stores the access codes for the pentium microcode backdoors... "Robert G. 
Brown" wrote: > On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. > > Exponential growth of the speed of light. Hmmm. Some sort of > inflationary model? Space flattening towards non-relativistic > classical? The physics of Nirvana would be veeeery interesting... > > :-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 25 22:02:03 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 26 Feb 2004 14:02:03 +1100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> References: <403DCC91.5A10A77B@attglobal.net> Message-ID: <200402261402.05015.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 09:38 pm, pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); The only things known to go faster than ordinary light is monarchy, according to the philosopher Ly Tin Weedle. He reasoned like this: you can't have more than one king, and tradition demands that there is no gap between kings, so when a king dies the succession must therefore pass to the heir instantaneously. Presumably, he said, there must be some elementary particles - -- kingons, or possibly queons -- that do this job, but of course succession sometimes fails if, in mid-flight, they strike an anti-particle, or republicon. His ambitious plans to use his discovery to send messages, involving the careful torturing of a small king in order to modulate the signal, were never fully expanded because, at that point, the bar closed. 
- -- (Terry Pratchett, Mort) courtesy of: http://www.co.uk.lspace.org/books/pqf/mort.html - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAPWGrO2KABBYQAh8RAsm3AJ4zV3fEk8q/8Jm/zqY4xiBzGvKj4ACfeT+N 3NhDhvgiJyhukmnzBFHUaMQ= =NgG+ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bill at cse.ucdavis.edu Wed Feb 25 21:32:59 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 25 Feb 2004 18:32:59 -0800 Subject: [Beowulf] Cray buys Octigabay Message-ID: <20040226023258.GA9211@cse.ucdavis.edu> An interesting development: http://www.octigabay.com/ http://www.octigabay.com/newsEvents/cray_release.htm http://www.cray.com/ http://www.cray.com/media/2004/february/octigabay.html -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hpcatcnc at yahoo.com Thu Feb 26 01:56:09 2004 From: hpcatcnc at yahoo.com (prakash borade) Date: Wed, 25 Feb 2004 22:56:09 -0800 (PST) Subject: [Beowulf] predefined nodes for a job Message-ID: <20040226065609.19075.qmail@web21507.mail.yahoo.com> Can anybody tell me how I can allot some fixed, predefined machines from my cluster to a job? I have tried using the option -machinefile mcfile, where mcfile is a file in a local directory containing the required machine names. I also don't want to use the machine from which I will issue the mpirun command, even though MPICH is installed on that machine. Is there any solution for this? __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
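For the machinefile question above, a minimal sketch of the usual MPICH ch_p4 invocation; the hostnames, file name and program name are only placeholders, and it is worth checking your own mpirun's help output for the -nolocal switch before relying on it:

    $ cat mcfile
    node01
    node02
    node03
    node04
    $ mpirun -np 4 -machinefile mcfile -nolocal ./myprog

Here -machinefile restricts the run to the hosts listed and -nolocal keeps the host the job is launched from out of that set. If a particular mpirun lacks -nolocal, launching the job from one of the listed compute nodes instead of the front end has much the same effect.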
From sdutta at deas.harvard.edu Thu Feb 26 07:43:51 2004 From: sdutta at deas.harvard.edu (Suvendra Nath Dutta) Date: Thu, 26 Feb 2004 07:43:51 -0500 (EST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: I fought for a while to get an OS X cluster up, precisely to test the G5 performance. I had lots of problems with setting up NFS and setting up MPICH to use shared memory on the dual processors. I was able to take advantage of the firewire networking built into OS X. We were taking the harder route of staying away from all non-open source tools to do NFS (NFSManager) or MPI (Pooch). As was pointed out in another message, we are mostly keen on just testing the performance of three applications that we will run on our cluster rather than HPL numbers. Finally we gave up the struggle. We are now working with Apple to benchmark on an existing setup instead of us trying to set everything up ourselves. Unfortunately there isn't a howto on doing this yet. I'll post numbers when we get them. Suvendra. On Tue, 24 Feb 2004, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use it to find > a good place to start for my next couple of cluster designs. > > I'm currently benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be too big a deal, but > ping localhost on OSX is something like 40 times worse than linux, and mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments you're > considering, I'd figure out what interconnect you want to get first.
> > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bill at cse.ucdavis.edu Thu Feb 26 06:55:22 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu, 26 Feb 2004 03:55:22 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040226115522.GA12286@cse.ucdavis.edu> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depend on the PCI bridge that it's plugged > into. It would be more correct to say that performance is predictable > by dual-node performance and scaling of the interconnect. Of course > this may not make a difference for Ethernet or even gig-e but it does > matter at the high end. Take this chart for instance: http://www.myri.com/myrinet/PCIX/bus_performance.html On any decent size cluster the node performance or the interconnect performance is likely to have a significantly larger effect on cluster performance than any of the differences on that chart. Or maybe you're talking about sticking $1200 Myrinet cards in a 133 MB/sec PCI slot? Don't forget peak bandwidth measurements assume huge (10000-64000 byte) packets, latency tolerance, and zero computation. Not exactly the use I'd expect in a typical production cluster. So my suggestion is: #1 Pick your application(s), this is why you're buying a cluster, right? #2 For compatible nodes pick the node with the best perf or price/perf. #3 For compatible interconnects pick the one with the best scaling or price/scaling for the number of nodes you can afford/fit. #4 If you get a choice of PCI-X bridges, sure, consult the URL above and pick the fastest one. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Thu Feb 26 11:51:42 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 26 Feb 2004 11:51:42 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> Message-ID: On Thu, 26 Feb 2004 pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); therefore, at the grid > nirvana, bad news must get increasingly more bad. Which leads me to the > hypothesis that nirvana is that locus at the IRS which stores the access > codes for the pentium microcode backdoors... This is not exactly correct. Or rather, it might well be true (something mandala-like in the image of that locus:-) but isn't strictly logical or on topic for the list. The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers?
;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From choyhauyan at yahoo.com Thu Feb 26 00:51:24 2004 From: choyhauyan at yahoo.com (choy hau yan) Date: Wed, 25 Feb 2004 21:51:24 -0800 (PST) Subject: [Beowulf] shared distributed memory ? Message-ID: <20040226055124.33033.qmail@web41313.mail.yahoo.com> I am a user of a Scyld Beowulf cluster, and I use MPI for parallel computing. I have some questions: > I have 2 processors that share memory, connected > with TCP/IP to another 2 processors that share > memory. > > I use MPI send/recv for communication, but why can't I > call this shared distributed memory? > The speedup with this architecture is very low. Why? > > speedup: > 2 processors: 1.61 > 3 processors: 2.31 > 4 processors: 2.30 > With shared memory the speedup should be higher > than distributed because there is almost no communication > cost in shared memory, right? I hope that someone can answer my question. Thanks. > __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From moor007 at bellsouth.net Thu Feb 26 16:51:03 2004 From: moor007 at bellsouth.net (moor007 at bellsouth.net) Date: Thu, 26 Feb 2004 15:51:03 -0600 Subject: [Beowulf] Cluster HW Message-ID: <200402261551.03785.moor007@bellsouth.net> I apologize for having to ask in this forum... but I really do not know where to begin. I just upgraded my interconnects from the Dolphinics (SCI) cards and want to sell them (rarely used) because only one of the four applications I use would utilize them. Is there a forum/market, besides eBay, for this type of specialty HW? Tim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From csamuel at vpac.org Thu Feb 26 17:34:08 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 27 Feb 2004 09:34:08 +1100 Subject: [Beowulf] G5 cluster for testing In-Reply-To: References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <200402270934.10022.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 11:43 pm, Suvendra Nath Dutta wrote: > We were taking the harder route of staying away from all non-open source > tools to do NFS (NFSManager) or MPI (Pooch). There is also Black Lab Linux from Terrasoft, which builds clusters on YDL with BProc, MPICH, etc. for Macs. No idea whether it supports G5's or how FOSS it is though.. cheers!
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAPnRgO2KABBYQAh8RAq6sAJMEJwyT1vn3MV9RM/Fwpy6gs4CZAJ9QAGf2 oyEbIVcHgTfcs+Jk2xb7dg== =92C8 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at linux-mag.com Thu Feb 26 19:01:51 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 26 Feb 2004 19:01:51 -0500 (EST) Subject: [Beowulf] shared distributed memory ? In-Reply-To: <20040226055124.33033.qmail@web41313.mail.yahoo.com> Message-ID: A few questions: What are your processor speeds? What is your interconnect? What is your application? Having to communicate with another node vs the same node is not the same thing (ping localhost and then ping the other node). Obviously your application is sensitive to the interconnect (either bandwidth or latency). Really fast processors and a slow interconnect usually mean poor scalability for some applications. Doug On Wed, 25 Feb 2004, choy hau yan wrote: > I am a user of a Scyld Beowulf cluster, and I use MPI for > parallel computing. I have some questions: > > > I have 2 processors that share memory, connected > > with TCP/IP to another 2 processors that share > > memory. > > > > I use MPI send/recv for communication, but why can't I > > call this shared distributed memory? > > The speedup with this architecture is very low. Why? > > > > speedup: > > 2 processors: 1.61 > > 3 processors: 2.31 > > 4 processors: 2.30 > > With shared memory the speedup should be higher > > than distributed because there is almost no communication > > cost in shared memory, right? I hope that someone can > answer my question. Thanks. > > > __________________________________ > Do you Yahoo!? > Get better spam protection with Yahoo! Mail. > http://antispam.yahoo.com/tools > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From graham.mullier at syngenta.com Fri Feb 27 05:12:18 2004 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Fri, 27 Feb 2004 10:12:18 -0000 Subject: [Beowulf] Flashmobcomputing Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Hmm, presumably a 'bad' message will need to have the Evil Bit set (http://www.ietf.org/rfc/rfc3514.txt)? Graham -----Original Message----- From: Robert G. Brown [mailto:rgb at phy.duke.edu] [...] The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers?
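A rough Amdahl's-law check on the speedup figures quoted above (a back-of-the-envelope estimate, not a diagnosis of this particular code): writing the speedup on p processors as

    S(p) = 1 / (f + (1 - f)/p)

where f is the fraction of the runtime that does not parallelize (serial work plus communication), the measured S(2) = 1.61 gives f of roughly 0.24, and that same f predicts

    S(4) = 1 / (0.24 + 0.76/4), which is about 2.3,

essentially what was measured. So the plateau at 4 processors is consistent with about a quarter of the time going to serial work or communication, whatever the shared/distributed mix; the 3-processor point sits above this simple curve, but there two of the three ranks share a node, so the single-f model is at best approximate.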
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anantanagb at yahoo.com Fri Feb 27 04:49:38 2004 From: anantanagb at yahoo.com (anantanag bhat) Date: Fri, 27 Feb 2004 01:49:38 -0800 (PST) Subject: [Beowulf] P4_error: net_recv read : probable EOF on socket:1 Message-ID: <20040227094938.32769.qmail@web21322.mail.yahoo.com> Sir, I have installed MPICH on my 8 processor Cluster. Every thing was running fine for first few days. Now if I starts the run in the node4, it is getting stuck. after 2hour. the error in the .out file is as below "P4_error: net_recv read : probable EOF on socket:1" But it is not the same in first 3 nodes. In these runs are going fine. Can anybody please help me to solve this. Thanks in advance __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 27 08:11:43 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 27 Feb 2004 08:11:43 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 27 Feb 2004 graham.mullier at syngenta.com wrote: > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? You know, I just joined the ietf.org list a week or two ago to see if there was any possibility of leveraging their influence on e.g. AV vendors to get them to stop mailing bounce messages back to the "From" address on viruses, given that there hasn't been a virus that hasn't forged its From header to an innocent third party for several years now. Finding myself sucked into an endless discussion with people who want the ietf to issue an RFC to call for digitally signing all mail and using said signatures to drive all spam white/blacklisting (imagine the keyservice THAT would require and the gazillion dollar profits it would generate) I have gradually started to wonder if the ietf has degenerated into a kind of a cruel joke. This RFC, however, lifts my spirits and renews my confidence that the original luminaries that designed in the Internet have not fully stopped glowing in the chaotic darkness that surrounds them. Armed with the complete confidence that my design is based on both sound protocol and Dr. D. Adams' valuable empirical observation about bad news, I will start work on a PVM version that sets the Evil Bit right away. I fully expect to win a Nobel Prize from the proof that communications are transluminal in the resulting cluster. It must be that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- Bad News must somehow propagate backwards in time from the event. I most certainly will acknowledge all of the contributions of all you "little people" when I receive my invitation to Stockholm. I'm so happy. Sniff. rgb > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> > ;-) > [...] > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Fri Feb 27 12:38:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 27 Feb 2004 18:38:09 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Fri, 27 Feb 2004, Robert G. Brown wrote: > > Armed with the complete confidence that my design is based on both sound > protocol and Dr. D. Adams' valuable empirical observation about bad > news, I will start work on a PVM version that sets the Evil Bit right > away. I fully expect to win a Nobel Prize from the proof that > communications are transluminal in the resulting cluster. It must be > that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- > Bad News must somehow propagate backwards in time from the event. > Once this phase of the research has been completed, can we make an application to the NSF for an extension into using SEP fields for systems management? http://www.fact-index.com/s/so/somebody_else_s_problem_field.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From laytonjb at comcast.net Fri Feb 27 16:23:23 2004 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Fri, 27 Feb 2004 16:23:23 -0500 Subject: [Beowulf] Single Processor vs SMP In-Reply-To: <20040227092124.GA8410@blackTiger> References: <20040227092124.GA8410@blackTiger> Message-ID: <403FB54B.5030401@comcast.net> Paulo, I'm hoping someone will jump in and say that the answer depends upon the code(s) you're running. If possible, test your codes on a dual CPU box with one copy running and then two copies running (make sure one copy is on one CPU). Test this on the architectures you are interested in. If you can, also test on multiple nodes with some kind of interconnect to judge how the code(s) scale with the number of nodes and with the interconnect. For example, at work I use a code that we tested on single and dual CPU machines. It was an older PIII/500 box that used the old Intel 440BX chipset (if I remember correctly). We found that running two copies only resulted in a 30% penalty for running duals. We also tested on a cluster with Myrinet and GigE. Myrinet only gave this code about a 2% decrease in wall clock time (we measure speed in wall clock time since that is what is important to us). Then we got quotes for machines and did the price/performance calculation and determined which cluster was the best. I highly recommend doing the same thing for your code(s). Be sure to check out Opterons since they have an interesting memory subsystem that should allow your codes to have little penalty in running on dual machines ("should" is the operative word. You should test your codes to determine if this is true). Good Luck!
Jeff >Hello, > >I'm currently working in a physics department that is in the process of >building a high performance Beowulf cluster and I have some doubts in >terms of what type of hardware to acquire. > >The programming systems that will be used are MPI and HPF. Does anyone >knows any study comparing the performance of single cpu machines vs smp >machines or even between the several cpu's available (intel p4, amd athlon, >powerpc g5, ...)? > >Thanks for any advice > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From paulojjs at bragatel.pt Fri Feb 27 04:21:24 2004 From: paulojjs at bragatel.pt (Paulo Silva) Date: Fri, 27 Feb 2004 09:21:24 +0000 Subject: [Beowulf] Single Processor vs SMP Message-ID: <20040227092124.GA8410@blackTiger> Hello, I'm currently working in a physics department that is in the process of building a high performance Beowulf cluster and I have some doubts in terms of what type of hardware to acquire. The programming systems that will be used are MPI and HPF. Does anyone knows any study comparing the performance of single cpu machines vs smp machines or even between the several cpu's available (intel p4, amd athlon, powerpc g5, ...)? Thanks for any advice -- Paulo Jorge Jesus Silva perl -we 'print "paulojjs".reverse "\ntp.letagarb@"' The best you get is an even break. -- Franklin Adams -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From ddw at dreamscape.com Sat Feb 28 01:19:37 2004 From: ddw at dreamscape.com (Daniel Williams) Date: Sat, 28 Feb 2004 01:19:37 -0500 Subject: [Beowulf] Flashmobcomputing - the evil bit References: <200402271705.i1RH5vh16216@NewBlue.scyld.com> Message-ID: <404032F8.F5DDDA59@dreamscape.com> The problem with this idea is that Linux is too good at dealing with flawed or malicious data, so even if the evil bit is set, it still would not qualify as "bad news", and thus would not travel superluminally. Consequently, I would speculate that the only system that could communicate superluminally is one running some form of Winblows, since *any* data, of *any* kind, with or without the evil bit set is bad news for MS operating systems, and likely to cause a crash. The problem with superluminal cluster computing then becomes obvious - you can't get any actual useful calculation done faster than lightspeed, because the only operating systems that work at that speed can't do any useful work. DDW > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Sun Feb 1 06:44:18 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sun, 1 Feb 2004 12:44:18 +0100 (CET) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C253E.9040206@obs.unige.ch> Message-ID: On Sat, 31 Jan 2004, Pfenniger Daniel wrote: > > Note that in the responded message John was confusing N2 and NO2. Eeek! I am outed as a physicist... I've come out of the lab (closet). Guess I can now wear a slide rule with pride. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Sun Feb 1 12:50:44 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> Message-ID: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB all extremely mundane and FULLY supported. > Graph: GeForce FX 5200 128MB bzzt. take it out, try again. don't even *think* about loading the binary nvidia driver. > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. needless to say, 2.6 has been extensively tested on xeons, and it works fine. your problem is specific to your config. if you want help, you'll have to start by describing how it fails. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From shaeffer at neuralscape.com Sun Feb 1 13:17:15 2004 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Sun, 1 Feb 2004 10:17:15 -0800 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> <401CF17C.8080706@telia.com> Message-ID: <20040201181715.GB8159@synapse.neuralscape.com> On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > I have experienced some problems to compile SMP support for the > 2.6.1-kernel on my Intel Xeon based workstation: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > Graph: GeForce FX 5200 128MB > > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels actually run too.
I'm just starting performance testing, but results are very promising. Thanks, Karen > > The SMP support works fine for the Intel Tualatin workstation all the > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > build a 2.6.0. > > Please advice if some one have solved this problem. > > Best regards > Per Lindstrom > . > . > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > > > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > >> > >>2.6 looks very promising, wondering when distributions > >>will include it. > >> > >> > >> > >Debian unstable does today. The new installer for the next release > >of Debian (currently Debian testing) which is in beta test may well > >include a 2.6 kernel option. > > > > > > > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > >> > >> > >> > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > >release fully free / GPL'd such that I can use it anywhere? > > > >Thanks, > > > >Andy > >_______________________________________________ > >Beowulf mailing list, Beowulf at beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf ---end quoted text--- -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer at neuralscape.com http://www.neuralscape.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From poobah_99 at hotmail.com Sun Feb 1 14:24:03 2004 From: poobah_99 at hotmail.com (Ryan Kastrukoff) Date: Sun, 01 Feb 2004 11:24:03 -0800 Subject: [Beowulf] unsubscribe universe beowulf@beowulf.org Message-ID: _________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Sun Feb 1 14:33:03 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 01 Feb 2004 14:33:03 -0500 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <401D546F.7090109@scalableinformatics.com> Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon >>or amd64. Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 According to weather.com, this zip code is about 27 F right now. 
As 32 F is officially "freezing over", we can with all accuracy note that indeed, Hell (MI) has frozen over. Note 2: It was quite a bit colder last week and up to yesterday where southeast Michigan was hovering in the low negative/positive single digits in degrees F. We shouldn't complain as the folks in Minnesota have not seen the high side of 0 very much recently. As for the aerodynamic porcine units, you are on your own. Joe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Sun Feb 1 10:37:37 2004 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Sun, 01 Feb 2004 16:37:37 +0100 Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C2C97.8020903@tamu.edu> References: <401BE891.708@obs.unige.ch> <401C0807.4000209@telia.com> <401C253E.9040206@obs.unige.ch> <401C2C97.8020903@tamu.edu> Message-ID: <401D1D41.8090709@moene.indiv.nluug.nl> Gerry Creager (N5JXS) wrote: > That's the end of gas exchange physiology I. There will be a short quiz > Monday. We'll continue with the next module. I encourage everyone to > have read the Pulmonary Medicine chapters in Harrison's for the next > lecture. Hmmm, I won't hold my breath on that one :-) -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 1 15:53:54 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401D1D41.8090709@moene.indiv.nluug.nl> Message-ID: On Sun, 1 Feb 2004, Toon Moene wrote: > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) Careful or I'll beat you with John's slide rule (what kinda physicist uses a slide rule for anything other than a blunt instrument?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Feb 1 21:35:43 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <20040202023543.11015.qmail@web16807.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" > IIRC: Since you seem well aware of SPBS / storm - is > the newest storm > release fully free / GPL'd such that I can use it > anywhere? 
They now call it "torque", not sure when they are going to get a new name again :( Not sure what you mean by "use it anywhere". You can use SPBS (yes, I like this name better) in commerical environments. If you make modifications to SPBS, you need to provide the source code for download. If you want to modify the source, and sell it as a product, you may want to use SGE. AFAIK, SGE uses a license similar to the BSD, while OpenPBS uses a license similar to GPL. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Mon Feb 2 05:19:30 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Mon, 02 Feb 2004 11:19:30 +0100 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075566655.2560.8.camel@loiosh> (agrajag@dragaera.net's message of "31 Jan 2004 11:30:56 -0500") References: <1075566655.2560.8.camel@loiosh> Message-ID: Jag writes: > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > >> NIS works fine for many purposes as well, but be warned -- in certain >> configurations and for certain tasks it becomes a very high overhead >> protocol. In particular, it adds an NIS hit to every file stat, for >> example, so that it can check groups and permissions. > > A good way around this is to run nscd (Name Services Caching Daemon). I'm really, really suspicious against nscd. I've more than once seen it hang on to stale information forever for no good reason at all. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 07:45:05 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 06:45:05 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: Nscd is a necessary evil sometimes though. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Leif Nixon wrote: > Jag writes: > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > >> NIS works fine for many purposes as well, but be warned -- in certain > >> configurations and for certain tasks it becomes a very high overhead > >> protocol. In particular, it adds an NIS hit to every file stat, for > >> example, so that it can check groups and permissions. > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > I'm really, really suspicious against nscd. I've more than once seen > it hang on to stale information forever for no good reason at all. 
> > -- > Leif Nixon Systems expert > ------------------------------------------------------------ > National Supercomputer Centre Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 2 10:32:01 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 2 Feb 2004 09:32:01 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. To clarify, the problem is when there is some cron job (or reboot) in which a couple of hundred nodes all go after the NIS server at once. It's magnified by the fact that there's an NIS lookup done even when it's a user in the local password file such as root. The problems can be mitigated by having a lot of nodes be slaves. At one point I had all of the nodes of my cluster be slaves. But the problem with that is that the transmission protocol is not perfect and every once in a while you wind up with a slave server that is down a map or two. We've now shifted to pushing out our password files via rsync. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. Our set of kerberos 5 kdc's have thus far been able to handle the load of some 1500 nodes with more still coming. Plus then we have no real passwords in the passwd file and thus the security issues of distributing it are much less critical. Steve Timm > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). 
> > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. > > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:07:30 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:07:30 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: <1075730850.3936.19.camel@protein.scalableinformatics.com> I have tried to avoid NIS on linux, as it appears not to be as stable as needed under heavy load. I have had customers bring it crashing down when it serves login information, just by running simple scripts across the cluster. I prefer pushing name service lookups through DNS, and I tend to use dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). Setting up a full blown named/bind system for a cluster seems like significant overkill in most cases. On the authentication side, I had high hopes for LDAP, but haven't been able to easily/repeatably make a working LDAP server with databases. I am starting to think more along the lines of a simple database with pam modules on the frontend. See http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or http://sourceforge.net/projects/pam-mysql/ for examples. On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > Nscd is a necessary evil sometimes though. > > -B > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > Jag writes: > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > >> configurations and for certain tasks it becomes a very high overhead > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > >> example, so that it can check groups and permissions. > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > I'm really, really suspicious against nscd. I've more than once seen > > it hang on to stale information forever for no good reason at all. 
> > > > -- > > Leif Nixon Systems expert > > ------------------------------------------------------------ > > National Supercomputer Centre Linkoping University > > ------------------------------------------------------------ > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 09:24:25 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 08:24:25 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: We use ldap extensively here on all of our clusters that IT maintains. We like it because it allows great flexibility if we need to write web based account management systems for groups on campus. LDAP is actually very very easy to implement, especially if you use redhat as your distribution. We use redhat mostly exclusive here so our setup and configuration for ldap is pretty cookie-cutter. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. 
> > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:29:49 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:29:49 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: <1075732189.3936.28.camel@protein.scalableinformatics.com> On Mon, 2004-02-02 at 09:24, Brent M. Clements wrote: > We use ldap extensively here on all of our clusters that IT maintains. We > like it because it allows great flexibility if we need to write web > based account management systems for groups on campus. LDAP is actually > very very easy to implement, especially if you use redhat as your > distribution. We use redhat mostly exclusive here so our setup and > configuration for ldap is pretty cookie-cutter. I know the clients are rather easy, it is setting up the server that I found somewhat difficult. I did go through the howto's, used the RH packages. Had some issues I could not find resolution to. This was about a year ago. I have a nice LDAP server set up with a completely read-only database now. I haven't been able to convince it to let clients write (e.g. password and other changes). Not sure what I am doing wrong, relatively sure it is pilot error. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jonbernard at uab.edu Mon Feb 2 11:46:21 2004 From: jonbernard at uab.edu (Jon B Bernard) Date: Mon, 2 Feb 2004 10:46:21 -0600 Subject: [Beowulf] HVAC and room cooling... Message-ID: <92E49C92F9CDBF4EA106E2E7154938830202B1F3@UABEXMB1.ad.uab.edu> The American Society of Heating, Refrigerating and Air-Conditioning Engineers (www.ashrae.org) has just released "Thermal Guidelines for Data Processing Environments". It looks like there's also a summary available in the January issue of their journal, or online for $8. Jon -----Original Message----- From: Brent M. Clements [mailto:bclem at rice.edu] Sent: Friday, January 30, 2004 11:18 PM To: rossini at u.washington.edu Cc: John Bushnell; beowulf at beowulf.org Subject: Re: [Beowulf] HVAC and room cooling... I have found that the best thing to do is outsource the colocation of your equipment. The cost of installing and maintaining the proper type of cooling and ventilation for mid-large size clusters costs more than to colocate. We are currently exploring placing our larger clusters in colocation facilities right now. 
The only downside that we have is that we can't find colocation facilities that will give us 24/7 physical access to our equipment. As you all know...researchers push beowulf hardware to the limits and the meantime to failure is higher. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Fri, 30 Jan 2004, A.J. Rossini wrote: > John Bushnell writes: > > > (So many watts) times 'x' equals how many "tons" of AC. Multiply > > by at least two of course ;-) > > Or 3, sigh... > > >>Also, does anyone have any brilliant thoughts for cooling an internal > >>room that can't affordably get chilled water? (I've been suggesting > >>to people that it isn't possible, but someone brought up "portable > >>liquid nitrogen" -- for the room, NOT for overclocking -- I'm trying > >>to get stable systems, not instability :-). > > > > You can have an external heat exchanger. If you are lucky and are, > > say, on the first floor somewhere close to an external wall, it is > > pretty simple to run a small pipe between the internal AC and the > > heat exchanger outside. Don't know how far it is practical to run > > one though. We have one in our computer room, but it is only six > > feet or so from the exchanger outside. Our newer AC runs on chilled > > water which was quoted for a lot less than another inside/outside > > combo, but we already had a leftover chilled water supply in the > > computer room. > > I've looked at the chilled-water approach. They estimated between > $40k-$80k. oops (this room is REALLY in the middle of the building. > Great for other computing purposes, but not for cooling). > > I'm looking for the proverbial vent-free A/C. Sort of like > frictionless tables and similar devices I recall from undergraduate > physics... > > Thanks for the comments! > > best, > -tony > > -- > rossini at u.washington.edu http://www.analytics.washington.edu/ > Biomedical and Health Informatics University of Washington > Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center > UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable > FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email > > CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be > confidential and privileged. If you received this message in error, > please destroy it and notify the sender. Thank you. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Mon Feb 2 16:27:25 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Mon, 02 Feb 2004 16:27:25 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We have 3 - 16 hard drive file servers, 13 compute nodes and a master unit. We had to spread the load from 3 to 4 - 20 Amp circuits to keep from popping circuit breakers. We have AC coming into an interior room and experienced several problems. Problem 1: There was no adequate exhaust system. 
5 active vents in , 1 passive vent out and in the wrong location. Solution: We substituted in several grates in place of acoustic tiles. The heat is vented up into the plenum above. There are fans atop the rack venting the interior of the rack into one of the grates above. The other heat follows. Problem 2: What do you do when the AC stops? Maintenance and the occasional AC system oops can be devastating to a cluster in a small room. Solution 2a: We are tied directly into a security system. When a sensor in the room reaches a temperature level, "Security" responds dependent upon the level detected. Solution 2b: We installed a backup automated telephone dialer. Not that we don't trust "Security", but we wanted a backup to let us know what was going on. When the temperature reaches a certain level, the phone dials us with an automated message: " This is the Sensaphone 1108. The time is 1:36 AM and ... [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] Solution 2c: Install a thermal sensor into a serial or tcp/ip socket. Some vendors have software that read these sensors and will shut down the machines. We are still working on our system. Others' experiences and solutions are welcomed. We are using dual Tyan motherboards with dual AMD MP processors. Good luck!! Peter ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Feb 2 19:56:33 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 02 Feb 2004 16:56:33 -0800 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". 
Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Tue Feb 3 08:40:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Tue, 03 Feb 2004 07:40:25 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> References: <20040203125618.GA6026@mikee.ath.cx> Message-ID: <6.0.0.22.2.20040203073727.02614538@localhost> At 06:56 AM 2/3/2004, Mike Eggleston wrote: >This book from 2000 discusses building clusters from linux. I >bought it from a discount store not because I'm going to build >another cluster from linux, but rather because of the discussions >on cluster management. Has anyone read/implemented his approach? >What other cluster management techniques/solutions are out there? Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes chapters on cluster setup and cluster management (new in the 2nd edition). Disclaimer: I'm one of the editors of this book. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 09:05:07 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 08:05:07 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <6.0.0.22.2.20040203073727.02614538@localhost> References: <20040203125618.GA6026@mikee.ath.cx> <6.0.0.22.2.20040203073727.02614538@localhost> Message-ID: <20040203140507.GB6026@mikee.ath.cx> On Tue, 03 Feb 2004, William Gropp wrote: > At 06:56 AM 2/3/2004, Mike Eggleston wrote: > >This book from 2000 discusses building clusters from linux. I > >bought it from a discount store not because I'm going to build > >another cluster from linux, but rather because of the discussions > >on cluster management. Has anyone read/implemented his approach? > >What other cluster management techniques/solutions are out there? > > Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes > chapters on cluster setup and cluster management (new in the 2nd > edition). Disclaimer: I'm one of the editors of this book. > > Bill > > I have the 1st edition and it does have a chapter discussing some of the management. How would this method scale to managing a (not really a cluster) group of AIX servers? 
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Tue Feb 3 09:26:37 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Tue, 03 Feb 2004 09:26:37 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: Hello Jim The main goal for us is to stay up and running as long as we can. (Please read the last paragraph before responding to this one:) Most of our temperature problems have been caused by AC maintenance induced temperature spikes. Having "security" open the doors slows the room heating process. The Sensaphone call to us helps us to know that there is a problem and we can phone in to be briefed. "Do we have to come in or has the room already begun to cool?" The last of the Solutions is for just the type of incident that you describe. These are very rare but like you say, they need to be planned for. Our ideal goal would be one that signals a problem to the cluster. The cluster takes the signal and gracefully shuts down the programs and then shuts down the nodes. We did not find such a solution on the commercial market for our "came with the room" UPS. Instead we found a sensor/software combination where the sensor ties into the serial port of one of the nodes. So far we **have** been able to gracefully shut down the programs that are running. We have **not** found a way to automatically turn off the various cluster nodes. That's where we need some help/suggestions. ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 Jim Lux cc: Subject: Re: [Beowulf] Re: HVAC and room cooling... 02/02/04 07:56 PM At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. 
Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From grid at iki.fi Tue Feb 3 09:26:53 2004 From: grid at iki.fi (Michael Kustaa Gindonis) Date: Tue, 3 Feb 2004 16:26:53 +0200 Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset In-Reply-To: <200402021546.i12Fk4h24131@NewBlue.scyld.com> References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> Message-ID: <200402031626.53453.grid@iki.fi> Hi, I noticed in the Linux kernel configuration that there is support for LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. Do any readers of this list have any experiences in this area? Knowledge about LSI's plans to support this chipset in the future? ... Mike On Monday 02 February 2004 17:46, beowulf-request at scyld.com wrote: > Send Beowulf mailing list submissions to > beowulf at beowulf.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf > or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > > You can reach the person managing the list at > beowulf-admin at beowulf.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Beowulf digest..." > > > Today's Topics: > > 1. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Mark Hahn) > 2. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Karen Shaeffer) > 3. unsubscribe universe beowulf at beowulf.org (Ryan Kastrukoff) > 4. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Joe Landman) > 5. Re: HVAC and room cooling... (Toon Moene) > 6. Re: HVAC and room cooling... (Robert G. Brown) > 7. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (=?big5?q?Andrew=20Wang?=) > 8. Re: Authentication within beowulf clusters. (Leif Nixon) > 9. Re: Authentication within beowulf clusters. (Brent M. Clements) > 10. Re: Authentication within beowulf clusters. (Joe Landman) > 11. Re: Authentication within beowulf clusters. (Brent M. Clements) > 12. Re: Authentication within beowulf clusters. (Joe Landman) > 13. Re: Authentication within beowulf clusters. (Steven Timm) > > --__--__-- > > Message: 1 > Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) > From: Mark Hahn > To: Per Lindstrom > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > > > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > > Chipset: Intel 7505 > > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > > all extremely mundane and FULLY supported. > > > Graph: GeForce FX 5200 128MB > > bzzt. take it out, try again. 
don't even *think* about loading the > binary nvidia driver. > > > The SMP support works fine all the way up to kernel 2.4.22 but when > > there is stop for the XEON. > > needless to say, 2.6 has been extensively tested on xeons, and it works > fine. your problem is specific to your config. > > if you want help, you'll have to start by describing how it fails. > > > --__--__-- > > Message: 2 > Date: Sun, 1 Feb 2004 10:17:15 -0800 > From: Karen Shaeffer > To: Per Lindstrom > Cc: "Andrew M.A. Cater" , > beowulf at beowulf.org Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs > amd64 > > On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > > I have experienced some problems to compile SMP support for the > > 2.6.1-kernel on my Intel Xeon based workstation: > > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > > Chipset: Intel 7505 > > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > > Graph: GeForce FX 5200 128MB > > > > The SMP support works fine all the way up to kernel 2.4.22 but when > > there is stop for the XEON. > > I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels > actually run too. I'm just starting performance testing, but results are > very promising. > > Thanks, > Karen > > > The SMP support works fine for the Intel Tualatin workstation all the > > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > > build a 2.6.0. > > > > Please advice if some one have solved this problem. > > > > Best regards > > Per Lindstrom > > . > > . > > > > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > > >> > > >>2.6 looks very promising, wondering when distributions > > >>will include it. > > > > > >Debian unstable does today. The new installer for the next release > > >of Debian (currently Debian testing) which is in beta test may well > > >include a 2.6 kernel option. > > > > > >>Also ia64 performance looks bad when compared to Xeon > > >>or amd64. Intel switching to amd64 is a good choice > > >>;-> > > > > > >Newsflash: Severe weather means Hell freezes over, preventing flying > > >pigs from taking off :) > > > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > > >release fully free / GPL'd such that I can use it anywhere? > > > > > >Thanks, > > > > > >Andy > > >_______________________________________________ > > >Beowulf mailing list, Beowulf at beowulf.org > > >To change your subscription (digest mode or unsubscribe) visit > > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > ---end quoted text--- > > -- > Karen Shaeffer > Neuralscape, Palo Alto, Ca. 
94306 > shaeffer at neuralscape.com http://www.neuralscape.com > > --__--__-- > > Message: 3 > From: "Ryan Kastrukoff" > To: beowulf at beowulf.org > Date: Sun, 01 Feb 2004 11:24:03 -0800 > Subject: [Beowulf] unsubscribe universe beowulf at beowulf.org > > > > _________________________________________________________________ > The new MSN 8: smart spam protection and 2 months FREE* > http://join.msn.com/?page=features/junkmail > http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.ms >n.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca > > > --__--__-- > > Message: 4 > Date: Sun, 01 Feb 2004 14:33:03 -0500 > From: Joe Landman > To: "Andrew M.A. Cater" > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > > Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > > > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 > According to weather.com, this zip code is about 27 F right now. As 32 > F is officially "freezing over", we can with all accuracy note that > indeed, Hell (MI) has frozen over. > > Note 2: It was quite a bit colder last week and up to yesterday where > southeast Michigan was hovering in the low negative/positive single > digits in degrees F. We shouldn't complain as the folks in Minnesota > have not seen the high side of 0 very much recently. > > As for the aerodynamic porcine units, you are on your own. > > Joe > > --__--__-- > > Message: 5 > Date: Sun, 01 Feb 2004 16:37:37 +0100 > From: Toon Moene > Organization: Moene Computational Physics, Maartensdijk, The Netherlands > To: gerry.creager at tamu.edu > CC: Pfenniger Daniel , > Per Lindstrom , > John Hearns , rossini at u.washington.edu, > beowulf at beowulf.org > Subject: Re: [Beowulf] HVAC and room cooling... > > Gerry Creager (N5JXS) wrote: > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) > > -- > Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html > GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) > > > --__--__-- > > Message: 6 > Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) > From: "Robert G. Brown" > To: Toon Moene > Cc: gerry.creager at tamu.edu, Pfenniger Daniel > , Per Lindstrom , > John Hearns , , > > Subject: Re: [Beowulf] HVAC and room cooling... > > On Sun, 1 Feb 2004, Toon Moene wrote: > > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short > > > quiz Monday. We'll continue with the next module. I encourage > > > everyone to have read the Pulmonary Medicine chapters in Harrison's for > > > the next lecture. > > > > Hmmm, I won't hold my breath on that one :-) > > Careful or I'll beat you with John's slide rule (what kinda physicist > uses a slide rule for anything other than a blunt instrument?;-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > --__--__-- > > Message: 7 > Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) > From: =?big5?q?Andrew=20Wang?= > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > To: beowulf at beowulf.org > > --- "Andrew M.A. Cater" > > > IIRC: Since you seem well aware of SPBS / storm - is > > the newest storm > > release fully free / GPL'd such that I can use it > > anywhere? > > They now call it "torque", not sure when they are > going to get a new name again :( > > Not sure what you mean by "use it anywhere". You can > use SPBS (yes, I like this name better) in commerical > environments. If you make modifications to SPBS, you > need to provide the source code for download. > > If you want to modify the source, and sell it as a > product, you may want to use SGE. > > AFAIK, SGE uses a license similar to the BSD, while > OpenPBS uses a license similar to GPL. > > Andrew. > > > ----------------------------------------------------------------- > ?C???? Yahoo!?_?? > ?????C???B?????????B?R?A???????A???b?H?????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > > --__--__-- > > Message: 8 > To: Beowulf Mailing List > Subject: Re: [Beowulf] Authentication within beowulf clusters. > From: Leif Nixon > Date: Mon, 02 Feb 2004 11:19:30 +0100 > > Jag writes: > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > >> NIS works fine for many purposes as well, but be warned -- in certain > >> configurations and for certain tasks it becomes a very high overhead > >> protocol. In particular, it adds an NIS hit to every file stat, for > >> example, so that it can check groups and permissions. > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > I'm really, really suspicious against nscd. I've more than once seen > it hang on to stale information forever for no good reason at all. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Tue Feb 3 04:21:24 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 03 Feb 2004 10:21:24 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> (Jim Lux's message of "Mon, 02 Feb 2004 16:56:33 -0800") References: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> Message-ID: Jim Lux writes: > YOu need to seriously consider a "failsafe" totally automated shutdown > (as in chop the power when temperature gets to, say, 40C, in the > room)... Security might be busy (maybe there was a big problem with > the chiller plant catching fire or the boiler exploding.. if they're > directing fire engine traffic, the last thing they're going to be > thinking about is going over to your machine room and shutting down > your hardware. Ah, that reminds me of the bad old days in industry. The A/C went belly up the night between Friday and Saturday. That triggered the alarm down at Security, who promptly called the on-duty ventilation technicians and notified us. Excellent. Except that the A/C alarm was never reset properly, so when the A/C failed again Saturday afternoon nobody noticed. When the temperature reached 35C, the thermal kill switch triggered automatically. Pity that the electrician had never got around to actually, like, *wire* it to anything. We arrived Monday morning to the smell of frying electronics. 
Expensive weekend, that. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From verycoldpenguin at hotmail.com Tue Feb 3 11:24:18 2004 From: verycoldpenguin at hotmail.com (Gareth Glaccum) Date: Tue, 03 Feb 2004 16:24:18 +0000 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We sell solutions with automated power-off scripts upon node overheat using some of the APC products controlled from a linux master. Not that particular unit though. Gareth >From: Joshua Baker-LePain >To: Eckhoff.Peter at epamail.epa.gov >CC: beowulf at scyld.com >Subject: Re: [Beowulf] Re: HVAC and room cooling... >Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) > >On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > > > Instead we found a sensor/software combination where the sensor ties > > into the > > serial port of one of the nodes. So far we **have** been able to > > gracefully shut down the > > programs that are running. We have **not** found a way to automatically > > turn off the > > various cluster nodes. That's where we need some help/suggestions. > >Well, your high-temperature-triggered scripts should call a 'shutdown -h >now'. *If* your nodes are on motherboards that support it, and *if* the >BIOS is new enough to support it, and *if* the nodes were booted with >'apm=power-off' on the kernel command line, then they should actually >power off. > >Another option would be something like this: > >http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 > >With that (ungodly expensive) power strip, you can remotely cut the power >to selected outlets. It probably can be automated, but you'd have to >check that. > >As Jim said, though, all this is great, but there really does need to be >one final level of hardware level failsafe. It is entirely conceivable >that all your software monitoring could fail, and the temperature will >still be climbing. There needs to be a piece of hardware in the room that >literally cuts power to the whole damn room at a set temperature that is >(obviously) above the one that trips your software shutdown scripts. > >-- >Joshua Baker-LePain >Department of Biomedical Engineering >Duke University >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Tue Feb 3 12:06:50 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Tue, 3 Feb 2004 12:06:50 -0500 Subject: [Beowulf] about cluster's tunning References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> <200402031626.53453.grid@iki.fi> Message-ID: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model for climate forecast. 
My question is how i can improvement the performance of my cluster.. there is techniques for tunning of clusters througth operative system or network hardware?. thanks for yours anwers.. and suggests.. R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 3 13:12:24 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 3 Feb 2004 13:12:24 -0500 (EST) Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset In-Reply-To: <200402031626.53453.grid@iki.fi> Message-ID: > I noticed in the Linux kernel configuration that there is support for LSI's > Fusion-MPT chipset. Also, it is possible to run MPI over this. huh? afaikt, it's just another overly expensive, overly complicated hw raid controller. I guess there must be a market for this kind of wrongheaded crap, but I really don't understand it. I guess it's just the impulse to offload whatever possible from the host; that's an understandable idea, but you really need to look at whether it makes sense, or whether it's just a holdover from bygone days when your million-dollar mainframe was actually compute-bound ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 3 23:01:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 4 Feb 2004 15:01:17 +1100 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402041501.19592.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 1 Feb 2004 04:39 pm, Andrew Wang wrote: > 2.6 looks very promising, wondering when distributions > will include it. Mandrake 10 will include it (beta 2 just appeared with 2.6.2rc3 - they reckon the final 2.6.2 will make the release of Mdk10). - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAIG6NO2KABBYQAh8RAlCaAJ9Y5LKBLZQjGvCJCzO7ViuwZMGFiQCePiI+ Q2x2XGPUUWKYDT2nRv/5DHI= =S0ef -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 4 08:17:30 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 4 Feb 2004 08:17:30 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Jim Lux writes: > > > YOu need to seriously consider a "failsafe" totally automated shutdown > > (as in chop the power when temperature gets to, say, 40C, in the > > room)... Security might be busy (maybe there was a big problem with > > the chiller plant catching fire or the boiler exploding.. if they're > > directing fire engine traffic, the last thing they're going to be > > thinking about is going over to your machine room and shutting down > > your hardware. > > Ah, that reminds me of the bad old days in industry. 
> > The A/C went belly up the night between Friday and Saturday. That > triggered the alarm down at Security, who promptly called the on-duty > ventilation technicians and notified us. Excellent. > > Except that the A/C alarm was never reset properly, so when the A/C > failed again Saturday afternoon nobody noticed. > > When the temperature reached 35C, the thermal kill switch triggered > automatically. Pity that the electrician had never got around to > actually, like, *wire* it to anything. > > We arrived Monday morning to the smell of frying electronics. > Expensive weekend, that. Did you ever manage to track down the electrician and put bamboo slivers underneath his toenails or something? That one seems like it would be worth some sort of retaliation. A small nuclear device planted in his front lawn. An anonymous call to the IRS. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 4 17:34:21 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) Subject: [Beowulf] about cluster's tunning In-Reply-To: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Message-ID: You may want to look at the online course mentioned here: http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Doug On Tue, 3 Feb 2004, Richard Miguel wrote: > Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model > for climate forecast. My question is how i can improvement the performance > of my cluster.. there is techniques for tunning of clusters througth > operative system or network hardware?. > > thanks for yours anwers.. and suggests.. > > R. Miguel > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 4 15:08:04 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 04 Feb 2004 21:08:04 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: (Robert G. Brown's message of "Wed, 4 Feb 2004 08:17:30 -0500 (EST)") References: Message-ID: "Robert G. Brown" writes: > On Tue, 3 Feb 2004, Leif Nixon wrote: > >> When the temperature reached 35C, the thermal kill switch triggered >> automatically. Pity that the electrician had never got around to >> actually, like, *wire* it to anything. >> >> We arrived Monday morning to the smell of frying electronics. >> Expensive weekend, that. > > Did you ever manage to track down the electrician and put bamboo slivers > underneath his toenails or something? Sadly, no. And don't get me started on luser electricians. "Ooops, did that feed go to the computer room?" 
"Hmmm, what's on this circuit? Let's toggle it and see what reboots." (Yes, it really happened. I don't often shout at people, but that time...) Dropping a fine gauge wire across the main power rails was an interesting stunt, too. Too bad he didn't even get flash burns. I think the main point here is: If you get hold of a competent electrician, take *real* good care of him. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Thu Feb 5 07:41:24 2004 From: waitt at saic.com (Tim Wait) Date: Thu, 05 Feb 2004 07:41:24 -0500 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: <402239F4.8030304@saic.com> > Dropping a fine gauge wire across the main power rails was an > interesting stunt, too. Too bad he didn't even get flash burns. How about an electrician, who, while working on your building power conditioning, sends 180V through your 120V building, frying everything not on UPS? We were not amused. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Feb 5 11:23:13 2004 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 5 Feb 2004 11:23:13 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <402239F4.8030304@saic.com> Message-ID: On Thu, 5 Feb 2004, Tim Wait wrote: > > > Dropping a fine gauge wire across the main power rails was an > > interesting stunt, too. Too bad he didn't even get flash burns. > > How about an electrician, who, while working on your building > power conditioning, sends 180V through your 120V building, > frying everything not on UPS? > > We were not amused. Oh, give the guy a break: Red, Black, White...it is all very confusing! My most serious problem has been with the computer room UPS begin shutdown accidentally, dropping a half-dozen raid servers. Many TBs of data were endangered. I might be able to forgive them if it only happened once, but I've needed to force myself to stop counting events because doing so interferes with my ability to properly suppress homocidal urges. Seriously, one would think that a Darwinian effect would kick in at some point and cull the electrical service hurd. My observations (and others here as well) seem to dispute that. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.gindonis at hip.fi Thu Feb 5 12:27:20 2004 From: michael.gindonis at hip.fi (Michael Gindonis) Date: Thu, 5 Feb 2004 19:27:20 +0200 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402041702.i14H2Jh03108@NewBlue.scyld.com> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> Message-ID: <200402051927.21214.michael.gindonis@hip.fi> On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > From: Mark Hahn > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset > > > I noticed in the Linux kernel configuration that there is support for > > LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. > > huh? ?afaikt, it's just another overly expensive, overly complicated hw > raid controller. ?I guess there must be a market for this kind of > wrongheaded crap, but I really don't understand it. Hi Mark, When purchasing a cluster or cluster hardware, one can spend as little as 20 Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for Myrinet or Scali. The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. 100 Euro per node is much eaier to justify than 1000 Euro per node when the Cluster when the cluster will not be primarly running tighly coupled parallel problems. If the performance of MPI of Fusion-MPT is much better than than Ethernet with good latency, it becomes a cheap way to add flexibilty to a cluster. Here is some info about it the Chipset... http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ integrated_circuits/fusion.pdf http://www.lsilogic.com/technologies/lsi_logic_innovations/ fusion___mpt_technology.html There is also information in the in the linux kernel documentation about running MPI over this kind of interconnect. ... Mike -- Michael Kustaa Gindonis Helsinki Institute of Physics, Technology Program michael.gindonis at hip.fi http://wikihip.cern.ch/twiki/bin/view/Main/MichaelGindonis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Feb 5 21:12:58 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 5 Feb 2004 18:12:58 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: hi ya On Thu, 5 Feb 2004, Michael T. Prinkey wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! dont forget blue and green too ... - fun to disconnect the wires at the main and move wires around ... while the bldg is "lit" i think its crazy that the "nuetral" side is tied together at the panel .. but the outlets in the building seems to work .. 
c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Feb 5 22:07:18 2004 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 5 Feb 2004 19:07:18 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: <20040206030718.46964.qmail@web60310.mail.yahoo.com> Trained Electrician Here Worked at a HVAC system fab. plant. I wired large Air Make Up Units. I was trained in by a very old school guy (CS degree from 1962). I watched as turnovers in workers happened and started to notice the lower paid guys that would work on 480V extension cords while they where hot with 60hp motors drawing on them! I strayed from that path for a while until a friend, who was handed the task of managing the renovation of an old building downtown. She had questions I had time. I ended up finding a retired electrician that knew his stuff. I asked him how he kept up to date. His reply was that he is on the writing committee for the National Electric Code. Needless to say I keep in contact with him on various topics. Note: CatV + Lighting = PCs + Fire Note2: Attic Access doors do not belong in the ceiling of a wiring closet. Something about fire wanting to go upwards, maybe some of you physics guys can explain it better. --- "Michael T. Prinkey" wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > > > > Dropping a fine gauge wire across the main power rails was an > > > interesting stunt, too. Too bad he didn't even get flash burns. > > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! > > My most serious problem has been with the computer room UPS begin shutdown > accidentally, dropping a half-dozen raid servers. Many TBs of data were > endangered. I might be able to forgive them if it only happened once, but > I've needed to force myself to stop counting events because doing so > interferes with my ability to properly suppress homocidal urges. > > Seriously, one would think that a Darwinian effect would kick in at some > point and cull the electrical service hurd. My observations (and others > here as well) seem to dispute that. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /------------------------------------------------------------\ Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a superbeing or the future with which religions are mainly concerned with. Or, if not impossible, at least impossible at the present time. lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 23:15:52 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 23:15:52 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... 
wires In-Reply-To: Message-ID: On Thu, 5 Feb 2004, Alvin Oga wrote: > i think its crazy that the "nuetral" side is tied together > at the panel .. but the outlets in the building seems to work .. That's not crazy, that's actually rather sane. What would be crazy would be grounding the neutrals and/or ground wire in different places. Can you say "ground loop"? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 5 23:26:37 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 5 Feb 2004 23:26:37 -0500 (EST) Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> Message-ID: > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects or less, actually. you seem to be thinking of gigabit, which is indeed a very attractive cluster interconnect. otoh, there are lots of even more loosely-coupled, non-IO-intensive apps that run just fine on 100bT. > to more than 1000 Euro per node for > Myrinet or Scali. or IB. > The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. yes, obviously. I'd probably rather have another gigabit port or two; bear in mind that some very elegant things can be done when each node has multiple network connections... really, the chipset isn't the point; it's just a $5 coprocessor. what counts is coming up with a physical layer, including affordable switches, and somehow getting millions of people to make/buy them. > 100 > Euro per node is much eaier to justify than 1000 Euro per node when the > Cluster when the cluster will not be primarly running tighly coupled parallel > problems. hmm, we've already established that gigabit is much cheaper, and for loose-coupled systems, chances are good that even 100bT will suffice. > If the performance of MPI of Fusion-MPT is much better than than > Ethernet with good latency, but does it even exist? so far, all I can find is two lines on a marketing glossy... > it becomes a cheap way to add flexibilty to a > cluster. many things could happen; I'm not optimistic about this Fusion-MPT thing. it seems to fly in the face of "do one thing, well". > Here is some info about it the Chipset... > > http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ > integrated_circuits/fusion.pdf that's the vapid marketing glossy. > http://www.lsilogic.com/technologies/lsi_logic_innovations/ > fusion___mpt_technology.html that is even worse. > There is also information in the in the linux kernel documentation about > running MPI over this kind of interconnect. I'm not sure what "kind" here means, do you mean over scsi? the traditional problem with *-over-scsi (and there have been more than a couple) has been that scsi interfaces aren't optimized for low-latency. the bandwidth isn't that hard, really - 320 MB/s is around Myrinet speed, and significantly slower than IB. OK, how about FC? it's obviously got an advantage over U320 in that FC switches exist (oops, expensive) but it's really just a 1-2 Gb network protocol with 2k packets. 
as for the "high performance ARM-based architecture" part, well, I must admit that I don't associate ARM with high performance of the gigabyte-per-second sort. personally, I'd love to see sort of the network equivalent of the old smart-frame-buffer idea. practically, though, it really boils down to the gritty details like availability of switches, choosing a physical-layer standard, etc. gigabit is the obvious winner there, but IB is trying hard to get over that bump... (Myri seems not to be very ambitious, and 10G eth seems to be straying into a morass of tcp-offload and the like...) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 11:36:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 11:36:39 -0500 (EST) Subject: [Beowulf] wulflogger, wulfstat's dumber cousin... Message-ID: On request I've got a second xmlsysd client going called "wulflogger". Wulflogger is just wulfstat with the ncurses stuff stripped off so that it manages connections to the xmlsysd's on a cluster, reads them at some input frequency, and writes selected status data to stdout in a simple table. The advantage of this tool is that it makes it really easy to write web or script or report applications, and it also makes it very easy to maintain a dynamic logfile of selected statistics for the entire cluster. This is and will likely remain a very simple tool. The only fanciness I envision for the future is an output descriptor format of some sort that could be input at run time, so that a user could select output fields and formats instead of getting the collections I've prebuilt. That's pretty complex (especially since wulflogger/wulfstat throttle xmlsysd to return only the collective stats it needs) so it won't be anytime soon. Only -t 1 is probably "finished" as output format goes, although -t 0 will probably get mostly cosmetic changes at this point as well. Anyway, any wulfstat/xmlsysd users might want to grab it and give it a try. It makes it pretty simple to write a perl script to generate e.g. rrd images or other graphical representations of the cluster -- in a future release I'll provide sample perl scripts for parsing out fields and doing stuff with it. It is for the moment only available from my personal website: http://www.phy.duke.edu/~rgb/Beowulf/wulflogger.php rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 3 10:38:56 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 3 Feb 2004 16:38:56 +0100 (CET) Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> Message-ID: On Tue, 3 Feb 2004, Mike Eggleston wrote: > This book from 2000 discusses building clusters from linux. I > bought it from a discount store not because I'm going to build > another cluster from linux, but rather because of the discussions Mike, I bought this book almost when it came out. 
Its easy to do injustice to someone with a quick email, especially as David Spector put a lot of effort into the book, and I haven't. However, this OReilly is reckoned not to be one of the best. I always recommend 'Linux Clustering' by Charles Bookman, and 'Beowulf Cluster Computing with Linux' edited by Thomas Sterling. Online, there is the book by Bob Brown http://www.phy.duke.edu/brahma/Resources/beowulf_book.php For cluster management specifically, google for Rocks and Oscar, and there are lots of other pages. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 07:56:18 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 06:56:18 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ Message-ID: <20040203125618.GA6026@mikee.ath.cx> This book from 2000 discusses building clusters from linux. I bought it from a discount store not because I'm going to build another cluster from linux, but rather because of the discussions on cluster management. Has anyone read/implemented his approach? What other cluster management techniques/solutions are out there? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Feb 3 10:11:32 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > Instead we found a sensor/software combination where the sensor ties > into the > serial port of one of the nodes. So far we **have** been able to > gracefully shut down the > programs that are running. We have **not** found a way to automatically > turn off the > various cluster nodes. That's where we need some help/suggestions. Well, your high-temperature-triggered scripts should call a 'shutdown -h now'. *If* your nodes are on motherboards that support it, and *if* the BIOS is new enough to support it, and *if* the nodes were booted with 'apm=power-off' on the kernel command line, then they should actually power off. Another option would be something like this: http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 With that (ungodly expensive) power strip, you can remotely cut the power to selected outlets. It probably can be automated, but you'd have to check that. As Jim said, though, all this is great, but there really does need to be one final level of hardware level failsafe. It is entirely conceivable that all your software monitoring could fail, and the temperature will still be climbing. There needs to be a piece of hardware in the room that literally cuts power to the whole damn room at a set temperature that is (obviously) above the one that trips your software shutdown scripts. 
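A minimal sketch of such a high-temperature-triggered shutdown script, assuming root ssh keys on the nodes and a site-specific sensor-reading command (the "read_room_temp" below is a hypothetical placeholder, as are the node names and the 40C threshold):

    #!/bin/sh
    # halt the cluster gracefully once the room gets too hot;
    # run this from cron every minute or two on the head node
    NODES="node01 node02 node03"
    LIMIT=40

    TEMP=`read_room_temp`        # hypothetical command that reads the sensor
    if [ "$TEMP" -ge "$LIMIT" ]; then
        logger "machine room at ${TEMP}C, halting cluster"
        for n in $NODES; do
            # relies on apm=power-off (or working ACPI support) for the
            # node to actually power itself down after the halt
            ssh $n "shutdown -h now" &
        done
        wait
        shutdown -h now          # take the head node down last
    fi

Something like this only covers the software side; the hardware cutoff described above remains the last line of defense.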
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 4 08:26:55 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 4 Feb 2004 14:26:55 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Ah, that reminds me of the bad old days in industry. That in turn reminds me of recent construction work around here... Since the building with our offices and our small server room had to be renovated, the water-based cooling system for the server room had to be temporarily replaced with a mobile unit that pumps the heat into the hallway. The company responsible had no better idea than to replace the cooling system on friday afternoon -- of course without telling anybody. As the mobile unit was much too small, the server room had turned into sauna until monday when we discovered the problem. Ups. Luckily no hardware was damaged, even though the sensors in the hard-disk drives of our server measured a maximum of 47C. Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Fri Feb 6 04:17:43 2004 From: patrick at myri.com (Patrick Geoffray) Date: Fri, 06 Feb 2004 04:17:43 -0500 Subject: [Beowulf] Ambition In-Reply-To: References: Message-ID: <40235BB7.6010802@myri.com> Ah Mark, I could not resist. Actually I could, but the list has been a little boring lately, so... :-) Mark Hahn wrote: > (Myri seems not to be very ambitious, and 10G eth seems to be straying into > a morass of tcp-offload and the like...) Myri is very ambitious, but you can be carefully ambitious or marketingly ambitious. Nobody buys an interconnect looking only at the specs. People try, benchmark, run their code, rationalize and buy what they need at the right price. If you look at what people are doing, there is a lot of Ethernet (Fast and GigE) because thats good enough for many, many codes. Then there is a smaller market for more demanding needs, either in term of performance or scalability, where you want to find the sweet spot in the performance/price curve. Does it make sense to have 10Gb now ? I don't think so, and for several reasons: * PCI-Express is not here yet: It's coming, yes, but it's not available in volume. Today, PCI-X supports 1 GB bidirectional, which is 4 Gb link speed. It's clearly the bottleneck right now. HyperTransport looks attractive, but there is no connector defined yet and vendors should be able to see a potential for volume before to commit resources for a native HT interface. * 10 Gb optics are still expensive: price is going down, but there is not enough volume yet to drive the price down faster. Copper ? I still have nightmares about copper. 10 GigE will drive the technology price down as the 10 GigE market blossoms. * 10 GigE is not attractive enough yet because there is no clear improvement at the application level. 
Running a naive IP stack at 10 Gb requires a lot of resources on the host. RDMA is just a buzword, it's not The Solution. Storage may leverage RDMA, but not IP and certainly not MPI. That's why people are working to put processing on the data path, but it is far from obvious so it takes some time. Gigabit is the clear winner today and 10 GigE will be the clear winner tomorrow, because Ethernet is the de facto Standard. Everybody else are parasites, either breading on niches or marketing poop... Patrick _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sigut at id.ethz.ch Fri Feb 6 06:31:51 2004 From: sigut at id.ethz.ch (G.M.Sigut) Date: Fri, 6 Feb 2004 12:31:51 +0100 (MET) Subject: [Beowulf] about cluster's tunning Message-ID: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> > Date: Thu, 5 Feb 2004 12:04:04 -0500 > Subject: Beowulf digest, Vol 1 #1657 - 4 msgs ... > --__--__-- > > Message: 1 > Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) > From: "Douglas Eadline, Cluster World Magazine" > Subject: Re: [Beowulf] about cluster's tunning > > You may want to look at the online course mentioned here: > > http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Oh yeah. Very nice. Especially after you register (for the course) and are told your browser is no good. There is a page which helps you to select an approved browser - and that says: Unable to detect your operating system. Please select your operating system: -> Windows operating system -> Mac operating system. What a pity that I am working on a Sun. (and Linux) ... George :-( (is there a smiley for "I'm going to puke"?) >>>>>>>>>>>>>>>>>>>>>>>>> George M. Sigut <<<<<<<<<<<<<<<<<<<<<<<<<<< ETH Zurich, Informatikdienste, Sektion Systemdienste, CH-8092 Zurich Swiss Federal Inst. of Technology, Computing Services, System Services e-mail: sigut at id.ethz.ch, Phone: +41 1 632 5763, Fax: +41 1 632 1022 >>>> if my regular address does not work, try "sigut at pop.agri.ch" <<<< _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Fri Feb 6 08:52:20 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Fri, 6 Feb 2004 07:52:20 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> > That's not crazy, that's actually rather sane. What would be crazy > would be grounding the neutrals and/or ground wire in > different places. > Can you say "ground loop"? > Grounding loops.. truly a bane. I remember one instance where someone wired a telecommunications switch to two different grounds. The -48v DC power had it's own ground, and someone had grounded the chassis to a different feed. I little lesser know fact was the lightning rod on the tower next to the building was linked to the same ground as the power. When lightning did strike, nothing but smoke as the charge rolled from one ground to the other on each bay. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 6 09:30:10 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Fri, 6 Feb 2004 09:30:10 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > That's not crazy, that's actually rather sane. What would be crazy > > would be grounding the neutrals and/or ground wire in > > different places. > > Can you say "ground loop"? > > > > > Grounding loops.. truly a bane. I remember one instance where someone > wired a telecommunications switch to two different grounds. The -48v DC > power had it's own ground, and someone had grounded the chassis to a > different feed. I little lesser know fact was the lightning rod on the > tower next to the building was linked to the same ground as the power. > When lightning did strike, nothing but smoke as the charge rolled from > one ground to the other on each bay. There is also a memorable instance of powered racks with incoming two phase power split into two circuits having a polarity reversal so its neutral wire on one circuit was 120V above chassic ground and the neutral on the other circuit. When somebody plugged a single unit with components on both lines -- I think it was more like "meltdown and fire". Not really a ground loop, of course... ...but plenty of people have been electrocuted or fires started because there was a lot more resistance on the neutral line to a remote "ground" than there was to a nice, local, piece of metal. Basically, AFAICT there is really nothing in the NEC or CEC that is "stupid". In fact, I think that most of the code has undergone a near-Darwinian selection process, as in electricians who fail to wire to code (and often their clients) not infrequently fail to reproduce. I don't think code is conservative ENOUGH, if anything, and like to overwire for any given situation. 12-2 is just as easy and cheap to work with as 14-2, for example. 10-2 unfortunately is not, but it gives me comfort to use it whereever I can. And I kinda wish that all circuit breakers were GFCI by code as well, not just ones servicing lines near water and pipes. However, these are still available as user choices -- code permits you to go over, just not under. Anybody curious about wiring should definitely google for the electrical wiring FAQ site. It explains wiring in relatively simple terms. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 6 09:23:48 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 6 Feb 2004 15:23:48 +0100 (CET) Subject: [Beowulf] about cluster's tunning In-Reply-To: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> Message-ID: It just worked fine for me. 
Mozilla 1.4.1 running on Fedora _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Feb 6 03:52:50 2004 From: sp at scali.com (Steffen Persvold) Date: Fri, 06 Feb 2004 09:52:50 +0100 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> <200402051927.21214.michael.gindonis@hip.fi> Message-ID: <402355E2.2040909@scali.com> Michael Gindonis wrote: > On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > >>From: Mark Hahn >>To: beowulf at beowulf.org >>Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset >> >> >>>I noticed in the Linux kernel configuration that there is support for >>>LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. >> >>huh? afaikt, it's just another overly expensive, overly complicated hw >>raid controller. I guess there must be a market for this kind of >>wrongheaded crap, but I really don't understand it. > > > Hi Mark, > > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for > Myrinet or Scali. > Michael, I'm not entirely sure what you mean by "Scali" here. Scali is a _software_ vendor and our MPI can use all of the interconnects that are popular within HPC today (GbE, Myrinet, InfiniBand and SCI). Best regards, -- Steffen Persvold Senior Software Engineer mob. +47 92 48 45 11 tel. +47 22 62 89 50 fax. +47 22 62 89 51 Scali - http://www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Fri Feb 6 11:11:43 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Fri, 6 Feb 2004 16:11:43 +0000 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: References: Message-ID: <200402061611.43045.daniel.kidger@quadrics.com> On Friday 06 February 2004 4:26 am, Mark Hahn added: >> When purchasing a cluster or cluster hardware, one can spend as little as 20 >> Euro ( ~30 CAD) per node on interconnects >> to more than 1000 Euro per node for >> Myrinet or Scali. > > or IB. I guess you should add QsNet II to that list too (except that our cards are under e1000 - not counting switches) Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Feb 6 13:12:49 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 06 Feb 2004 10:12:49 -0800 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> At 09:30 AM 2/6/2004 -0500, Robert G. 
Brown wrote: >On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > > > That's not crazy, that's actually rather sane. What would be crazy > > > would be grounding the neutrals and/or ground wire in > > > different places. > > > Can you say "ground loop"? > > > > > > > > > Grounding loops.. truly a bane. I remember one instance where someone > > wired a telecommunications switch to two different grounds. The -48v DC > > power had it's own ground, and someone had grounded the chassis to a > > different feed. I little lesser know fact was the lightning rod on the > > tower next to the building was linked to the same ground as the power. > > When lightning did strike, nothing but smoke as the charge rolled from > > one ground to the other on each bay. > >There is also a memorable instance of powered racks with incoming two >phase power split into two circuits having a polarity reversal so its >neutral wire on one circuit was 120V above chassic ground and the >neutral on the other circuit. When somebody plugged a single unit with >components on both lines -- I think it was more like "meltdown and >fire". Not really a ground loop, of course... The classic error is wiring two sets of receptacles (e.g two racks full of gear) on the two sides of the 220, with neutrals properly connected, then having the neutral conductor fail, so the two 110V loads are in series across 110V. Works fine as long as the loads are balanced, but when you start to turn off the loads on one side, the voltages don't balance any more. >...but plenty of people have been electrocuted or fires started >because there was a lot more resistance on the neutral line to a remote >"ground" than there was to a nice, local, piece of metal. The notorious MGM Grand fire in Las Vegas, for instance, was caused by a ground/neutral/resistance thing. > Basically, >AFAICT there is really nothing in the NEC or CEC that is "stupid". In >fact, I think that most of the code has undergone a near-Darwinian >selection process, as in electricians who fail to wire to code (and >often their clients) not infrequently fail to reproduce. > >I don't think code is conservative ENOUGH, if anything, and like to >overwire for any given situation. 12-2 is just as easy and cheap to >work with as 14-2, for example. Not if you buy your wire in traincarload lots when wiring a subdivision. That extra copper adds up, not only in copper cost, but shipping, etc. Consider that the wiring harness in an automobile weighs on the order of 50-100kg, and you see why they're interested in going to multiplex buses and 42V systems. Ballparking for my house, which is, give or take 50 feet long, 20 feet wide, and 20 feet high, I'd say there are wiring runs comparable to, say, 3000 feet. That's 9000 total feet of conductors (Black,White, Ground). 12AWG is 19.8 lb/1000 ft, 14 is 12.4 lb/1000ft. Using AWG14 instead of AWG12 saves the contractor 70 pounds of copper. Copper, in huge quantities, is about $0.70/lb, so by the time it gets to the wire maker, it's probably a dollar a pound, so it saves the contractor $70 (not counting any shipping costs, etc. which could be another $0.10/lb or so) $70/house is a bunch o' bux to a builder putting up 500 homes in a tract. They make a profit by watching a thousand little details, each of which is some tiny fraction of the overall price ($70 on a 2000 ft house is 0.035/square foot, compared to $70-100/ft construction cost). 
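(A quick check on that 70 pound figure, using the numbers above: 9000 ft of conductor times the 19.8 - 12.4 = 7.4 lb per 1000 ft difference works out to roughly 67 lb.)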
It's much like automotive applications, or mass market consumer electronics, where they obssess about BOM (bill of materials) cost changes of pennies. (Do you really, really need that bypass capacitor? Does it have to be that big? How many product returns will we get if we leave it out?) This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at 8.96, even after you factor in the fact that you might need more aluminum (because it's lower conductivity), it's still better than 2:1 weight difference. (Aluminum and Copper are about the same price these days, but copper has bigger fluctuations... back in the 70's copper was expensive and aluminum cheap (about 2:1)) So, 2:1 mass, 2:1 price.. changes the cost of the wire alone from $200/house down to $50/house... Consider an office building with 20-30 floors, of 10,000 square feet each. AWG12 vs AWG14 can be a BIG deal. There was a lot of arguing about the heavier neutral wire needed in light industrial office 208Y/120 wiring with all the poor power factor loads (i.e. computers with lightly loaded switching power supplies). > 10-2 unfortunately is not, but it >gives me comfort to use it whereever I can. And I kinda wish that all >circuit breakers were GFCI by code as well, not just ones servicing >lines near water and pipes. However, these are still available as user >choices -- code permits you to go over, just not under. > >Anybody curious about wiring should definitely google for the electrical >wiring FAQ site. It explains wiring in relatively simple terms. > > rgb > >-- >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fant at pobox.com Fri Feb 6 13:59:24 2004 From: fant at pobox.com (Andrew Fant) Date: Fri, 6 Feb 2004 13:59:24 -0500 (EST) Subject: [Beowulf] Gentoo for Science and Engineering Message-ID: Hello, I am sending this out to let people know about a new mailing-list/IRC channel which is being organized for people interested in the use of Gentoo Linux in Computational Science and Engineering applications. At this point we are just getting started, but hopefully we will grow into an organization which presents a one-stop resource about applying Gentoo to CS&E applications from the desktop to HPC clusters and grids. In addition, we will be working closely with Gentoo developers and the Core Gentoo management to provide feedback and guidance in how it can most closely meet the needs of technical end-users. Anyone who has an interest in computational science and engineering and who is interested in learning more about Gentoo or making it a better CS&E platform is most cordially invited to join About Gentoo Linux: Gentoo Linux is a source-based distribution that makes the assumption that the end-user or administrator knows more about what the system is supposed to do than the distribution developers. 
At the core of this is a package system known as Portage, which is similar in form to the BSD ports system. It uses the rough equivalent of an RPM spec file (called an ebuild within Gentoo) to automatically download source, compile the package (and any prerequisites) with appropriate optimizations and options as defined by the user, and install it in such a way that it can be removed or upgraded at a later time. Sometimes referred to as a meta-distribution by the developers, Gentoo initially installs a minimal environment and doesn't force the end-user to install packages and services that are unwanted or unnecessary. Also, no network daemons are started on a system unless an administrator expressly starts them. Gentoo Linux is developed by a community of developers, much as Fedora and Debian are. At present, there are over 6000 different ebuilds for different system utilities and applications in Portage. Of these, more than 100 are classified as scientific applications, including bioperl, octave, spice, and gromacs. In addition, many common scientific libraries and HPC tools are present, including Atlas, FFTW, gmp, LAM/MPI and openpbs. The main website can be found at http://www.gentoo.org. Contact information: The mailing-list is only starting now, and is rather quiet, though I hope to change that over the next couple of weeks. To subscribe, send a blank email to gentoo-science-subscribe at gentoo.org. You will get a confirmation message back. For those who want to just ask questions or find out more in a real-time setting, we are on IRC at irc.freenode.org in #gentoo-science. Of course, questions may also be directed to me at afant at geekmail.cc. Thank you for your time. Please feel free to forward this information to other groups that you feel would be interested. I apologize to anyone who considered this an off-topic post. Andy Fant Andrew Fant | This | "If I could walk THAT way... Molecular Geek | Space | I wouldn't need the talcum powder!" fant at pobox.com | For | G. Marx (apropos of Aerosmith) Boston, MA USA | Hire | http://www.pharmawulf.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Feb 6 22:19:58 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 11:19:58 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: Message-ID: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Can you add GridEngine (SGE) and Torque (SPBS)? The problem with OpenPBS is not only it is broken, it is not under development these days, but also I found that Altair is not allowing new users to download OpenPBS. I went to its homepage today but it only leads me to the PBSPro page. SGE already has a FreeBSD-style "port", so adding a port for Gentoo Linux should also be easy. And I think SGE is more popluar these days too. SPBS is basically PBS, but with lots of problems fixed, and better Maui scheduler support. Also, please support the mpiexec parallel job starter, as it allows OpenPBS and SPBS to control slave MPI tasks. SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/torque/ mpiexec: http://www.osc.edu/~pw/mpiexec/ Thx :-> Andrew. --- Andrew Fant ???? > In addition, many > common scientific libraries > and HPC tools are present, including Atlas, FFTW, > gmp, LAM/MPI and > openpbs. 
The main website can be found at > http://www.gentoo.org. > > Contact information: > > The mailing-list is only starting now, and is rather > quiet, though I hope > to change that over the next couple of weeks. To > subscribe, send a blank > email to gentoo-science-subscribe at gentoo.org. You > will get a confirmation > message back. For those who want to just ask > questions or find out more > in a real-time setting, we are on IRC at > irc.freenode.org in > #gentoo-science. Of course, questions may also be > directed to me at > afant at geekmail.cc. > > Thank you for your time. Please feel free to > forward this information to > other groups that you feel would be interested. I > apologize to anyone who > considered this an off-topic post. > > Andy Fant > > Andrew Fant | This | "If I could walk > THAT way... > Molecular Geek | Space | I wouldn't need > the talcum powder!" > fant at pobox.com | For | G. Marx > (apropos of Aerosmith) > Boston, MA USA | Hire | > http://www.pharmawulf.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 7 03:18:47 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 7 Feb 2004 09:18:47 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> Message-ID: On Fri, 6 Feb 2004, Jim Lux wrote: > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > 8.96, even after you factor in the fact that you might need more aluminum > (because it's lower conductivity), it's still better than 2:1 weight Oh yes. Lots of telephone circuits were wired in aluminium in the 1960's in the UK. Corrosion now means these customers have difficulty getting ADSL. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sat Feb 7 06:21:19 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sat, 7 Feb 2004 11:21:19 +0000 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <20040207112119.GA5120@galactic.demon.co.uk> On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > Can you add GridEngine (SGE) and Torque (SPBS)? > > The problem with OpenPBS is not only it is broken, it > is not under development these days, but also I found > that Altair is not allowing new users to download > OpenPBS. I went to its homepage today but it only > leads me to the PBSPro page. > To clarify things a bit, I hope. In the beginning was PBS - developed in house at NASA by engineers who needed a Portable Batch System. If you understand Cray NQS syntax and concepts it's familiar :) They left / sold to Veridian who in turn sold to Altair. 
The original PBS was GPL or a close equivalent, if I understand correctly. Altair are marketing a propietary development of PBS as PBSPro. OpenPBS remains available, though you have to register with Altair for download. What they have done very recently, which is rather sneaky, is for the site to oblige you to register for an evaluation copy of PBSPro and potentially answer a questionnaire prior to providing the link to allow you to download OpenPBS. OpenPBS is not under active development and PBSPro may have stalled. Certainly the price per node that Altair are quoting has apparently dropped significantly - though their salesmen are still persistent :) The academic community and the active users forked OpenPBS to create Scalable PBS [SPBS] which is the name most widely known. They've added patches, fixes and features, though there is still an Altair licence for OpenPBS in there. In the last couple of months, SPBS changed its name initially to StORM and then to Torque. HTH other relative newbies who may be confused by trying to find the product :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Feb 7 09:19:21 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 22:19:21 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040207141921.68156.qmail@web16810.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" ???? > Certainly the price per node that Altair are quoting > has apparently > dropped significantly - though their salesmen are > still persistent :) Both LSF and PBSPro dropped their price significantly. LSF used to be US $1000 per CPU is now $50, and PBSPro used to be a few hundred dollars, and now lower than $30. SGE (GridEngine) 6.0 has a lot of new enchancements and the SGE mailing lists are very popular; and SPBS is gaining a lot of OpenPBS users' acceptance; and Condor is adding another set of new features and then opensource in the next few months. See if LSF and PBSPro are going to drop their price again in the very near future. BTW, it is just like Linux vs M$, at the beginning, Linux wasn't there, and M$ could charge as much as it wanted, and then Linux slowly came, and M$ found it harder and harder to compete with Linux. Linux won't kill M$, and SGE/SPBS/Condor won't kill LSF or PBSPro, not in this few years. The only thing we will see, however, is the lower cost, more features, and better support by Platform Computing (LSF) and Altair (PBSPro) in order to fight back, so users win. Andrew. > The academic community and the active users forked > OpenPBS to create > Scalable PBS [SPBS] which is the name most widely > known. They've added > patches, fixes and features, though there is still > an Altair licence for > OpenPBS in there. In the last couple of months, > SPBS changed its name > initially to StORM and then to Torque. > > HTH other relative newbies who may be confused by > trying to find the > product :) > > Andy > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sat Feb 7 13:48:06 2004 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 7 Feb 2004 12:48:06 -0600 (CST) Subject: [Beowulf] Cluster applications. In-Reply-To: <40235BB7.6010802@myri.com> Message-ID: Hi, I am looking for a real high performance computing application to evaluate the performance of a 2-node cluster running RH9.0, connected back to back by 1GbE. Here are some characteristics of the application I am looking for: 1 Communication intensive, should not be embarassingly parallel. 2 Should be able to stress the network to the maximum. 3 Should not be a benchmark, a real application. 4 Tunable message sizes. 5 Preferably MPI 6 Free (am I greedy?). Can someone point out one/some application(s) with at least first 3 features in the above list? Thank you very much. Regards, Balaji. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Sat Feb 7 10:11:55 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Sat, 07 Feb 2004 09:11:55 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <4025003B.3020105@tamu.edu> Should we mention the problems in household wiring caused by use of aluminum wiring, then using breakers, outlets and fixtures designed for copper? I almost lost a house in Houston to that once. I spent the 8 hours after the fire department left retightening all the connections throughout. John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at >>8.96, even after you factor in the fact that you might need more aluminum >>(because it's lower conductivity), it's still better than 2:1 weight > > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Fri Feb 6 20:51:51 2004 From: clwang at csis.hku.hk (Cho Li Wang) Date: Sat, 07 Feb 2004 09:51:51 +0800 Subject: [Beowulf] CFP: 2004 IFIP International Conference on Network and Parallel Computing (NPC2004) Message-ID: <402444B7.5E50DC8B@csis.hku.hk> NPC2004 IFIP International Conference on Network and Parallel Computing October 18-20, 2004 Wuhan, China http://grid.hust.edu.cn/npc04 **************************************************************** Call For Papers The goal of IFIP International Conference on Network and Parallel Computing (NPC 2004) is to establish an international forum for engineers and scientists to present their excellent ideas and experiences in all system fields of network and parallel computing. NPC 2004, hosted by the Huazhong University of Science and Technology, will be held in the city of Wuhan, China - the "Homeland of White Clouds and the Yellow Crane." Topics of interest include, but are not limited to: -Grid-based Computing -Cluster-based Computing -Peer-to-peer Computing -Network Security -Ubiquitous Computing -Network Architectures -Advanced Web and Proxy Services -Mobile Agents -Network Storage -Multimedia Streaming Services -Middleware Frameworks and Toolkits -Parallel & Distributed Architectures and Algorithms -Performance Modeling/ Evaluation -Programming Environments and Tools for Parallel and Distributed Platforms Submitted papers may not have appeared in or be considered for another conference. Papers must be written in English and must be in PDF format. Detailed electronic submission instructions will be posted on the conference web site. The conference proceedings will be published by Springer Verlag in the Lecture Notes in Computer Science (LNCS) Series (pending). ************************************************************************** Committee General Co-Chairs: H. J. Siegel Colorado State University, USA Guo-jie Li Chinese Academy of Sciences, China Steering Committee Chair: Kemal Ebcioglu IBM T.J. Watson Research Center, USA Program Co-Chairs: Guang-rong Gao University of Delaware, USA Zhi-wei Xu Chinese Academy of Sciences, China Program Vice-Chairs: Victor K. Prasanna University of Southern California, USA Albert Y. Zomaya University of Sydney, Australia Hai Jin Huazhong University of Science and Technology, China Local Arrangement Chair: Song Wu Huazhong University of Science and Technology, China *************************************************************************** Important Dates Paper Submission March 15, 2004 Author Notification May 1, 2004 Final Camera Ready Manuscript June 1, 2004 *************************************************************************** For more information, please contact the program vice-chair at the address below: Dr. 
Hai Jin, Professor Director, Cluster and Grid Computing Lab Vice-Dean, School of Computer Huazhong University of Science and Technology Wuhan, 430074, China Tel: +86-27-87543529 Fax: +86-27-87557354 e-fax: +1-425-920-8937 e-mail: hjin at hust.edu.cn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 7 14:40:29 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 7 Feb 2004 11:40:29 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sat, 7 Feb 2004, John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > 8.96, even after you factor in the fact that you might need more aluminum > > (because it's lower conductivity), it's still better than 2:1 weight > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. yeah but that's 24-26awg twisted pair for phone a 14 12 10 or 8 awg cable for power have substantialy less surface area relative to it's volume. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Feb 7 17:21:38 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 7 Feb 2004 14:21:38 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires - al In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: hi ya On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. people install wires with al or steel cores in the wire cause its way way cheaper than copper ... copper is only needed for good conduction on the outside of the wire al corrosion ... coat it with stuff :-) or wrap it w/ copper but now you have to worry about copper corrosion - house or building wiring is different animals than high voltage transmission lines too aluminum "pixie" dust does whacky things .. c ya alvin - i've always wondered why people put massive heatsinks on top of the cpu ... air will have a harder time to cool a big mass of metal as opposed to cooling a smaller piece of metal or cooling it some other way .. - problems of getting the heat out of the cpu ( 0.25"sq metal lid) - problems of getting the heat out of the cpu heatsink - blowing air down onto the heatsink is silly too .. 
left over from the 20-30 yr old ideas i guess > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Sat Feb 7 21:36:50 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Sat, 7 Feb 2004 21:36:50 -0500 (EST) Subject: [Beowulf] Cluster applications. In-Reply-To: Message-ID: Check out: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236&mode=thread&tid=8 Also, the "Right Stuff" Column in ClusterWorld addresses some of these issues. To see the a small summary of the columns look at: http://www.clusterworld.com/issues.shtml Doug On Sat, 7 Feb 2004, Balaji Rangasamy wrote: > Hi, > I am looking for a real high performance computing application to evaluate > the performance of a 2-node cluster running RH9.0, connected back to back > by 1GbE. Here are some characteristics of the application I am looking > for: > 1 Communication intensive, should not be embarassingly parallel. > 2 Should be able to stress the network to the maximum. > 3 Should not be a benchmark, a real application. > 4 Tunable message sizes. > 5 Preferably MPI > 6 Free (am I greedy?). > Can someone point out one/some application(s) with at least first 3 > features in the above list? Thank you very much. > Regards, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From klamman.gard at telia.com Sun Feb 8 03:50:22 2004 From: klamman.gard at telia.com (Per Lindstrom) Date: Sun, 08 Feb 2004 09:50:22 +0100 Subject: [Beowulf] Cluster applications. In-Reply-To: References: Message-ID: <4025F84E.4040706@telia.com> Hi Balaji, May I suggest the use of the GNU FEA-software CALCULIX, http://calculix.de/ When will it be up to you to decide how demanding problem your cluster have to solve. Best regards Per Lindstrom Balaji Rangasamy wrote: >Hi, >I am looking for a real high performance computing application to evaluate >the performance of a 2-node cluster running RH9.0, connected back to back >by 1GbE. Here are some characteristics of the application I am looking >for: >1 Communication intensive, should not be embarassingly parallel. >2 Should be able to stress the network to the maximum. >3 Should not be a benchmark, a real application. >4 Tunable message sizes. >5 Preferably MPI >6 Free (am I greedy?). >Can someone point out one/some application(s) with at least first 3 >features in the above list? Thank you very much. >Regards, >Balaji. 
> > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 8 10:52:44 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 8 Feb 2004 10:52:44 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. You mean the part where aluminum turns out to burn like magnesium, incredibly hot and impossible to quench? I would under no circumstances put aluminum wiring in, well, anything. Certainly not anything where a serious overload or arcing situation could occur, which is nearly anything. I seem to remember the government finding out about aluminum the hard way with some of their armored fighting vehicles a decade or two ago. When struck with a hot enough round, the armor itself just burned right up. rgb > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > Oh yes. > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 14:10:43 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 13:10:43 -0600 Subject: [Beowulf] DC Powered Chassis Message-ID: <402689B3.9070104@iwantka.com> With all the power talk on the 'HVAC and Room Cooling' subject. I've been looking for 1 or 2u chassis that support -48v DC as the main power source. Does anyone know of someone that manufactures these? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Feb 9 00:24:28 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 21:24:28 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Robert G. 
Brown wrote: > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > Should we mention the problems in household wiring caused by use of > > aluminum wiring, then using breakers, outlets and fixtures designed for > > copper? I almost lost a house in Houston to that once. I spent the 8 > > hours after the fire department left retightening all the connections > > throughout. > > I seem to remember the government finding out about aluminum the hard > way with some of their armored fighting vehicles a decade or two ago. > When struck with a hot enough round, the armor itself just burned right > up. armor is supposed to burn. several armor desgins including that of the american abrams battle tank, are desgined to ablate under pressure from kinetic energy weapons. british chobham type composite armor, boron carbide, or aluminum or some conbination of those and others, protect larger armored vehicles from depleted uranium and tungsten sabot munitions. depleted uranium has similar or better pyrophoric properties (igniting at 500c and burning at 2000c) and the added nastyness of being a toxic heavy metal... in general taking a 10kg urunium slug, accelerating it to 15,000fps and slamming it into another object will cause a fire. It has been used in both armor and projectiles for more or less the same reasons. > rgb > > > > > John Hearns wrote: > > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > >>8.96, even after you factor in the fact that you might need more aluminum > > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > > > > Oh yes. > > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sun Feb 8 17:49:18 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 14:49:18 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: http://www.rackmountpro.com/productsearch.cfm?catid=118 On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 13:13:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 13:13:16 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Joel Jaeggli wrote: > On Sun, 8 Feb 2004, Robert G. Brown wrote: > > > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Should we mention the problems in household wiring caused by use of > > > aluminum wiring, then using breakers, outlets and fixtures designed for > > > copper? I almost lost a house in Houston to that once. I spent the 8 > > > hours after the fire department left retightening all the connections > > > throughout. > > > > I seem to remember the government finding out about aluminum the hard > > way with some of their armored fighting vehicles a decade or two ago. > > When struck with a hot enough round, the armor itself just burned right > > up. > > armor is supposed to burn. several armor desgins including that of the "supposed to burn"? Where to "burn" is to release additional heat energy into an already hot environment in a self-sustaining way? Ouch. Supposed to ablate and dissipate energy (hopefully in non-destructive ways on the outside of the vehicle) sure, but naive aluminum designs can be deadly and at various points in the past have been seriously mistrusted by the military personnel supposedly being protected. See e.g. http://www.g2mil.com/aluminum.htm where they recall the early bradley flaws, and argue that the HMS Sheffield (sunk by a single exocet missle in the falklands war) went down in large measure because it was an aluminum ship, where steel ships have been hit by more than one exocet and survived. The site also presents a counterpoint that argues that aluminum isn't THAT bad a choice (as near as I can make out) provided that all one wishes to stop is "small arms fire". It very quickly loses out to steel, though, in a variety of measures when faced with RPG's or things that actually cause fires, as it is a good conductor of heat and quickly spreads a fire and structurally collapses at a relatively low temperature. The aluminum Bradley did tolerably in the first gulf war, losing only 3 to enemy fire (compared to 17 lost to friendly fire from Abrams tanks) but it does have provisions for additional armor plates of steel to be added on outside and I imagine that it used them. Most of what it faced in the gulf war OTHER than our Abrams was its forte -- small arms fire. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sun Feb 8 17:09:36 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sun, 8 Feb 2004 14:09:36 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: hi ya nathan On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? some collection of "these" http://www.Linux-1U.net/PowerSupp/DC http://www.Linux-1U.net/PowerSupp/12v problem with +12v or -48v dc inputs is you need to provide enough current to these "dc power supply" - at 12v .. we were calculating about 400A ... since we estimate 4A per mb and 100 mb per rack and double it or 50% for keeping the powersupply reasonably within its normal lifespan ( mtbf ) fun stuff... alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 17:31:40 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 16:31:40 -0600 Subject: [Beowulf] DC Powered Chassis In-Reply-To: References: Message-ID: <4026B8CC.7050102@iwantka.com> Thanks! Alvin Oga wrote: >hi ya nathan > >On Sun, 8 Feb 2004, Nathan Littlepage wrote: > > > >>With all the power talk on the 'HVAC and Room Cooling' subject. I've >>been looking for 1 or 2u chassis that support -48v DC as the main power >>source. Does anyone know of someone that manufactures these? >> >> > >some collection of "these" > >http://www.Linux-1U.net/PowerSupp/DC >http://www.Linux-1U.net/PowerSupp/12v > > >problem with +12v or -48v dc inputs is you need to provide >enough current to these "dc power supply" > - at 12v .. we were calculating about 400A ... > since we estimate 4A per mb and 100 mb per rack > and double it or 50% for keeping the powersupply > reasonably within its normal lifespan ( mtbf ) > >fun stuff... >alvin > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sat Feb 7 20:24:01 2004 From: clwang at csis.hku.hk (clwang at csis.hku.hk) Date: Sun, 8 Feb 2004 09:24:01 +0800 Subject: [Beowulf] CFP: GCC2004 (3rd International Conference on Grid and Cooperative Computing) Message-ID: <1076203441.40258fb1bf1c0@intranet.csis.hku.hk> ---------------------------------------------------------------- Call for Papers 3rd International Conference on Grid and Cooperative Computing (http://grid.hust.edu.cn/gcc2004) October 21-24 2004, Wuhan, China ----------------------------------------------------------------- The Third International Conference on Grid and Cooperative Computing (GCC 2004) will be held from Oct. 21 to 24, 2004 in Wuhan. It will serve as a forum to present current work by researchers in the grid computing and cooperative computing area. GCC 2004 is the follow-up of the highly successful GCC 2003 in Shanghai, China, and GCC 2002 in Sanya, China. 
Wuhan is rich in culture and history. Its civilization began about 3,500 years ago, and is of great importance in Chinese culture, economy and politics. It shares the same culture of Chu, formed since the ancient Kingdom of Chu more than 2,000 years ago. Numerous natural and artificial attractions and scenic spots are scattered around. Famous scenic spots in Wuhan include Yellow Crane Tower, Guiyuan Temple, East Lake, and Hubei Provincial Museum with the famous chimes playing the music of different styles. GCC 2004 will emphasize the design and analysis of grid computing and cooperative computing and their scientific, engineering, and commercial applications. In addition to technical sessions of contributed paper presentations, the conference will have several workshops, a poster session, tutorials, and vendor exhibitions. GCC 2004 invites the submission of papers in grid computing, Web services and cooperative computing, including theory and applications. The conference is soliciting only original high quality research papers on all above aspects. The main topics of interest include, but not limited to: -Resource Grid and Service Grid - Information Grid and Knowledge Grid - Grid Monitoring, Management and Organization Tools - Grid Portal - Grid Service, Web Service and their QoS - Service Orchestration - Grid Middleware and Toolkits - Grid Security - Innovative Grid Applications - Advanced Resource Reservation and Scheduling - Performance Evaluation and Modeling - Computer-Supported Cooperative Work - P2P Computing, automatic computing and so on - Meta-information Management - Software glue Technologies PAPER SUBMISSION Paper submissions must present original, unpublished research or experiences. Late-breaking advances and work-in-progress reports from ongoing research are also encouraged to be submitted to GCC 2004. All papers submitted to this conference will be peer-reviewed and accepted on the basis of their scientific merit and relevance to the conference topics. Accepted papers will be published as conference proceedings, published by Springer-Verlag in the Lecture Notes in Computer Science (LNCS) Series (Pending). It is also planned that a selection of papers from GCC 2004 proceedings will be extended and published in international journals. WORKSHOPS Proposals are solicited for workshops to be held in conjunction with the main conference. Interested individuals should submit a proposal by March 1, 2004 to the Workshop Chair. TUTORIALS Proposals are solicited for tutorials to be held at the conference. Interested individuals should submit a proposal by May 30,2004. The proposal should include a brief description of the intended audience, a lecture outline, and a vita for each lecturer. EXHIBITION/VENDOR PRESENTATIONS Companies and R&D laboratories are encouraged to present their exhibits at the conference. In addition, a full day of vendor presentations is planned. IMPORTANT DATES March 1, 2004 Workshop Proposal Due May 1, 2004 Conference Paper Due May 30, 2004 Tutorial Proposal Due June 1, 2004 Notification of Acceptance/Rejection June 30, 2004 Camera-Ready Paper Due ORGANIZATION CONFERENCE Co-CHAIRS Xicheng Lu, National University of Defense Technology, China Andrew A. Chien, University of California at San Diego, USA. PROGRAM Co-CHAIRS Hai Jin, Huazhong University of Science and Technology, China. hjin at hust.edu.cn Yi Pan, Georgia State University, USA. pan at cs.gsu.edu WORKSHOP CHAIR Nong Xiao, National University of Defense Technology, China. 
xiao-n at vip.sina.com, Xiao_n at sina.com. Publicity Chair Minglu Li, Shanghai Jiao Tong University, China. li-ml at cs.sjtu.edu.cn Tutorial Chair Dan Meng, Institute of Computing Technology, Chinese Academy of Sciences, China. md at ncic.ac.cn Poster Chair Song Wu, Huazhong University of Science and Technology, China. wusong at mail.hust.edu.cn LOCAL ARRANGEMENT CHAIR Pingpeng Yuan, Huazhong University of Science and Technology, China. ppyuan at mail.hust.edu.cn. Program Committee Members (more to be added) Mark Baker (University of Portsmouth, UK) Rajkumar Buyya (The University of Melbourne, Australia) Wentong Cai (Nanyang Technological University, Singapore) Jiannong Cao (Hong Kong Polytechnic University, Hong Kong) Guihai Chen (Nanjing University, China) Minyi Guo (University of Aizu, Japan) Chun-Hsi Huang (University of Connecticut, USA) Weijia Jia (City University of Hong Kong, Hong Kong) Francis Lau (The University of Hong Kong, Hong Kong) Keqin Li (State University of New York, USA) Qing Li (City University of Hong Kong, Hong Kong) Lionel Ni (Hong Kong University of Science and Technology, Hong Kong) Hong Shen (Japan Advanced Institute of Science and Technology, Japan) Yuzhong Sun (Institute of Computing Technology, CAS, China) Huaglory Tianfield (Glasgow Caledonian University, UK) Cho-Li Wang (The University of Hong Kong, Hong Kong) Jie Wu (Florida Atlantic University, USA) Cheng-Zhong Xu (Wayne State University, USA) Laurence Tianruo Yang (St. Francis Xavier University, Canada) Qiang Yang (Hong Kong University of Science & Technology, Hong Kong) Yao Zheng (Zhejiang University, China) Wanlei Zhou (Deakin University, Australia) Jianping Zhu (The University of Akron, USA) For more information, please visit the conference web site at: http://grid.hust.edu.cn/gcc2004. ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgoornaden at intnet.mu Mon Feb 9 10:34:28 2004 From: rgoornaden at intnet.mu (roudy) Date: Mon, 9 Feb 2004 19:34:28 +0400 Subject: [Beowulf] parallel program References: <200402081701.i18H1qh28395@NewBlue.scyld.com> Message-ID: <001601c3ef22$35dd7be0$ab007bca@roudy> Hello Beowulf people, I have completed building my cluster. I have already run linpack on it and its performance is fine. Can someone help me by suggesting some very big programs to run on my cluster, so I can compare its performance with that of a stand-alone computer? Thanks Roudy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jcookeman at yahoo.com Mon Feb 9 18:00:47 2004 From: jcookeman at yahoo.com (Justin Cook) Date: Mon, 9 Feb 2004 15:00:47 -0800 (PST) Subject: [Beowulf] parallel program In-Reply-To: <001601c3ef22$35dd7be0$ab007bca@roudy> Message-ID: <20040209230047.89106.qmail@web60510.mail.yahoo.com> http://www.mpa-garching.mpg.de/galform/gadget/index.shtml There is a serial and parallel version. Have fun... Justin --- roudy wrote: > Hello Beowulf people, > I have completed to build my cluster. I have have > already run linpack on my > cluster and it's performance is fine. > Can someone help me by giving me some very big > programs to run on my cluster > to compare the performance with a stand-alone > computer. > Thanks > Roudy > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 9 17:50:45 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 9 Feb 2004 17:50:45 -0500 (EST) Subject: [Beowulf] BWBUG meeting Tuesday Feb 10 at 3:00, Platform Computing Message-ID: --- Note that this meeting is in VA not Maryland! -- Date: February 10, 2004 Time: 3:00 PM (doors open at 2:30) Location: Northrop Grumman IT, McLean Virginia The folks from Platform Computing will be speaking about their LSF scheduler and Grid Computing for Beowulf. This event is sponsored by the Baltimore-Washington Beowulf Users Group (BWBUG) and will be held at Northrop Grumman Information Technology 7575 Colshire Drive, 2nd floor, McLean Virginia. Please register on line at http://bwbug.org As usual there will be door prizes, food and refreshments. Need to be a member?: No ( guests are welcome ) Parking: Free T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell Email michael.fitzmaurice at ngc.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 9 18:25:55 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 10 Feb 2004 10:25:55 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402101025.57234.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > argue that the HMS Sheffield (sunk by a single exocet missle in the > falklands war) went down in large measure because it was an aluminum ship A quick correction, the Sheffield was an all steel ship, as I believe were all the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was not brought under control because the exocet (which failed to explode) took out a large chunk of the fire fighting system.
She finally sank under tow on May 10th 1982, six days after being hit. The sci.military.naval FAQ has an excellent section on the role of aluminium in the loss of warships which looks at this urban legend, and gives real examples when aluminium did cause the loss, at: http://www.hazegray.org/faq/smn6.htm#F7 as well as a section on the Type 42's at: http://www.hazegray.org/navhist/rn/destroyers/type42/ cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAKBcDO2KABBYQAh8RAq2vAJdRfrlHek12hced85HGV0z1nWbYAJ9GJegr FBxjHUczDti0OXNKX5VoKA== =PA8t -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 18:31:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 18:31:18 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > > > argue that the HMS Sheffield (sunk by a single exocet missle in the > > falklands war) went down in large measure because it was an aluminum ship > > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. I stand corrected. Obviously one can't believe everything one googles...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Feb 9 22:42:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 10 Feb 2004 11:42:32 +0800 (CST) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> >From comp.arch: "One of the things that the version 8.0 of the Intel compiler included was an "Intel-specific" flag." But looks like the purpose is to slow down AMD: http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com If intel releases 64-bit x86 CPUs and compilers, then AMD may get even better benchmarks results. Again, no matter how pretty the benchmarks results look, in the end we still need to run on the real system. So, what's the point of having benchmarks? Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 10 03:10:39 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 10 Feb 2004 09:10:39 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. Steering the argument back to computers :-) I saw a documentary about the Sheffield once. Two ships were sent out as 'goalkeepers', the Sheffield and the smaller Broadsword. The Sheffield had a longer range missile system, the Broadsword a short range one (or other way around). During a period of vulnerability (can;t remember the exact reason) the Broadsword had to reboot its ageing fire control computer. I think build by Ferranti. (No slur intended on their fine engineers, but the thing was old at the time). _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wardwe at navseadn.navy.mil Tue Feb 10 16:21:01 2004 From: wardwe at navseadn.navy.mil (Ward William E DLDN) Date: Tue, 10 Feb 2004 16:21:01 -0500 Subject: [Beowulf] Intel Compiler cheating against non-Intel CPUs? Message-ID: Has anyone seen this yet? Any comments or discussion? >From the message, it looks like the Intel Compilers are cheating against SSE and SSE2 capable non-Intel CPUs (ie, A64 especially). http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=a13e403a.040 2091438.14018f5a%40posting.google.com&rnum=1 R/William Ward _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mbanck at gmx.net Tue Feb 10 13:01:16 2004 From: mbanck at gmx.net (Michael Banck) Date: Tue, 10 Feb 2004 19:01:16 +0100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040210180116.GA27872@blackbird.oase.mhn.de> On Sat, Feb 07, 2004 at 11:21:19AM +0000, Andrew M.A. Cater wrote: > On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > > Can you add GridEngine (SGE) and Torque (SPBS)? > > > > The problem with OpenPBS is not only it is broken, it > > is not under development these days, but also I found > > that Altair is not allowing new users to download > > OpenPBS. I went to its homepage today but it only > > leads me to the PBSPro page. > > > To clarify things a bit, I hope. > > In the beginning was PBS - developed in house at NASA by engineers > who needed a Portable Batch System. If you understand Cray NQS syntax > and concepts it's familiar :) They left / sold to Veridian who in turn > sold to Altair. 
The original PBS was GPL or a close equivalent, if I > understand correctly. > > Altair are marketing a propietary development of PBS as PBSPro. OpenPBS > remains available, though you have to register with Altair for download. > What they have done very recently, which is rather sneaky, is for the > site to oblige you to register for an evaluation copy of PBSPro and > potentially answer a questionnaire prior to providing the link to allow > you to download OpenPBS. > > OpenPBS is not under active development and PBSPro may have stalled. > Certainly the price per node that Altair are quoting has apparently > dropped significantly - though their salesmen are still persistent :) > > The academic community and the active users forked OpenPBS to create > Scalable PBS [SPBS] which is the name most widely known. They've added > patches, fixes and features, though there is still an Altair licence for > OpenPBS in there. In the last couple of months, SPBS changed its name > initially to StORM and then to Torque. Thanks for the clarification. Does anybody know whether Torque is considered to be conforming to the Open Source Definition[1]? In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software License', which seems to prohibit commercial distribution, making it non-free unfortunately. Is there some other fork of PBS with a true Open Source license perhaps? thanks, Michael [1] http://www.opensource.org/docs/definition.php _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 18:00:05 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 10:00:05 +1100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040210180116.GA27872@blackbird.oase.mhn.de> References: <20040207112119.GA5120@galactic.demon.co.uk> <20040210180116.GA27872@blackbird.oase.mhn.de> Message-ID: <200402111000.08919.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > Thanks for the clarification. Does anybody know whether Torque is > considered to be conforming to the Open Source Definition[1]? > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > License', which seems to prohibit commercial distribution, making it > non-free unfortunately. Is there some other fork of PBS with a true Open > Source license perhaps? My understanding is that they cannot alter the license as they have inherited that from the original OpenPBS sources, and as they do not hold all the copyrights to the code it cannot be changed unless Altair can be persuaded. My understanding is that the SuperCluster people picked the 2.3.12 version as a starting point as that was the most recent with the most liberal license (i.e. others could fork development from it). I've CC'd this to the SuperCluster folks so they can comment and correct. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU QwJlxOBwfLiUT7Y543RwiIY= =xTbA -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 17:44:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 09:44:17 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402110944.21802.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 07:10 pm, John Hearns wrote: > During a period of vulnerability (can;t remember the exact reason) > the Broadsword had to reboot its ageing fire control computer. > I think build by Ferranti. (No slur intended on their fine engineers, > but the thing was old at the time). I'm not aware of that one, but on a similar vein there was the widespread failure of the Patriot systems during the first Gulf War, including the attack on the barracks at Dhahran where 28 were killed. This was caused by the system truncating the values of the clock when written to memory, which over a long period of operation resulted in the system dismissing incoming missiles as false alarms. http://shelley.toich.net/projects/CS201/patriot.html However, this is starting to sound more like the RISKS digest than Beowulf, so I'll leave it there. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKV7EO2KABBYQAh8RApXjAJ9Gil07Z/XekN3XDSturEu2KihedQCfXBA7 aUUMVqTZuHfQ5RKsKGwnuNw= =+9RK -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Feb 10 18:52:41 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 10 Feb 2004 15:52:41 -0800 Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com>; from andrewxwang@yahoo.com.tw on Tue, Feb 10, 2004 at 11:42:32AM +0800 References: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <20040210155241.A29026@fileserver.internal.keyresearch.com> On Tue, Feb 10, 2004 at 11:42:32AM +0800, Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? There isn't much point at staring at a benchmark that isn't at all relevant to how you're using the system -- for example, a SPECcpu score with the Intel compiler in 32-bit mode isn't going to tell you much about an AMD64 app in 64-bit mode. If I remember correctly, a guy at Intel published a paper about a feedback optimization technique related to irregular strides that got a 22% improvement in mcf. When I get back to the office in a couple of days, I'll post a reference. 
And no, it's not at all Intel-specific. -- greg (posting from Paris. I should be asleep!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 19:32:47 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 11:32:47 +1100 Subject: Fwd: Re: [Beowulf] Gentoo for Science and Engineering Message-ID: <200402111132.49119.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Forwarded at the request of SuperCluster.org - ---------- Forwarded Message ---------- Subject: Re: [Beowulf] Gentoo for Science and Engineering Date: Wed, 11 Feb 2004 11:55 am From: help at supercluster.org To: Chris Samuel Cc: beowulf at beowulf.org Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have > inherited that from the original OpenPBS sources, and as they do not hold > all the copyrights to the code it cannot be changed unless Altair can be > persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version > as a starting point as that was the most recent with the most liberal > license (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. 
> > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- - -- - -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation - ------------------------------------------------------- - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKXgvO2KABBYQAh8RAok7AKCABbnmwiYvRf4BxeFoY+Jp9F/W1gCfReKD dKc1islXxQLdTrabQglX1MU= =xfyh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From yduan at albert.chem.udel.edu Tue Feb 10 10:37:49 2004 From: yduan at albert.chem.udel.edu (Dr. Yong Duan) Date: Tue, 10 Feb 2004 10:37:49 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > A guidelines, I guess. A lot of CPUs (including some rather expensive ones and often call them HPC CPUs) perform at less than half the speed of consumer grade CPUs. You'd definitely avoid those, for instance. Also, you can look at the performance in each area and figure out the relative performance expected to your own code. In the end, the most reliable benchmark is always on your own code, of course. Whether Intel compiler has been tuned for SPEC2K is probably an open question. I tried various compilers on our code and found it is also tuned for it :), consistently 10-20% faster than others. This included performance on Opterons, strangely enough. yong _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From help at supercluster.org Tue Feb 10 19:55:22 2004 From: help at supercluster.org (help at supercluster.org) Date: Tue, 10 Feb 2004 17:55:22 -0700 (MST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <200402111000.08919.csamuel@vpac.org> Message-ID: Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. 
The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far as the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have inherited > that from the original OpenPBS sources, and as they do not hold all the > copyrights to the code it cannot be changed unless Altair can be persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version as > a starting point as that was the most recent with the most liberal license > (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. > > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- > -- -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bwegner at ekt.tu-darmstadt.de Wed Feb 11 05:02:23 2004 From: bwegner at ekt.tu-darmstadt.de (Bernhard Wegner) Date: Wed, 11 Feb 2004 11:02:23 +0100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages Message-ID: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Hello, I have a really small "cluster" of 4 PC's which are connected by a normal Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board I thought I might be able to improve performance by connecting the machines via a Gigabit switch (which are really cheap nowadays). Everything seemed to work fine. The switch indicates 1000Mbit connections to the PC's and transfer rate for scp-ing large files is significantly higher now, but my software using mpich RUNS about a factor of 4-5 SLOWER NOW than with the 100 Mbit switch. I wasn't able to actually track down the problem, but it seems that there is a problem with small messages.
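For reference, small-message latency of the sort at issue here can be measured with a bare-bones MPI ping-pong. The sketch below is an illustration only, not the mpich test program itself; the repetition count, tag and file name are arbitrary. It assumes mpicc and two processes and reports the average one-way time for an empty message:

/* pingpong.c -- minimal 0-byte latency sketch (illustrative only, not the
 * mpich bshort test).  Build with "mpicc pingpong.c -o pingpong" and run
 * with one process on each of two nodes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, reps = 1000;
    char buf[1];                  /* the payload is sent with count 0 */
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("0 byte one-way latency: %.1f us\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}

Over TCP on a healthy gigabit link a few tens of microseconds would be a typical result, so times of the order of 1500 us point at the link, driver or negotiation rather than at the application itself.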
When I run the performance test provided with mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 byte message length, while for larger messages everything looks fine (linear dependancy of transfer time on message length, everything below 300 us). I have also tried mpich2 which shows exactly the same behavior. Does anyone have any idea? Here are the details of my system: - Suse Linux 9.0 (kernel 2.4.21) - mpich-1.2.5.2 - motherboard ASUS P4P800 - LAN (10/100/1000) on board (3COM 3C940 chipset) - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + 8x88E1111-BAB, AT89C2051-24PI) -- Mit besten Gr??en -- Best regards, Bernhard Wegner _______________________________________________________ ======================================================= Dipl.-Ing. Bernhard Wegner Fachgebiet Energie- und Kraftwerkstechnik Technische Universit?t Darmstadt Petersenstr. 30 64287 Darmstadt Germany phone: +49-6151-162357 fax: +49-6151-166555 e-mail: bwegner at ekt.tu-darmstadt.de _______________________________________________________ ======================================================= _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moloned at tcd.ie Wed Feb 11 12:44:59 2004 From: moloned at tcd.ie (david moloney) Date: Wed, 11 Feb 2004 17:44:59 +0000 Subject: [Beowulf] Profiling floating-point performance Message-ID: <402A6A1B.2070805@tcd.ie> I have an application written in C++ which compiles under both MSVC++ 6.0 and gcc 2.9.6 that I would like to profile in terms of floating point performance. My special requirement is that I would like not only peak and average flops numbers but also I would like a histogram of the actual x86 floating point instructions executed and their contribution to those peak and average flops numbers. Can anybody offer advice on how to do this? I tried using Vtune but it didn't seem to have this feature. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 11:44:10 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 11:44:10 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: Message-ID: On Tue, 10 Feb 2004, Dr. Yong Duan wrote: > > On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > > > Again, no matter how pretty the benchmarks results > > look, in the end we still need to run on the real > > system. So, what's the point of having benchmarks? > > > > Andrew. > > > > A guidelines, I guess. A lot of CPUs (including some rather expensive > ones and often call them HPC CPUs) perform at less than half the speed of > consumer grade CPUs. You'd definitely avoid those, for instance. > Also, you can look at the performance in each area and figure out the > relative performance expected to your own code. In the end, the most > reliable benchmark is always on your own code, of course. A short article this morning, as I'm debugging code and somewhat busy. Before discussing benchmarks in general, one needs to make certain distinctions. There are really two kinds of benchmarks. Maybe even three. Hell, more, but I'm talking broad categories. 
Let's try these three: * microbenchmarks * comparative benchmarks * application benchmarks Microbenchmarks measure very specific, highly targeted areas of system functionality. By their very nature they are "simple", not complex -- often the pseudocode is as simple as start_timer(); loop lotsatimes{ do_something_simple = dumb*operation; } stop_timer(); compute_speed(); print_result(); (To compute "how fast a multiply occurs"). Simple can also describe atomicity -- benchmarking "a single operation" where the operation might be complex but is a standard unitary building block of complex code. Microbenchmarks are undeniably not only useful, they are essential to anyone who takes systems/cluster/programming engineering seriously. Examples of microbenchmark suites that are in more or less common use are: lmbench (very full featured suite; one infamous user: Linux Torvalds:-) stream (very commonly cited on the list) cpu_rate (not so common -- wraps e.g. stream and other tests so variations with vector size can be explored) rand_rate (almost unknown, but it DOES benchmark all the gsl rands:-) netpipes (measure network speeds) netperf (ditto, but alas no longer maintained) I (and many others) USE these tools (I wrote two of them SO I could use them) to study systems that we are thinking of buying and using for a cluster, to study the kernel and see if the latest change made some critical operation faster or slower, to figure out if the NIC/switch combo we are using is why PVM code is moving like molasses. They are LESS commonly poached by vendors, fortunately - Larry Macvoy has lmbench bristling with anti-vendor-cooking requirements at the license level. The benchmarks are simple, but because one needs a lot of them to get an accurate picture of overall performance they tend to be too complex for typical mindless consumers... Comparative benchmarks are what I think you're really referring to. They aren't completely useless, but they do often become pissing contests (such as the top 500 list) and there are famous stories of Evil by corporations seeking to cook up good results on one or another (sometimes at the expense of overall system balance and performance!). Most of the Evil in these benchmarks arise because people end up using them as a naive basis for purchase decisions. "Ooo, that system has a linpork of 4 Gigacowflops so it must be better than that one which only gets 2.7 Gcf, so I'll buy 250 of them for my next cluster and be able to brag about my 1 Teracowflop supercomputer and make the top third of the top 500 list, which will impress my granting agencies and tenure board, who are just as ignorant as I am about meaningful measures of systems performance..." Never mind that your application is totally non-linpack-like, that the bus performance on the systems you got sucks, and that the 2.7 Gcf systems you rejected cost 1/2 the 4 Gcf systems you got so you could have had 500 at 2.7 Gcf for a net of 1.35 Tcf and balanced memory and bus performance (and run your application faster per dollar) if you'd bothered to do a cost benefit analysis. The bleed of dollars attracts the vendor sharks, who often can rattle off the aggregate specmarks and so forth for their most expensive boxes. 
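The cost-benefit arithmetic in the previous paragraph is worth making explicit. The figures in the sketch below are hypothetical (budget and node prices invented purely for illustration), but they reproduce the 1 Tcf versus 1.35 Tcf comparison:

/* cba.c -- back-of-the-envelope cost-benefit arithmetic with made-up
 * prices: a fixed budget spent on "4 Gcf" nodes versus "2.7 Gcf" nodes
 * that cost half as much. */
#include <stdio.h>

int main(void)
{
    double budget  = 1000000.0;  /* assumed budget in dollars        */
    double price_a = 4000.0;     /* assumed price of a 4.0 Gcf node  */
    double price_b = 2000.0;     /* assumed price of a 2.7 Gcf node  */
    double gcf_a = 4.0, gcf_b = 2.7;

    double nodes_a = budget / price_a;   /* 250 nodes */
    double nodes_b = budget / price_b;   /* 500 nodes */

    printf("option A: %.0f nodes x %.1f Gcf = %.2f Tcf\n",
           nodes_a, gcf_a, nodes_a * gcf_a / 1000.0);
    printf("option B: %.0f nodes x %.1f Gcf = %.2f Tcf\n",
           nodes_b, gcf_b, nodes_b * gcf_b / 1000.0);
    return 0;
}

Per dollar the "slower" node wins by 35%, before memory and bus balance are even taken into account.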
However, they CAN be actually useful, if one of the tests in the SPEC suite happens to correspond to your application, if you bother to read all the component results in the SPECmarks, if you bother to check the compiler used and flags and system architecture in some detail to see if they appear cooked (hand tuned or optimized, based on a compiler that is lovely but very expensive and has to be factored into your CBA). Finally, there are application benchmarks. These tend to be "atomic" but at a very high level (an application is generally very complex). These are also subject to the Evil of comparative benchmarks (in fact some of comparative benchmark suites, especially in the WinX world, are a collection of application benchmarks). They also have some evil of their own when the application in question is commercial and not open source -- you have effectively no control over how it was built and tuned for your architecture, for example, and may not even have meaningful version information. However, they are also undeniably useful. Especially when the application being benchmarked is YOUR application and under your complete control. So the answer to your question appears to be: * Microbenchmarks berry berry good. Useful. Essential. Fundamental. * Comparative benchmarks sometimes good. Sometimes a force for Evil. * Application benchmark ideal if it is your application or very similar and under your control. Pissing contests in general are not useful, and even a useful higher level benchmark divorced from an associated CBA is like shopping in a store that has no price tags -- a thing of use only to those so rich that they don't have to ask. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 13:55:01 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 12:55:01 -0600 Subject: [Beowulf] how are people doing this? Message-ID: <20040211185501.GA31590@mikee.ath.cx> I feel that in a proper cluster that the nodes are all (basically) identical. I 'own' a server environment of 20+ servers that are all dedicated to specific applications and this is not a cluster. However, I would like to manage config files (/etc/resolv.conf, etc), user accounts, patches, etc., as I would in a clustered environment. I have read the papers at infrastructures.org and agree with the principles mentioned there. I have looked extensively at cfengine, though I prefer the solution be in PERL as all my servers have PERL already (the manufacturer installs PERL as default on the boxes). How is everyone managing their cluster or what are suggestions on how I can manage my server environment. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:08:41 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed, 11 Feb 2004 13:08:41 -0500 (EST) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > I have an application written in C++ which compiles under both MSVC++ > 6.0 and gcc 2.9.6 that I would like to profile in terms of floating > point performance. > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. I'm not sure how accurate it is overall, but see "man gprof" and compile with the -g -pg flags. This will give you at least some useful times and so forth. It will NOT give you (AFAIR) "histogram of actual x86 floats etc". I don't know of anything that will -- to get them you have to instrument your code, probably so horribly that a la Heisenberg your measurements would bear little resemblance to actual performance (especially if your code wants to be doing all sorts of smooth vector things in cache and register memory and you keep calling instrumentation subroutines to try to measure times that wreck state). Consider that with my best, on CPU, raw assembler based timing clock (using the onboard cycle counter) I still find the overhead of reading that clock to be in the tens of clock cycles. To microtime a single multiply is thus all but impossible -- the clock itself takes 10-40 times as long to execute as a multiply might take, depending on where the data to be multiplied is when one starts. So timing per-instruction is effectively out. Similarly, to instrument and count floating point operations requires something to "watch the machine instructions" as they stream through the CPU. Unfortunately, the only thing available to watch the instructions is the CPU itself, so you have to damn near write an assembler-interpreter to instrument this. Which in turn would be slow as molasses -- an easy 10x slower than the native code in overhead alone plus it would utterly wreck just about every code optimization known to man. Finally, there is the question of "what's a flop". The answer is, not much that's useful or consistent -- the number of floating point operations that a system does per second varies wildly depending in a complex way on system state, cache locality, whether the variable is general or register, whether the instruction is part of a complex/advanced instruction (e.g. add/multiply) or an instruction that has to be done partly in software (divide), whether or not the instruction is part of a stream of vectorized instructions, and more. That's why microbenchmarks are useful. You may not be able to extract meaningful results from your code with a simple tool (although it isn't terribly difficult to instrument major blocks or subroutines with timers and counters, which is more or less what -pg and gprof do) but you can learn at least some things about how your system executes core operations in various contexts to learn how to optimize one's code with a good microbenchmark. Just sweeping stream across vector sizes from 1 to 10^8 or so teaches you a whole lot about the system's performance in different contexts, as does doing a stream-like benchmark but working through the vector in a random order (i.e. deliberately defeating any sort of vector optimization and cache benefit).
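To make the overhead point concrete, a raw cycle-counter timing loop of the general kind described above can be written in a few lines. This is a sketch only (gcc on x86 assumed; the loop body and iteration count are arbitrary), not a production benchmark:

/* tsc.c -- read the x86 time-stamp counter around a trivial loop.
 * The tens-of-cycles cost of rdtsc itself is exactly why a single
 * multiply cannot usefully be timed this way. */
#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    volatile double x = 1.000001, y = 1.0;   /* volatile defeats dead-code removal */
    unsigned long long start, stop;
    int i, n = 1000000;

    start = rdtsc();
    for (i = 0; i < n; i++)
        y *= x;                              /* the "work" being timed */
    stop = rdtsc();

    printf("%.2f cycles per iteration (includes loop and rdtsc overhead)\n",
           (double)(stop - start) / n);
    return 0;
}

Subtracting an empty calibration loop and sweeping the working-set size is what turns a toy like this into something resembling a real microbenchmark.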
Good luck, rgb > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:58:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 13:58:39 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. Mike, this is nearly a FAQ -- the list archives should have a discussion (one of many) only a few weeks old on this very subject. There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions possible, and more. Oh, and dhcp actually pushes lots of stuff out all by itself these days -- it should handle the stuff in resolv.conf for example, and you should be using dhcp anyway for scalability reasons. rgb > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 11 14:44:02 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 11 Feb 2004 13:44:02 -0600 (CST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: Mike, we use systemimager, systemconfigurator and a custom utility called "cupdate" to maintain our clusters. In our case it works beautifully and easilly. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. 
> However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scyld at jasons.us Wed Feb 11 13:01:21 2004 From: scyld at jasons.us (scyld at jasons.us) Date: Wed, 11 Feb 2004 13:01:21 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> References: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: <20040211125741.A5961@torgo.bigbroncos.org> On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. Have you tried setting the speed and duplex of the gig NICs to 1000/full on both the system side and switch side? I've found that autonegotiate rarely does especially with 3com gear. I'm guessing, based on its size, your switch isn't managed so you may have to stick to locking it on the systems and watching the behavior to see if the switch gets the negotiation right. (if traffic is bursty you have a speed mismatch and if you get loads of errors it's more likely to be duplex problem) FWIW I have the same mobo at home but haven't hooked it to gigabit yet so I'm quite curious to see how this works out. -Jason ----- Jason K. Schechner - check out www.cauce.org and help ban spam-mail. "All HELL would break loose if time got hacked." - Bill Kearney 02-04-03 ---There is no TRUTH. There is no REALITY. There is no CONSISTENCY.--- ---There are no ABSOLUTE STATEMENTS I'm very probably wrong.--- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 15:00:15 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 14:00:15 -0600 Subject: [Beowulf] how are people doing this? In-Reply-To: References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: <20040211200015.GE31590@mikee.ath.cx> On Wed, 11 Feb 2004, Robert G. Brown wrote: > On Wed, 11 Feb 2004, Mike Eggleston wrote: > > > I feel that in a proper cluster that the nodes are all (basically) > > identical. 
I 'own' a server environment of 20+ servers that are > > all dedicated to specific applications and this is not a cluster. > > However, I would like to manage config files (/etc/resolv.conf, etc), > > user accounts, patches, etc., as I would in a clustered environment. > > I have read the papers at infrastructures.org and agree with the > > principles mentioned there. I have looked extensively at cfengine, > > though I prefer the solution be in PERL as all my servers have > > PERL already (the manufacturer installs PERL as default on the boxes). > > > > How is everyone managing their cluster or what are suggestions > > on how I can manage my server environment. > > Mike, this is nearly a FAQ -- the list archives should have a discussion > (one of many) only a few weeks old on this very subject. > > There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions > possible, and more. Oh, and dhcp actually pushes lots of stuff out all > by itself these days -- it should handle the stuff in resolv.conf for > example, and you should be using dhcp anyway for scalability reasons. I know it's been discussed and I apologize for asking it again. I've just not found the way that seems to fit with the picture I'm trying to reach. What I'm thinking of doing is writing a perl script that can be placed into CVS. On each server a cron process checks out the current CVS repository of server (AIX) config data and script. Then the perl script starts to check permissions, update resolv.conf, hosts, login, passwd, etc., and to check that specific packages are installed or that the packages need updating. I like a lot of what cfengine did, but I really want a script that can be maintained in CVS. For installing packages I plan for the script to mount an NFS export for pulling the packages. # mkdir /tmp/nfs.$$ # mount admin:/opt/packages /tmp/nfs.$$ # installp -d /tmp/nfs.$$ package # umount /tmp/nfs.$$ # rmdir /tmp/nfs.$$ For the account management I'm thinking of something on my admin server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating a local file with new users and their passwords. Then this file is checked into CVS for distribution to other nodes/servers. Using another file to list the users that are authorized access to the local node/server keeps my user-space to a minimum. Is that any more clear what I'm trying to do? I don't have a cluster, but I want to manage all nodes as identically as I can. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 14:35:13 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 14:35:13 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. 
I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. You might look into yum. You'd have to learn python, but yum already does most of what you want for rpm packages and could likely be hacked. In fact, yum would do what you want for all the config files if you roll them into an rpm package right now -- it already has precisely what it needs to install and update according to a revision number. You can run yum update as often as you wish. It will run from NFS and can be secured a variety of ways. rgb > > For installing packages I plan for the script to mount an NFS export > for pulling the packages. > > # mkdir /tmp/nfs.$$ > # mount admin:/opt/packages /tmp/nfs.$$ > # installp -d /tmp/nfs.$$ package > # umount /tmp/nfs.$$ > # rmdir /tmp/nfs.$$ > > For the account management I'm thinking of something on my admin > server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating > a local file with new users and their passwords. Then this file > is checked into CVS for distribution to other nodes/servers. Using > another file to list the users that are authorized access to the > local node/server keeps my user-space to a minimum. > > Is that any more clear what I'm trying to do? I don't have a cluster, > but I want to manage all nodes as identically as I can. > > Mike > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Wed Feb 11 17:05:26 2004 From: canon at nersc.gov (canon at nersc.gov) Date: Wed, 11 Feb 2004 14:05:26 -0800 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: Message from david moloney of "Wed, 11 Feb 2004 17:44:59 GMT." <402A6A1B.2070805@tcd.ie> Message-ID: <200402112205.i1BM5QwA011397@pookie.nersc.gov> David, You may want to look into PAPI and perfctr. It allows you query the performance counters built into most processors. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 17:13:32 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 17:13:32 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > > They also have some evil of > > their own when the application in question is commercial and not open > > source -- you have effectively no control over how it was built and > > tuned for your architecture, for example, and may not even have > > meaningful version information. > > Let's be fair here. An ISV application is not the definition of evil. 
I did not mean to imply that they were wholly evil or even evil in intent.

> Clearly, "you have effectively no control over how an application was
> built and tuned for your architecture" has no direct correspondence to
> performance.

I would have to respectfully and vehemently disagree. It has all sorts of direct correspondences. Let us make a short tally of ways that a closed source, binary only application used as a benchmark can mislead me with regard to the performance of a system.

* I don't control the compiler choice. Your compiler and mine might result in me getting a very different performance even if your application "resembles" mine (AFAICT given that I cannot read the source).

* I don't control the libraries. Your application is (probably) static linked in various places and might even use private libraries that are hand-optimized. My application would likely be linked dynamically with completely different libraries. Your libraries might be out of date. My libraries might be out of date.

* I don't have any way of knowing whether your "canned" (say) Monte Carlo benchmark is relevant to my Monte Carlo application. Maybe your code is structured to be strictly vectorized and local, but mine requires random site access. Yours might be CPU bound. Mine might be memory bound. Since I can't see the source, I'll never know.

* I have to pay money for the application to use as a benchmark before I even look at hardware. If I'm an honest soul, I probably have to buy a separate license for every platform I plan to test even before I buy the test platform OR run afoul of the Dumb Mutha Copyright Act (aka the "Intellectual Straightjacket Act"). Or maybe I can rely on vendor reports of the results. This adds costs to the engineering process.

* Even leaving aside the additional costs, there is the issue of whether the application I'm using is tuned for the hardware I'm running on. Strict i386 code will not run as fast as strict i586 code will not run as fast as i686 code will not run optimally on an Athlon will not run optimally on an Opteron. Yet the Opteron will likely RUN i386 code. I just won't know whether the result is at all relevant to how the Opteron runs Opteron code. (These effects are not necessarily small.)

* And if I thought about it hard, I could likely come up with a few more negatives...such as the entire raft of reasons that closed source software is a Bad Thing to encourage on general principles. The principles built right into the original beowulf mission statement (which IIRC has a very clear open source requirement for engineering reasons).

The point being that while closed source commercial applications don't necessarily make "evil" benchmarks in the sense that there is any intent to hide or alter performance characteristics of a given architecture, they add a number of sources of noise to an already arcane and uncertain process. They are less reliable, more likely to mislead you (quite possibly through nobody's fault or intention), less likely to accurately predict the performance of the architecture on your application suite. And they are ultimately black boxes that you have to pay people to use.

I personally am a strong proponent (in case you can't tell:-) of open source (ideally GPL) software and tools, ESPECIALLY for benchmarking. I even tried to talk Larry McVoy into GPL-ing lmbench back when it had a fairly curmudgeonly license, even though the source itself was open enough.
Note, BTW, that all of the observations above are irrelevant if the application being used as a benchmark is the application you intend to use in the form you intend to use it, purchased or not. As in: > > However, they are also undeniably useful. Especially when the > > application being benchmarked is YOUR application and under your > > complete control. > > Regardless of ownership or control, they're especially useful when > you're looking at an application being used in the way you intend on > using it. Many industrial users buy systems to run a specific list of > ISV applications. In this instance, the application benchmark can be > the most valid benchmark, as it can model the system in the way it will > be used -- and that's the most important issue. Sure. Absolutely. I'd even say that your application(s) is(are) ALWAYS the best benchmark for many or even most purposes, with the minor caveat that the microbenchmarks have a slightly different purpose and are best for the purpose for which they are intended. I doubt that Linus runs a scripted set of userspace Gnome applications to test the performance of kernel subsystems... > I'm not disagreeing with your message. I too try to make sure that > people use the right benchmarks for the right purpose; I've seen way too > many people jump to absurd conclusions based on a single data point or > completely unrelated information. I'm just trying to sharpen your > message by pointing out some too broad brush strokes... > > Well, maybe I don't put as much faith in micro benchmarks unless in the > hands of a skilled interpreter, such as yourself. My preference is for > whatever benchmarks most closely describe your use of the system. Microbenchmarks are not intended to be predictors of performance in macro-applications, although a suite of results such as lmbench can give an expert a surprisingly accurate idea of what to expect there. They are more to help you understand systems performance in certain atomic operations that are important components of many applications. A networking benchmark can easily reveal problems with your network, for example, that help you understand why this application which ran just peachy keen at one scale as a "benchmark" suddenly turns into a pig at another scale. A good CPU/memory benchmark can do the same thing wrt the memory subsystem. This is yet another major problem with an naive application benchmark or comparative benchmark (and even with microbenchmarks) -- they are OFTEN run at a single scale or with a single set of parameters. On system A, that scale might be one that lets the application remain L2-local. On system B it might not be. You might then conclude that B is much slower. On the scale that you intend to run it, both might be L2-local or both might be running out of memory. B might have a faster processor, or a better overall balance of performance and might actually be faster at that scale. I don't put much faith in benchmarks, period. With the exception of your application(s), of course. Faith isn't the point -- they are just rulers, stopwatches, measuring tools. Some of them measure "leagues per candle", or "furlongs per semester" and aren't terribly useful. Others are just what you need to make sense of a system. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 11 16:26:44 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 11 Feb 2004 22:26:44 +0100 Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> (Mike Eggleston's message of "Wed, 11 Feb 2004 14:00:15 -0600") References: <20040211185501.GA31590@mikee.ath.cx> <20040211200015.GE31590@mikee.ath.cx> Message-ID: Mike Eggleston writes: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. Well, if it comes to that, surely you can place cfengine's configuration files in CVS and let cron run a script that updates the config files from CVS and then launches cfengine? You don't have to run cfd, you know; you can start cfengine any way you want. I'd really think twice before starting to re-implement cfengine's existing functionality. cfengine helped me keep my sanity in an earlier life while single-handedly adminning a heterogenous Unix environment ranging from SunOS 4.1.3_u1 through Solaris 7, diverse Tru64:s and a hodge-podge of Linuxen. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 16:58:48 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 13:58:48 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 8:44 AM [deletia] > > Finally, there are application benchmarks. These tend to be "atomic" > but at a very high level (an application is generally very complex). > These are also subject to the Evil of comparative benchmarks (in fact > some of comparative benchmark suites, especially in the WinX world, are > a collection of application benchmarks). True. I cringe to think how many systems were bought for scientific and technical computations based on UT2003 "benchmarks". > They also have some evil of > their own when the application in question is commercial and not open > source -- you have effectively no control over how it was built and > tuned for your architecture, for example, and may not even have > meaningful version information. Let's be fair here. An ISV application is not the definition of evil. 
Clearly, "you have effectively no control over how an application was built and tuned for your architecture" has no direct correspondence to performance. Having been on the ISV side of the fence, and spent a tremendous amount of energy making sure that each port of the application performed as well as it could, I'm quite confident in saying we generally succeeded in maximizing performance. Realize that we had day after day to spend on performance, usually with the attention of one or more experts from the platform vendor at our beck and call -- and those experts would spend even more time on even more narrow aspects of performance. Having said that, there are some notable ISV applications that simply do not perform as well as they should. This can occur for a host of reasons, such as they, did not care, didn't know how, could/would not to make the effort, didn't have the time, were ignored by the vendor, &etc -- basically the very same reasons that some people who don't work for ISVs fail to make their own applications perform as well as they could. > However, they are also undeniably useful. Especially when the > application being benchmarked is YOUR application and under your > complete control. Regardless of ownership or control, they're especially useful when you're looking at an application being used in the way you intend on using it. Many industrial users buy systems to run a specific list of ISV applications. In this instance, the application benchmark can be the most valid benchmark, as it can model the system in the way it will be used -- and that's the most important issue. I'm not disagreeing with your message. I too try to make sure that people use the right benchmarks for the right purpose; I've seen way too many people jump to absurd conclusions based on a single data point or completely unrelated information. I'm just trying to sharpen your message by pointing out some too broad brush strokes... Well, maybe I don't put as much faith in micro benchmarks unless in the hands of a skilled interpreter, such as yourself. My preference is for whatever benchmarks most closely describe your use of the system. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 18:16:22 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 15:16:22 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 2:14 PM > On Wed, 11 Feb 2004, Lombard, David N wrote: > > > > They also have some evil of > > > their own when the application in question is commercial and not open > > > source -- you have effectively no control over how it was built and > > > tuned for your architecture, for example, and may not even have > > > meaningful version information. > > > > Let's be fair here. An ISV application is not the definition of evil. > > I did not mean to imply that they were wholly evil or even evil in > intent. > > > Clearly, "you have effectively no control over how an application was > > built and tuned for your architecture" has no direct correspondence to > > performance. 
> > I would have to respectfully and vehemently disagree. It has all sorts > of direct correspondances. Let us make a short tally of ways that a > closed source, binary only application used as a benchmark can mislead > me with regard to the performance of a system. > [deletia] > > Note, BTW, that all of the observations above are irrelevant if the > application being used as a benchmark is the application you intend to > use in the form you intend to use it, purchased or not. OK. So there's our difference. I only consider an application benchmark useful in this scenario. I can't imagine using an application benchmark of any sort if it isn't; you enumerated all the reasons for this in the bits I just snipped. We agree completely on this. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 18:07:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 10:07:18 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402121007.30002.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 09:13 am, Robert G. Brown wrote: > * Even leaving side the additional costs, there is the issue of > whether the application I'm using is tuned for the hardware I'm running > on. Such as ISV's including IA32 executables as part of their IA64 version. It wasn't all IA32, just bits. Very odd. We only spotted it when it failed to work on Rocks 3.1.0, which doesn't supply the IA32 compatability libraries (which Rocks 3.0.0 did). No, I'm not going to name names, but the "file" and "ldd" are your friends. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKrWmO2KABBYQAh8RAu9JAJ41djUEj+6zEZYrY9IuPG4E9s9qugCeKhJd 2pf/pnDftPMs0zCLYb7IaRM= =t/c6 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:34:06 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:34:06 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> Message-ID: On Thu, 12 Feb 2004, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ...and with that, I'm going to quit for the day and take my nameless friends out for a beer somewhere... (Sorry, revenge for the lies, damned lies...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 17:23:12 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 09:23:12 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402120923.19328.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 03:44 am, Robert G. Brown wrote: > There are really two kinds of benchmarks. Maybe even > three. Lies, damn lies and statistics ? - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKqtTO2KABBYQAh8RAg+TAJ4uLkrC7zOUDlK8OYVxBuwKY/GXuQCeJFvj vd9nT5nkEuUY/3Myv0IROaU= =8pIh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:32:28 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:32:28 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > OK. So there's our difference. I only consider an application benchmark > useful in this scenario. I can't imagine using an application benchmark > of any sort if it isn't; you enumerated all the reasons for this in the > bits I just snipped. > > We agree completely on this. I figured that we did -- I'm getting verbose on it because I think it is an important issue to be precise on. "What's a FLOP?" is a perfectly reasonable question with a perfectly unintelligible and meaningless answer, in spite of it being cited again and again over decades to sell systems. At the same time, benchmarks are certainly useful. I think the confusion is probably my fault -- my age/history showing again. I can remember fairly clearly when awk was cited as a benchmark. Quake too, and not for people who were USING awk or necessarily going to play quake. This is what I meant by an "application benchmark" -- some sort of application that somebody thinks is a good measure of general systems performance and manage to get people to take seriously. Stuff like this is still fairly commonly used in many WinXX "benchmarks" that you'll see "published" both on the web and in real paper magazine articles. How fast can Excel update a spreadsheet that computes lunar orbital trajectories, that sort of thing. Sometimes they are almost a joke -- applications that do a lot of disk I/O (apparently, who knows) are used as a "disk performance benchmark". I won't even get started on this sort of thing and the number of variables left completely uncontrolled (for example, the disk caching subsystems both hardware and software) compared to, say, bonnie or lmbench. 
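For instance, the typical naive "disk benchmark" boils down to little more than a timed read loop, something like the following (a rough sketch typed in rather than tested; the file name and chunk size are arbitrary):

  #include <stdio.h>
  #include <stdlib.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/time.h>

  #define CHUNK (64*1024)

  /* time one sequential pass over a file and return MB/sec */
  static double read_pass(const char *path)
  {
      static char buf[CHUNK];
      struct timeval t0, t1;
      ssize_t n;
      long long total = 0;
      double secs;
      int fd = open(path, O_RDONLY);
      if (fd < 0) { perror(path); exit(1); }
      gettimeofday(&t0, NULL);
      while ((n = read(fd, buf, CHUNK)) > 0)
          total += n;
      gettimeofday(&t1, NULL);
      close(fd);
      secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
      if (secs <= 0.0) secs = 1e-9;   /* tiny files finish "instantly" */
      return (total / (1024.0 * 1024.0)) / secs;
  }

  int main(int argc, char **argv)
  {
      const char *path = (argc > 1) ? argv[1] : "testfile";
      /* the first pass may actually touch the disk; the second is
         usually served straight out of the page cache */
      printf("pass 1: %8.1f MB/sec\n", read_pass(path));
      printf("pass 2: %8.1f MB/sec\n", read_pass(path));
      return 0;
  }

Unless the file is much larger than RAM (or the cache is flushed between passes), the second number measures the memory subsystem, not the disk -- exactly the kind of uncontrolled variable that bonnie and lmbench go to some lengths to account for.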
I also won't comment on just how much crap there is out there with stuff like this in it, sometimes from supposedly "reputable" testing companies that ought to know better or be more honest. That's why I "trust" GPL/Open microbenchmarks the most, because I can look at their sources, understand just what they are doing and how it compares to what I want to do, maybe even hack them if I need to because it isn't QUITE right, and get numbers with some meaning. Stuff like SPEC and linpack (where linpack should probably be considered micro) isn't horrible but (in the case of SPEC) isn't GPL or terribly straightforward to understand microscopically or macroscopically -- it takes experience to know how the profile it generates compares to features in your own code. Great for sales-speak, though -- "Our system gets 2301.124 specoloids/second, while THEIR system is a laughable 1721.564." Quake isn't a useful benchmark -- it is a game, and one that generally runs as fast as it needs to whereever it runs...but it is a GREAT benchmark for how a system plays quake:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 11 19:31:43 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 11 Feb 2004 19:31:43 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. > > I wasn't able to actually track down the problem, but it seems that there is > a problem with small messages. When I run the performance test provided with > mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > byte message length, while for larger messages everything looks fine (linear > dependancy of transfer time on message length, everything below 300 us). I > have also tried mpich2 which shows exactly the same behavior. > > Does anyone have any idea? First, I assume you were running the 100BT through the same onboard NICs and got reasonable performance. So some possible things: - the switch is a dog or it is broken - your cables may be old or bad (but worked fine for 100BT) - negotiation problem Some things to try: Use a cross over cable (cat5e) and see if you get the same problem. You might try using a lower level benchmark (of the micro variety) like netperf and netpipe. The Beowulf Performance Suite: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 has these tests. Also, the December and January issues of ClusterWorld show how to test a network connection using netpipe. 
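If you want an even quicker sanity check first, a minimal MPI ping-pong along these lines will reproduce the zero-byte latency number directly (a rough, untested sketch -- compile with mpicc and run with mpirun -np 2 across two of the machines):

  #include <stdio.h>
  #include <mpi.h>

  #define REPS 1000

  int main(int argc, char **argv)
  {
      int rank, i;
      char dummy = 0;
      double t0, t1;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* ranks 0 and 1 bounce a zero-byte message back and forth;
         half the average round-trip time is the small-message latency */
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < REPS; i++) {
          if (rank == 0) {
              MPI_Send(&dummy, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(&dummy, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
          } else if (rank == 1) {
              MPI_Recv(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
              MPI_Send(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)
          printf("average one-way latency: %.1f usec\n",
                 (t1 - t0) * 1e6 / (2.0 * REPS));

      MPI_Finalize();
      return 0;
  }

On a healthy gigabit link this should come out well under a couple of hundred microseconds; if it is back up around 1500 us, the problem is more likely the NIC/switch negotiation or interrupt settings than mpich itself.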
At some point this content will be showing up on the web-page.

Also, the MPI Link-checker from Microway (www.microway.com)

http://www.clusterworld.com/article.pl?sid=04/02/09/1952250

May help.

Doug

> > Here are the details of my system:
> - Suse Linux 9.0 (kernel 2.4.21)
> - mpich-1.2.5.2
> - motherboard ASUS P4P800
> - LAN (10/100/1000) on board (3COM 3C940 chipset)
> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M +
> 8x88E1111-BAB, AT89C2051-24PI)
>
>

-- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Wed Feb 11 20:19:26 2004
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 11 Feb 2004 20:19:26 -0500
Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?)
In-Reply-To: <200402121007.30002.csamuel@vpac.org>
References: <200402121007.30002.csamuel@vpac.org>
Message-ID: <1076548766.3950.91.camel@protein.scalableinformatics.com>

On Wed, 2004-02-11 at 18:07, Chris Samuel wrote:

> No, I'm not going to name names, but the "file" and "ldd" are your friends.

... and strace. Amazing how useful that one is.

-- Joe Landman Scalable Informatics LLC

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From joachim at ccrl-nece.de Thu Feb 12 03:44:12 2004
From: joachim at ccrl-nece.de (Joachim Worringen)
Date: Thu, 12 Feb 2004 09:44:12 +0100
Subject: [Beowulf] Profiling floating-point performance
In-Reply-To: <402A6A1B.2070805@tcd.ie>
References: <402A6A1B.2070805@tcd.ie>
Message-ID: <200402120944.12719.joachim@ccrl-nece.de>

david moloney:
> Can anybody offer advice on how to do this? I tried using Vtune but it
> didn't seem to have this feature.

Try PAPI: http://icl.cs.utk.edu/papi/

It gives you access to all the information the CPU has to offer for this; how you gather it is up to you. However, for an instruction-level histogram, a simulator will probably be more useful. And you should think about whether you *really* need this - whether the information you get is worth the effort.

Joachim

-- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com Thu Feb 12 03:31:21 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Thu, 12 Feb 2004 09:31:21 +0100 (CET)
Subject: [Beowulf] Profiling floating-point performance
In-Reply-To: <402A6A1B.2070805@tcd.ie>
Message-ID:

On Wed, 11 Feb 2004, david moloney wrote:
>
> My special requirement is that I would like not only peak and average
> flops numbers but also I would like a histogram of the actual x86
> floating point instructions executed and their contribution to those
> peak and average flops numbers.
>
> Can anybody offer advice on how to do this? I tried using Vtune but it
> didn't seem to have this feature.
> Can't help directly, but you could look at Oprofile http://oprofile.sourceforge.net/about/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:17:29 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:17:29 +0100 (CET) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > Does anyone have any idea? > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) You might look at the P4_GLOBMEMSIZE parameter in the MPI job. export P4_GLOBMEMSIZE=20194344 (say) Try stepping through various values for this parameter, and run the Pallas benchmark. Let us know what the results are! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:24:10 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:24:10 +0100 (CET) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). Alternatives you might look at are: LCFG http://www.lcfg.org/ The European Datagrid people have the Quattor project http://quattor.org/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Thu Feb 12 05:37:54 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 12 Feb 2004 18:37:54 +0800 Subject: [Beowulf] Master-Slave Problems Message-ID: <1076582124.5002.20.camel@mikhail> Good day everyone, I've just finished implementing and testing a master-slave prime number finder as a test problem for my thesis on heterogeneous cluster load balancing for parallel applications. Test results show anomalies which may be tied to work chunk size allocations to the slaves, but to test whether it will hold true for other applications and is not directly tied to the parallel prime number finder, I am in need of other problems that may be solved using a master-slave architecture. 
Sure it is easy to come up with just any problem and implement a solution in a master-slave model, but I'm looking for computationally intensive problems wherein the computation necessary for parts of the problem are not equal. What I mean by this is similar to the case of the parallel number finder, seeing whether 11 is prime requires less computation compared to seeing whether 9999991 is prime. Any insights or pointers to documentations or papers that have had similar problems are most welcome. TIA PS: Are ther any cluster admins there willing to spare some cycles and cluster time for a cluster needy BS Undergraduate student in the Philippines? :D -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From meetsunil80x86 at yahoo.co.in Thu Feb 12 06:58:41 2004 From: meetsunil80x86 at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=) Date: Thu, 12 Feb 2004 11:58:41 +0000 (GMT) Subject: [Beowulf] Math Coprocessor Message-ID: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Hello everybody, I am a newbie in the Linux world.I would like to know know to... 1) program the 80x87 using C/C++/Fortran95 in linux platform. 2) program the 80x86 using C/C++/Fortran95 in linux platform. 3) link a C function into a fortran95 program or vice versa. Thanks in advance, sunil ________________________________________________________________________ Yahoo! India Education Special: Study in the UK now. Go to http://in.specials.yahoo.com/index1.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 12 09:25:30 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 12 Feb 2004 22:25:30 +0800 (CST) Subject: [Beowulf] IA64 & AMD64 binary SPBS and SGE download Message-ID: <20040212142530.32328.qmail@web16807.mail.tpe.yahoo.com> Just FYI only. AMD64 binary from offical GridEngine homepage: http://gridengine.sunsource.net/project/gridengine/download.html (IA64 is supported but you need to build from source) IA64 and AMD64 binary rpm for Torque: http://www-user.tu-chemnitz.de/~kapet/torque/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:18:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:18:31 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. 
an easy if hackneyed one is a mandelbrot-family fractal zoomer. depending on what chunk of the space you look at, I'd guess you could find pretty much any distribution of work-per-point. if your master-slave model does smart domain decomp, this might be just the thing. true, some people will roll their eyes when they find out you're doing fractals. I certainly did, when someone here used them. but they do have nice properties, and nice pictures always help ;) > PS: Are ther any cluster admins there willing to spare some cycles and > cluster time for a cluster needy BS Undergraduate student in the > Philippines? :D send me some email. regards, mark hahn. hahn at sharcnet.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 09:35:33 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 09:35:33 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: On 12 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > I've just finished implementing and testing a master-slave prime number > finder as a test problem for my thesis on heterogeneous cluster load > balancing for parallel applications. Test results show anomalies which > may be tied to work chunk size allocations to the slaves, but to test > whether it will hold true for other applications and is not directly > tied to the parallel prime number finder, I am in need of other problems > that may be solved using a master-slave architecture. > > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. > > Any insights or pointers to documentations or papers that have had > similar problems are most welcome. Two remarks. One, lots of problems (e.g. descent into a Mandelbrot set) have widely variable compute times for chunks of work divvied out in a master/slave model with very short and uniform messages distributing the work. Two, why not just simulate work? You're studying something in computer science, not trying to compute prime numbers or random numbers or mandelbrot sets or julia sets. Set up your master to distribute times for slaves to sleep and then reply. Select the times to distribute from the distribution (random or otherwise) of your choice, and scale a return "results" packet accordingly. This yields you complete control over the statistics of the "work" distribution and network load and lets you explore distributions that you might not easily find in the real world. It also lets you CONNECT the results of your simulations with "known" distributions to the results you obtain with real problems, which may help you identify or even categorically classify problems in terms of work-load complexity. This would doubtless make your thesis still more powerful. 
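For concreteness, the skeleton of that simulated-work setup might look something like the following (a rough MPI sketch rather than PVM, untested as typed; the flat random() distribution, task count, and time scale are just placeholders for whatever distribution you actually want to study):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <mpi.h>

  #define NTASKS   200          /* total simulated work units (arbitrary) */
  #define TAG_WORK 1
  #define TAG_STOP 2

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) {          /* master */
          int sent = 0, done = 0, worker, usec = 0, result;
          srandom(12345);
          /* prime every slave with one task (or a stop if none are left) */
          for (worker = 1; worker < size; worker++) {
              if (sent < NTASKS) {
                  usec = random() % 100000;  /* stand-in for your distribution */
                  MPI_Send(&usec, 1, MPI_INT, worker, TAG_WORK, MPI_COMM_WORLD);
                  sent++;
              } else {
                  MPI_Send(&usec, 1, MPI_INT, worker, TAG_STOP, MPI_COMM_WORLD);
              }
          }
          /* hand out the remaining work as results come back */
          while (done < sent) {
              MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                       MPI_COMM_WORLD, &st);
              done++;
              if (sent < NTASKS) {
                  usec = random() % 100000;
                  MPI_Send(&usec, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                           MPI_COMM_WORLD);
                  sent++;
              } else {
                  MPI_Send(&usec, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                           MPI_COMM_WORLD);
              }
          }
          printf("master: %d simulated tasks completed on %d slaves\n",
                 done, size - 1);
      } else {                  /* slave */
          int usec, result = 0;
          for (;;) {
              MPI_Recv(&usec, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
              if (st.MPI_TAG == TAG_STOP)
                  break;
              usleep(usec);     /* "do" the work by sleeping for it */
              result++;         /* dummy result packet */
              MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
          }
      }

      MPI_Finalize();
      return 0;
  }

Swap the random() call for draws from whatever distribution you like (the GSL makes this easy), then watch how long the run takes and how evenly the slaves stay busy as you vary it -- that gives you exactly the kind of controlled work-load sweep described above.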
This is what I've been doing in my Cluster World column -- simulating work (or nearly so) with a trivial master-slave computation of random numbers (the return) accompanied by an adjustable "sleep time" that permits me to effectively sweep the granularity of the computation to demonstrate at least simple Amdahlian scaling properties of this sort of computation. In fact, I can likely give you a PVM program to do this that could easily be hacked into precisely what you'd need to implement this with little effort (a few days, INCLUDING learning how to generate distributions with e.g. the GSL). Let me know. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Feb 12 10:25:03 2004 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 12 Feb 2004 10:25:03 -0500 Subject: [Beowulf] Virginia Tech upgrade In-Reply-To: References: Message-ID: <402B9ACF.2040502@lmco.com> In case anyone hasn't read slashdot in the last few hours, http://apple.slashdot.org/apple/04/02/12/0613255.shtml?tid=107&tid=126&tid=181&tid=187 Now, everyone face Doug's house and say, "Doug is always right. Doug is always right" :) Jeff > > The first thought I had was "what will they do with all the old systems?" > > Then it hit me. They put a fancy sticker on each box that says > "This machine was part of the third fastest supercomputer on the planet > Nov. 2003" or something similar. Also put a serial number on the tag and > provide a "certificate of authenticity" from VT. My guess is they can > make > a little bit on the whole deal. I wager they would sell rather quickly. > Alumni eat this kind of thing up. > > For those interested, my old www.cluster-rant.com site has morphed into > the new www.clusterworld.com site. You can check out issue contents, > submit stories, check out the polls, and rant about clusters. > > Doug > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:04:29 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:04:29 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux Why? 
As one of the relatively few humans on the planet to ever actually write 8087 code (back when it was the ONLY way to use the coprocessor with the various compilers available at the time) I can authoritatively say that it isn't horribly difficult -- the x87 is sort of an RPN HPC calculator for your PC with its own stack and internal floating point commands -- but all the compilers available already use it when they can and it is appropriate, and in MANY cases their code will be as or more efficient and robust than what you could hand code. There are doubtless exceptions, but are they worth the considerable amount of work required to realize them? Are you planning to join the GCC project or something?

> platform.
> 2) program the 80x86 using C/C++/Fortran95 in linux
> platform.

This is straightforward. But I'm not going to explain inlining of assembler here (I can give you an example/code fragment of inlined code if you want it, though). Instead...

...Google is your friend. Try e.g. "86 assembler reference gnu"

http://linux.maruhn.com/cat/Development/Languages.html
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/pdf/rhel-as-en.pdf
http://www.linuxgazette.com/issue94/ramankutty.html

or "gnu assembler manual"

http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_toc.html

...

> 3) link a C function into a fortran95 program or
> vice

or "gnu fortran manual"

http://gcc.gnu.org/onlinedocs/g77/

(and that's just the beginning!) Try other search strings. Consider buying a book or two if you're unfamiliar with assembler altogether -- I don't think it is taught much anymore in CPS departments unless you are a really serious major and select the right courses. And they still have somebody who can teach them -- one thing about upper level languages is that they make assembler level programming so difficult by comparison that it has become a vanishing and highly arcane art. Well, not really vanishing, but I'll bet that no more than 10% of all programmers have a clue about what registers are and how to manipulate them with assembler commands...maybe more like 1-2%. And mostly Old Guys at that. And the serious, I mean really serious, programmers and hackers.

Basically, all of this is thoroughly documented at gnu.org, and much of it is REdocumented, explained, tutorialized, and hashed over many times in many other places, all on the web.

rgb

-- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rgb at phy.duke.edu Thu Feb 12 10:28:37 2004
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 12 Feb 2004 10:28:37 -0500 (EST)
Subject: [Beowulf] Math Coprocessor
In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com>
Message-ID:

On Thu, 12 Feb 2004, sunil kumar wrote:
>
>
> Hello everybody,
> I am a newbie in the Linux world.I would like to
> know
> know to...
> 1) program the 80x87 using C/C++/Fortran95 in linux
>
> platform.
> 2) program the 80x86 using C/C++/Fortran95 in linux
> platform.
> 3) link a C function into a fortran95 program or
> vice

One last reference:

man as86

(it even has a list of the supported x86 and x87 instructions at the bottom, although it does NOT teach you to program in assembler in the first place).

rgb

> versa.
> > Thanks in advance, > sunil > > ________________________________________________________________________ > Yahoo! India Education Special: Study in the UK now. > Go to http://in.specials.yahoo.com/index1.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:12:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:12:31 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <1076548766.3950.91.camel@protein.scalableinformatics.com> Message-ID: > ... and strace. Amazing how useful that one is. true, but I've also fallen in love with ltrace, which does both syscalls and lib calls. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ao8215 at wayne.edu Thu Feb 12 08:13:48 2004 From: ao8215 at wayne.edu (Robson Pablo Sobradiel Peguin) Date: Thu, 12 Feb 2004 08:13:48 -0500 Subject: [Beowulf] Message Error Message-ID: <813143f0.10fc5818.81a9100@mirapointms3.wayne.edu> Hi I would like to know the meanings of these errors during the compilation with MPICH in the cluster: [root at master source]# make beowulf-WSU-INTEL cp /usr/local/mpich/mpich-1.2.5_intel/include/mpif.h mpif.h make FC=/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 FFLAGS="-O3 -tpp7 -xW -axW -c"\ CPFLAGS="-DSTRESS -D'POINTER=integer'" \ make LD="/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 -tpp7 -xW -axW -o" \ FFLAGS="-O3 -tpp7 -xW -axW -c" \ CPFLAGS="-DSTRESS -DMPI -D'POINTER=integer'" \ EX=DLPOLY.MBE BINROOT=../execute 3pt make[1]: Entering directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make[1]: *** No rule to make target `make'. Stop. make[1]: Leaving directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make: *** [beowulf-WSU-INTEL] Error 2 Thank you very much ________________________________________________________ Robson P. S. Peguin, Graduate Student Wayne State University Department of Chemical Engineering and Materials Science 4815 Fourth Street, 2015 MBE,Detroit - MI 48201 phone: (313)577-1416 fax: (313)577-3810 e-mail: robson_peguin at wayne.edu http://chem1.eng.wayne.edu/~sdr/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Feb 11 23:13:12 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 11 Feb 2004 22:13:12 -0600 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: Message-ID: <402AFD58.9060402@tamu.edu> Realize that not all switches are created equal when working with small (and, overall, 0-byte == small) packets. A number of otherwise decent network switches are less than stellar performers with small packets. 
We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test system running under the RFC-2544 testing suite... There are switches that perform well with small packets, but it's been our experience that most switches, especially your lower cost switches (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some others I can't recall right now) didn't perform well with smaller packets but did fine when the packet size was about 1500 bytes. Going with cheap switches is usually not a good way to improve performance. gerry Douglas Eadline, Cluster World Magazine wrote: > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > >>Hello, >> >>I have a really small "cluster" of 4 PC's which are connected by a normal >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board >>I thought I might be able to improve performance by connecting the machines >>via a Gigabit switch (which are really cheap nowadays). >> >>Everything seemed to work fine. The switch indicates 1000Mbit connections to >>the PC's and transfer rate for scp-ing large files is significantly higher >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than >>with the 100 Mbit switch. >> >>I wasn't able to actually track down the problem, but it seems that there is >>a problem with small messages. When I run the performance test provided with >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 >>byte message length, while for larger messages everything looks fine (linear >>dependancy of transfer time on message length, everything below 300 us). I >>have also tried mpich2 which shows exactly the same behavior. >> >>Does anyone have any idea? > > > First, I assume you were running the 100BT through the same > onboard NICs and got reasonable performance. So some possible > things: > > - the switch is a dog or it is broken > - your cables may be old or bad (but worked fine for 100BT) > - negotiation problem > > Some things to try: > > Use a cross over cable (cat5e) and see if you get the same problem. > You might try using a lower level benchmark (of the micro variety) > like netperf and netpipe. > > The Beowulf Performance Suite: > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > has these tests. Also, the December and January issues of ClusterWorld > show how to test a network connection using netpipe. At some point this > content will be showing up on the web-page. > > Also, the MPI Link-checker from Microway (www.microway.com) > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > May help. > > > Doug > > >>Here are the details of my system: >> - Suse Linux 9.0 (kernel 2.4.21) >> - mpich-1.2.5.2 >> - motherboard ASUS P4P800 >> - LAN (10/100/1000) on board (3COM 3C940 chipset) >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > + > >> 8x88E1111-BAB, AT89C2051-24PI) >> >> > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Thu Feb 12 22:22:02 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Thu, 12 Feb 2004 21:22:02 -0600 (CST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> References: <402AFD58.9060402@tamu.edu> Message-ID: The best switch that we have found both in price and speed are the GigE Switches from Dell. We use them in a few of our test clusters and smaller clusters. They are actually pretty good performers and top even some of the cisco switches. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. > >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. 
> > > > > > Doug > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 12 22:35:51 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 13 Feb 2004 14:35:51 +1100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: <402AFD58.9060402@tamu.edu> Message-ID: <200402131435.54453.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. That's bizarre, the GigE switches I've seen in a Dell cluster *were* rebadged Cisco switches. Even had to do the usual "PortFast" routine in IOS to get PXE booting to work. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFALEYXO2KABBYQAh8RAm81AJoDHOfMZ+hrIyLVoBIr1lsESi70KACfcnYu C1JcJ3iYX22Tm99gTvKlfOs= =XWYZ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:17:26 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:17:26 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: on various revs of their code I've had regular (once a week) management stack crashes on our dell switches which doesn't make it easy to collect statistics, but they continue to forward packets just fine... the switches are actually made by accton and they are also sold by smc... depending on who has better deals the dell 5212/5224 or smc 8612t/8624t may be cheaper at any given time... the cisco cat-ios style cli and ssh support are a plus. On Thu, 12 Feb 2004, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches.
> > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > Realize that not all switches are created equal when working with small > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > network switches are less than stellar performers with small packets. > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > system running under the RFC-2544 testing suite... > > > > There are switches that perform well with small packets, but it's been > > our experience that most switches, especially your lower cost switches > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > others I can't recall right now) didn't perform well with smaller > > packets but did fine when the packet size was about 1500 bytes. > > > > Going with cheap switches is usually not a good way to improve performance. > > > > gerry > > > > Douglas Eadline, Cluster World Magazine wrote: > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > >>Hello, > > >> > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > >>I thought I might be able to improve performance by connecting the machines > > >>via a Gigabit switch (which are really cheap nowadays). > > >> > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > >>with the 100 Mbit switch. > > >> > > >>I wasn't able to actually track down the problem, but it seems that there is > > >>a problem with small messages. When I run the performance test provided with > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > >>byte message length, while for larger messages everything looks fine (linear > > >>dependancy of transfer time on message length, everything below 300 us). I > > >>have also tried mpich2 which shows exactly the same behavior. > > >> > > >>Does anyone have any idea? > > > > > > > > > First, I assume you were running the 100BT through the same > > > onboard NICs and got reasonable performance. So some possible > > > things: > > > > > > - the switch is a dog or it is broken > > > - your cables may be old or bad (but worked fine for 100BT) > > > - negotiation problem > > > > > > Some things to try: > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > You might try using a lower level benchmark (of the micro variety) > > > like netperf and netpipe. > > > > > > The Beowulf Performance Suite: > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > show how to test a network connection using netpipe. At some point this > > > content will be showing up on the web-page. > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > May help. 
> > > > > > > > > Doug > > > > > > > > >>Here are the details of my system: > > >> - Suse Linux 9.0 (kernel 2.4.21) > > >> - mpich-1.2.5.2 > > >> - motherboard ASUS P4P800 > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > + > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > >> > > >> > > > > > > > > > > -- > > Gerry Creager -- gerry.creager at tamu.edu > > Network Engineering -- AATLT, Texas A&M University > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > Page: 979.228.0173 > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:19:16 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:19:16 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: Also they support jumbo (9k) frames which is a plus for us since we do nfs over them. joelja On Thu, 12 Feb 2004, Joel Jaeggli wrote: > on varius revs of their code I've had regular (once a week) managment > stack crash on our dell switches which doesn't make it easy to collect > statistics, but they continue to forward packets just fine... the switches > are actually made by accton and they are also sold by smc... depending > one who has better deals the dell 5212/5224 or smc 8612t/8624t may be > cheaper at any given time... the cisco cat-ios style cli and ssh support > are a plus. > > On Thu, 12 Feb 2004, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Realize that not all switches are created equal when working with small > > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > > network switches are less than stellar performers with small packets. > > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > > system running under the RFC-2544 testing suite... 
> > > > > > There are switches that perform well with small packets, but it's been > > > our experience that most switches, especially your lower cost switches > > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > > others I can't recall right now) didn't perform well with smaller > > > packets but did fine when the packet size was about 1500 bytes. > > > > > > Going with cheap switches is usually not a good way to improve performance. > > > > > > gerry > > > > > > Douglas Eadline, Cluster World Magazine wrote: > > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > > > > >>Hello, > > > >> > > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > > >>I thought I might be able to improve performance by connecting the machines > > > >>via a Gigabit switch (which are really cheap nowadays). > > > >> > > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > > >>with the 100 Mbit switch. > > > >> > > > >>I wasn't able to actually track down the problem, but it seems that there is > > > >>a problem with small messages. When I run the performance test provided with > > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > > >>byte message length, while for larger messages everything looks fine (linear > > > >>dependancy of transfer time on message length, everything below 300 us). I > > > >>have also tried mpich2 which shows exactly the same behavior. > > > >> > > > >>Does anyone have any idea? > > > > > > > > > > > > First, I assume you were running the 100BT through the same > > > > onboard NICs and got reasonable performance. So some possible > > > > things: > > > > > > > > - the switch is a dog or it is broken > > > > - your cables may be old or bad (but worked fine for 100BT) > > > > - negotiation problem > > > > > > > > Some things to try: > > > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > > You might try using a lower level benchmark (of the micro variety) > > > > like netperf and netpipe. > > > > > > > > The Beowulf Performance Suite: > > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > > show how to test a network connection using netpipe. At some point this > > > > content will be showing up on the web-page. > > > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > > > May help. 
> > > > > > > > > > > > Doug > > > > > > > > > > > >>Here are the details of my system: > > > >> - Suse Linux 9.0 (kernel 2.4.21) > > > >> - mpich-1.2.5.2 > > > >> - motherboard ASUS P4P800 > > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > > > + > > > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > > >> > > > >> > > > > > > > > > > > > > > -- > > > Gerry Creager -- gerry.creager at tamu.edu > > > Network Engineering -- AATLT, Texas A&M University > > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > > Page: 979.228.0173 > > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 13 03:40:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 13 Feb 2004 09:40:09 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Thu, 12 Feb 2004, Robert G. Brown wrote: > not really vanishing, but I'll bet that no more than 10% of all > programmers have a clue about what registers are and how to manipulate > them with assembler commands...maybe more like 1-2%. And mostly Old > Guys at that. And the serious, I mean really serious, programmers and > hackers. > Sigh. I was first taught assembler in the physics department (being as you in the States would say a physics major). The lab had Motorola 68000 trainer boards. I still have a copy of "68000 Assembly Language" by Kane, Hawkins, Leventhal kicking around. Such a nice architecture. But then again I may be the only person to own "Fortran 77: A Structured Approach". Such perversity originating from being taught Pascal by computer scientists then learning Fortran. I also remember being taught about self-modifying code by the then professor of computing science. Do they still teach that? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Fri Feb 13 09:24:11 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Fri, 13 Feb 2004 09:24:11 -0500 Subject: [Beowulf] problmes with MPICH References: Message-ID: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Hi, i have problems with mpich, i have installed OSCAR with mpich 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, i want to use rsh and i dont want reinstall OSCAR. then i change the line in mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes but mpich not use rsh. 
Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help in this point. I have mpich-1.2.5.2 and fortran pgi and rsh. Thanks R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 09:53:38 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 14:53:38 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Fri, 13 Feb 2004, John Hearns wrote: > But then again I may be the only person to own "Fortran 77: > A Structured Approach". Wow! Bleeding edge stuff. On the subject of pure perversity, my Fortran notes stop with a roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? Punched card paradise? I believe some guy in France has got one back together and working, but I don't remember where.) [Weeps sadly into Wincarnis as the memories flood back.] -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joshh at cs.earlham.edu Fri Feb 13 10:25:31 2004 From: joshh at cs.earlham.edu (joshh at cs.earlham.edu) Date: Fri, 13 Feb 2004 10:25:31 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment Message-ID: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Here is an irregular question. I am profiling a software package that runs over LAM-MPI on 16 node clusters [Details Below]. I would like to measure the effect of increased latency on the run time of the program. It would be nice if I could quantify the added latency in the process to create some statistics. If possible, I do not want to alter the code line of the program, or buy new hardware. I am looking for a software solution/idea. Bazaar Cluster: 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM 1 100Mbps NIC card in each machine 2 100Mbps Full-Duplex switches Cairo Cluster: 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM 2 1Gbps NIC cards in each machine (only one in use) 2 1Gbps Full-Duplex switches For more details on these clusters follow the link below: http://cluster.earlham.edu/html/ Thank you, Josh Hursey Earlham College Cluster Computing Group _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:30:25 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:30:25 -0800 Subject: [Beowulf] problmes with MPICH Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5563@orsmsx402.jf.intel.com> I'm forwarding this to the OSCAR-users list, a more appropriate venue for this question. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
> -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Richard Miguel > Sent: Friday, February 13, 2004 6:24 AM > Cc: beowulf at beowulf.org > Subject: [Beowulf] problmes with MPICH > > Hi, i have problems with mpich, i have installed OSCAR with mpich > 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, > i > want to use rsh and i dont want reinstall OSCAR. then i change the line in > mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes > but > mpich not use rsh. > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need > help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > > Thanks > > R. Miguel > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:57:00 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:57:00 -0800 Subject: [Beowulf] Math Coprocessor Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Martin WHEELER > Sent: Friday, February 13, 2004 6:54 AM > To: John Hearns > Cc: Robert G. Brown; sunil kumar; beowulf at beowulf.org > Subject: Re: [Beowulf] Math Coprocessor > > On Fri, 13 Feb 2004, John Hearns wrote: > > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". > > Wow! Bleeding edge stuff. > On the subject of pure perversity, my Fortran notes stop with a > roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating > from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? > Punched card paradise? I believe some guy in France has got one back > together and working, but I don't remember where.) > > [Weeps sadly into Wincarnis as the memories flood back.] Ah, another 1130 veteran! Group hug! There's an active 1130 group, and you too can run R2V12 on your very own 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. IIRC, APL may even be available. http://ibm1130.org One of my hobby tasks is to port the simulator GUI to Tcl/Tk or Perl/Tk... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 13:15:28 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 12:15:28 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. 
> > It would be nice if I could quantify the added latency in the process to > create some statistics. If possible, I do not want to alter the code line > of the program, or buy new hardware. I am looking for a software > solution/idea. > > Bazaar Cluster: > 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM > 1 100Mbps NIC card in each machine > 2 100Mbps Full-Duplex switches > > Cairo Cluster: > 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM > 2 1Gbps NIC cards in each machine (only one in use) > 2 1Gbps Full-Duplex switches > > For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ > > Thank you, > > Josh Hursey > Earlham College Cluster Computing Group > Not an irregular question at all. I tried something like this a couple of years ago to investigate the bandwidth and latency sensitivity of an application which was using MPICH over Myrinet. One of D.K.Panda's students from Ohio State University had a modified version of the "mcp" for Myrinet which added quality of service features, tunable per connection. The "mcp" is the code which runs on the LANai microprocessor on the Myrinet interface card. The modifications on top of the OSU modifications to gm used a hardware timer on the interface card to add a fixed delay per packet for bandwidth tuning, and a fixed delay per message (i.e., a delay added to only the first packet of a new connection) for latency tuning. Via netpipe, I verified that I could independently tune the bandwidth and latency. Lots of fun to play with - for example, by plotting the difference in message times for two different latency setting, the eager-rendezvous threshold was easily identified. All in all a very useful experiment which told us a lot about our application. Clearly, you want to delay the sending of a message, or the processing of a received communication, without otherwise interfering with what the system is doing. Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send call is going to perturb your results because the processor won't be doing useful work during that time. That's obviously not the same as running on a network with a switch that adds the same 50 microseconds latency; in that case, the processor could be doing useful work during the delay, happily overlapping computations with communications. Nevertheless, adding busy loops might still give you useful results. You might want to look into using a LD_PRELOAD library to intercept MPI calls of interest, assuming you're using a shared library for MPI. In your version, do the busy loop, then fall into the normal call. A quick google search on "LD_PRELOAD" or "library interposers" will return a lot of examples, such as: http://uberhip.com/godber/interception/index.html http://developers.sun.com/solaris/articles/lib_interposers.html The advantage of this approach is that no modifications to your source code or compiled binaries are necessary. You'll have to think carefully about whether the added latency is slowing your application simply because the processor is not doing work during the busy loop. If I were you, I'd modify your source code and time your syncronizations (eg, MPI_Wait). If your code is cpu-bound, these will return right away, and adding latency via a busy loop is going to give you the wrong answer. If your code is communications bound, these will have a variable delay depending upon the latency and bandwidth of the network. You are likely interested in delays of 10's of microseconds. 
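To make the interposer idea concrete, here is a minimal sketch of such an LD_PRELOAD library (untested against any particular MPI build; the 50 microsecond figure, the file names and the assumption that the application links MPI as a shared library are all just placeholders):

/* delay_preload.c - sketch of an LD_PRELOAD interposer that busy-waits
 * before every MPI_Send.  Build/run, e.g.:
 *   gcc -shared -fPIC delay_preload.c -o libdelay.so -ldl
 *   LD_PRELOAD=./libdelay.so mpirun ...
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/time.h>
#include <mpi.h>

static double delay_us = 50.0;            /* injected latency per send, usec */

static void busy_wait(double us)          /* burns CPU; see the caveat above */
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec) < us);
}

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
             MPI_Comm comm)
{
    static int (*real_send)(void *, int, MPI_Datatype, int, int, MPI_Comm);
    if (!real_send)                       /* look up the real MPI_Send once */
        real_send = (int (*)(void *, int, MPI_Datatype, int, int, MPI_Comm))
                        dlsym(RTLD_NEXT, "MPI_Send");
    busy_wait(delay_us);
    return real_send(buf, count, dtype, dest, tag, comm);
}

MPI_Recv, MPI_Isend and friends would need the same treatment, and a Fortran code calls the Fortran bindings (mpi_send_ or similar, depending on the compiler), which need their own wrappers; getting the LD_PRELOAD variable to reach the remote processes is its own small adventure, depending on how mpirun starts them.  The busy wait above just polls gettimeofday() for simplicity; the cycle counter is finer grained.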
The most accurate busy loops for this sort of thing use the processor hardware timers, which tick every clock on x86. On a G5 PPC running OS-X, the hardware timer ticks every 60 cpu cycles. I'm not sure what a PPC does under Linux. On x86, you can read the cycle timer via: #include unsigned long long timerVal; rdtscll(timerVal); A crude delay loop example: rdtscll(timeStart); do { rdtscll(timeEnd); } while ((timeEnd - timeStart) < latency * usecPerTick); where latency is in microseconds, and usecPerTick is your calibration. There have been other recent postings to this mailing list about using inline assembler macros to read the time stamp counter. Injecting small latencies w/out busy loops and without disturbing your source code is going to be very difficult (though I'd love to be contradicted on that statement!). A couple of far fetched ideas in kernel land: - some ethernet interfaces have very sophisticated processors aboard. IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Perhaps the firmware can be modified similarly to the modified mcp for gm discussed above. Obviously this has the huge disadvantage of being specific to particular network chips. - the local APIC on x86 processors has a programmable interval timer with better than microsecond granularity which can be used to generate an interrupt. Perhaps in the communications stack, or in the network device driver, a wait_queue could be used to postpone processing until after an interrupt from this timer. I would worry about considerable jitter, though. For a sample driver using this feature, see http://www.oberle.org/apic_timer-timers.html The various realtime Linux folks talk about this as well: http://www.linuxdevices.com/articles/AT6105045931.html Unfortunately, IIRC this timer is now used (since 2.4 kernel) for interprocessor interrupts on SMP systems. On uniprocessor systems it may still be available. I hope there's something useful for you in this response. I'm hoping even more that there are other responses to your question - I would love a facility which would allow me to "turn the dial" on latency and/or bandwidth. There's a substantial cost difference between a gigE cluster and a Myrinet/Infiniband/Quadrix/SCI cluster, and it would be great to simulate performance of different network architectures on specific applications. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lusk at mcs.anl.gov Fri Feb 13 13:31:08 2004 From: lusk at mcs.anl.gov (Rusty Lusk) Date: Fri, 13 Feb 2004 12:31:08 -0600 (CST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <20040213.123108.12267444.lusk@localhost> > Suggestions: > - modify the routines that make MPI calls to call instead some wrapper > routines that do some thumb twiddling before making the MPI call; this > requires modification of the program source > - modify the MPI routines (well, if you use an open-source MPI > implementation) to insert some delay, then relink your binary if static With any standard-conforming MPI implementation, open-source or not, you can use the MPI "profiling" interface to provide any kind of wrapper at all. Basically, you write your own MPI_Send, etc., which does whatever you want and also calls PMPI_Send (required to be there) to do the real work. 
Then you link your routines in front of the MPI library, and voila! Cheers, Rusty Lusk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From modus at pr.es.to Thu Feb 12 23:53:04 2004 From: modus at pr.es.to (Patrick Michael Kane) Date: Thu, 12 Feb 2004 20:53:04 -0800 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <200402131435.54453.csamuel@vpac.org>; from csamuel@vpac.org on Fri, Feb 13, 2004 at 02:35:51PM +1100 References: <402AFD58.9060402@tamu.edu> <200402131435.54453.csamuel@vpac.org> Message-ID: <20040212205304.A16115@pr.es.to> * Chris Samuel (csamuel at vpac.org) [040212 20:42]: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged > Cisco switches. Even had to do the usual "PortFast" routine in IOS to get > PXE booting to work. They used to be, I believe. Now they appear to be something else (for their latest 24 port layer-2 model). I've had good luck with them with the latest firmware, before that they were fairly flakey. Check the dell forums for all the yammering and howling on the PowerEdge 5224. Best, -- Patrick Michael Kane _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 12:44:33 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 18:44:33 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. It appears that in your setup MPI uses TCP/IP as underlying protocol. Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a fuzzy result. So there :-) Suggestions: - modify the routines that make MPI calls to call instead some wrapper routines that do some thumb twiddling before making the MPI call; this requires modification of the program source - modify the MPI routines (well, if you use an open-source MPI implementation) to insert some delay, then relink your binary if static - modify the kernel source to insert some delays in the TCP path - pretty hard as TCP is very complex - modify the network driver to insert some delays in the Tx or Rx packet path; not very difficult, but might be leveled by the delays of TCP. The kernel modifications have the disadvantage that they also require some way to change the delay value, so adding a /proc entry, an ioctl, etc. unless you want to recompile the kernel and reboot after each delay change. 
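A middle road between the first two suggestions, needing neither source changes nor a modified MPI, is the profiling interface described above; a rough sketch (the 50 microsecond figure and the file name are only placeholders, and only MPI_Send is shown):

/* delay_wrap.c - profiling-interface wrapper: our MPI_Send shadows the
 * library's, twiddles its thumbs, then lets PMPI_Send do the real work.
 */
#include <mpi.h>
#include <sys/time.h>

static double delay_us = 50.0;            /* injected latency, usec */

static void spin(double us)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec) < us);
}

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
             MPI_Comm comm)
{
    spin(delay_us);
    return PMPI_Send(buf, count, dtype, dest, tag, comm);
}

Relink the application with the wrapper object listed ahead of the MPI library (with mpich, something like "mpicc myapp.o delay_wrap.o -o myapp"); no change to the application source or to MPI itself is needed.  Collectives are trickier, since the point-to-point pattern they use underneath is up to the implementation.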
> For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ Please tell to whoever coded that page that Opera doesn't display it properly. And I use Opera all the time ;-) The page also doesn't specify an important detail: the network cards/chips used in the clusters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Fri Feb 13 14:01:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Fri, 13 Feb 2004 13:01:25 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <6.0.0.22.2.20040213125745.0266bbc0@localhost> At 11:44 AM 2/13/2004, Bogdan Costescu wrote: >On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > > > Here is an irregular question. I am profiling a software package that runs > > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > > the effect of increased latency on the run time of the program. > >It appears that in your setup MPI uses TCP/IP as underlying protocol. >Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a >fuzzy result. So there :-) > >Suggestions: >- modify the routines that make MPI calls to call instead some wrapper >routines that do some thumb twiddling before making the MPI call; this >requires modification of the program source Actually, this is not necessary, as long as you have the object files, not just the executable. The MPI profiling interface could be used to add latency to every send and receive operation; adding latency to collectives will require some care, as the exact set of communication operations that an MPI implementation uses is up to the implementation. Simply write your own MPI routine and call the PMPI version (e.g., for MPI_Send, call PMPI_Send) after adding some latency. Note also that MPI may use any communication mechanism. Even on small clusters, it may use something besides TCP (e.g., when the network is Infiniband). MPI on SMPs often uses a collection of communication approaches. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 15:39:14 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 20:39:14 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > There's an active 1130 group, and you too can run R2V12 on your very own > 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. > IIRC, APL may even be available. http://ibm1130.org Thanks for the link -- didn't know about that. As arts faculty post-grads (applied linguistics) we were only allowed to play with Fortran (and even then were regarded with deep suspicion by the physics wallahs). Now -- where did I put that stack of cards...? Off to the attic to dig out more stuff. 
-- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 13 16:54:28 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 13 Feb 2004 16:54:28 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> Message-ID: I wondered about your low cost switch statement. I had done this test before, but I thought I would redo it anyway. I have an SMC 8 port GigE EasySwitch 8508T (PriceGrabber $140 to my door). I should say that the switch is not loaded, so it may fall down if the load were higher. This is just two nodes running netpipe through the switch. Latency: 0.000034 Now starting main loop 0: 1 bytes 7287 times --> 0.22 Mbps in 0.000034 sec 1: 2 bytes 7338 times --> 0.46 Mbps in 0.000033 sec 2: 3 bytes 7469 times --> 0.68 Mbps in 0.000034 sec 3: 4 bytes 4923 times --> 0.90 Mbps in 0.000034 sec 4: 6 bytes 5545 times --> 1.36 Mbps in 0.000034 sec 5: 8 bytes 3711 times --> 1.81 Mbps in 0.000034 sec 6: 12 bytes 4637 times --> 2.67 Mbps in 0.000034 sec My opinion: If you get a switch that can not "switch" then it is broken by design. The original poster noted that his results seem to go from OK to "really bad" for basic MPI tests. If a switch does this it is "really broken". Of course it may not be the switch. BTW, the results were for a $30 NIC (netgear GA302T) running in a 66MHz slot. Top throughput was 800 Mbits/sec. Doug On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. 
> >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. > > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 17:46:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 23:46:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Fri, 13 Feb 2004, Don Holmgren wrote: > I tried something like this a couple of years ago to investigate the > bandwidth and latency sensitivity of an application which was using > MPICH over Myrinet. ... which is pretty different from the setup of the original poster :-) But I'd like to see it discussed in general, so let's go on. > a modified version of the "mcp" for Myrinet which added ... Is this publicly available ? I'd like to give it a try. > The modifications on top of the OSU modifications to gm Well, that's a very important point: using GM, which doesn't try to make too many things like TCP does. I haven't used GM directly nor looked at its code, but I think that it doesn't introduce delays, like TCP does in some cases. Moreover, based on the description in the GM docs, GM is not needed to be optimized by the compiler as it's not in the fast path. Obviously, in such conditions, the results can be relied upon. 
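The time stamp counter is the exception here: on x86 it is read into registers with a single instruction, so there is no PCI round trip to pay for.  The rdtscll() macro quoted earlier comes from the kernel headers (asm/msr.h on the 2.4 kernels I have looked at; check your own tree), and a self-contained alternative, plus a crude check of what one read costs, might look like this (no serializing instruction is used, so treat the number as a rough estimate):

#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long) hi << 32) | lo;
}

int main(void)
{
    unsigned long long t0, t1;
    t0 = rdtsc();
    t1 = rdtsc();                 /* cost of a single back-to-back read */
    printf("back-to-back rdtsc: %llu cycles\n", t1 - t0);
    return 0;
}

Whatever the exact figure, it is far below the 10's of microseconds being discussed, so the counter read itself should not distort a busy loop much.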
> Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > call is going to perturb your results because the processor won't be > doing useful work during that time. In the case of TCP, the processor doesn't appear to be doing anything useful for "long" times, as it spends time in kernel space. So, a 50 microseconds busy loop might not make a difference. And given the somehow non-deterministic behaviour of TCP in this respect, it might be that adding the delay before the PMPI_* or after PMPI_* calls might make a difference. The delays don't have to be busy-loops. Busy-loops are probably precise, but might have some side-effects; for example, reading some hardware counter (even more as it is on a PCI device, which is "far" from the CPU and might be even "farther" if it has any PCI bridge(s) in between) repeatedly will generate lots of "in*" operations during which the CPU is stalled waiting for data. Especially with today's CPU speeds, I/O operations are expensive in terms of CPU cycles... > You are likely interested in delays of 10's of microseconds. Well, it depends :-) The latencies for today's HW+SW seem to be in a range of about 2 orders of magnitude, so giving absolute figures doesn't make much sense IMHO. Apart from this I would rather suggest an exponential increase in the delay value. > - some ethernet interfaces have very sophisticated processors aboard. > IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Well, if the company releases enough documentation about the chip, then yes ;-) 3Com has the 990 line which is still FastE but has a programmable processor, so it's not only GigE. > Obviously this has the huge disadvantage of being specific to > particular network chips. But there aren't so many programmable network chips these days. Those Ethernet chips might even be in wider use than Myrinet[1] and more people might benefit from such development. If I'd have to choose for the next cluster purchase the GigE network cards and I'd know that one offers such capabilities while not having significant flaws compared to the others, I'd certainly buy it. Another hardware approach: the modern 3Com cards driven by 3c59x, Cyclone and Tornado, have the means to delay a packet in their (hardware) Tx queue. There is however a catch: there is not guarantee that the packet will be sent at the exact time specified, it can be delayed; the only guarantee is that the packet is not sent before that time. However, I somehow think that this is true for most other approaches, so it's not so bad as it sounds :-) The operation is pretty simple, as the packet is "stamped" with the time when it should be transmitted, expressed as some internal clock ticks. Only one "in" operation to read the current clock is needed per packet, so this is certainly much less intrusive as the busy-loop. [ I'm too busy (but not busy-looping :-)) to try this at the moment. If somebody feels the urge, I can provide some guidance :-) ] However, anything that still uses TCP (as both your Broadcom approach and my 3Com one do) will likely generate unreliable results... > it would be great to simulate performance of different network > architectures on specific applications. Certainly ! 
Especially as this would provide means to justify spending money on fast interconnect ;-) [1] I don't want this to look like I'm saying "compared with Myrinet as it's the most widely used high-performance interconnect" and neglect Infiniband, SCI, etc; I have no idea about "market share" of the different interconnects. I compare with Myrinet because the original message talked about it and because I'm ignorant WRT programmable processors on other interconnect NICs. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 19:49:05 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 18:49:05 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: Message-ID: On Fri, 13 Feb 2004, Bogdan Costescu wrote: > On Fri, 13 Feb 2004, Don Holmgren wrote: > > > I tried something like this a couple of years ago to investigate the > > bandwidth and latency sensitivity of an application which was using > > MPICH over Myrinet. > > ... which is pretty different from the setup of the original poster :-) > But I'd like to see it discussed in general, so let's go on. > > > a modified version of the "mcp" for Myrinet which added ... > > Is this publicly available ? I'd like to give it a try. I'm afraid not, sorry, since the modified code base from OSU isn't publically available. IIRC it was part of a project for a masters degree; if it's OK with them, it's OK with me (we can take this offline). The modified MCP had a bug I never fixed which required me to reset the card and reload the driver when some counter overflowed, at something like a gigabyte of messages. Long enough to get very good statistics, though. > > > The modifications on top of the OSU modifications to gm > > Well, that's a very important point: using GM, which doesn't try to make > too many things like TCP does. I haven't used GM directly nor looked at > its code, but I think that it doesn't introduce delays, like TCP does in > some cases. Moreover, based on the description in the GM docs, GM is not > needed to be optimized by the compiler as it's not in the fast path. > Obviously, in such conditions, the results can be relied upon. I miswrote a bit; to be precise, this was a modification to the MCP, which is the NIC firmware, rather than to GM, which is the user space code that interacts with the NIC hardware. The modification caused the NIC itself to introduce interpacket delays of a configurable value. To the application (well, to MPICH and to GM) it simply looked like the external Myrinet network had a different bandwidth and/or latency. There were tiny code changes to MPICH and to GM to allow modification of the interpacket delay values in the MCP; otherwise I would have had to recompile or patch the firmware image and reload that image for each new value. You are absolutely correct that GM, like all good OS-bypass software, doesn't introduce the delays that you'd encounter with communications protocols like TCP that have to pass through the kernel/user space boundary. Much more deterministic. 
> > > Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > > call is going to perturb your results because the processor won't be > > doing useful work during that time. > > In the case of TCP, the processor doesn't appear to be doing anything > useful for "long" times, as it spends time in kernel space. So, a 50 > microseconds busy loop might not make a difference. And given the somehow > non-deterministic behaviour of TCP in this respect, it might be that > adding the delay before the PMPI_* or after PMPI_* calls might make a > difference. TCP processing is likely a significant component of the natural latency, and, as you point out, during that time the CPU is busy in kernel space and isn't doing useful work. But the goal here is to add additional artificial latency in a manner that mimics a slower physical network, i.e., so that during this artificial delay the application can still be crunching numbers. In user space I don't see how to accomplish this goal (adding latency, yes; adding latency during which the cpu can do calculations, no). If delay code is added correctly in kernel space, say in the TCP/IP stack (sounds like a nasty bit of careful work!), then during that 50 usec period the CPU could certainly be doing useful work in user space. Small delays, relative to the timer tick, are very difficult to do accurately in non-realtime kernels unless you have a handy source of interrupts, like the local APIC. Assuming that LAM MPI isn't multithreaded (I have no idea), then adding a delay in the user space code in the MPI call, whether it's a sleep or a busy loop, guarantees that no useful application work can done during the delay. I'm confess to be totally ignorant of the PMPI_* calls (time for homework!) and defer humbly to the MPI masters from ANL. I'm definitely curious as to how these added latencies are implemented. > > The delays don't have to be busy-loops. Busy-loops are probably precise, > but might have some side-effects; for example, reading some hardware > counter (even more as it is on a PCI device, which is "far" from the CPU > and might be even "farther" if it has any PCI bridge(s) in between) > repeatedly will generate lots of "in*" operations during which the CPU is > stalled waiting for data. Especially with today's CPU speeds, I/O > operations are expensive in terms of CPU cycles... Agreed, though I'd hope on x86 that reading the time stamp counter is very quick and with minimal impact - it's got to be more like a register-to-register move than an I/O access. Hopefully on a modern superscalar processor this doesn't interfere with the other execution units. [As I write this, I just ran a program that reads the time stamp counter back to back to different registers, multiple times. The difference in values was a consistent 84 counts or 56 nsec on this 1.5 GHz Xeon - so, definitely minimal impact.] Without busy loops, achieving accurate delays of the order of 10's to 100's of microseconds with little jitter is a real trick in user space, (and kernel space as well!). nanosleep() won't work, delivering order 10 or 20 msec (i.e., the next timer tick) instead of the 50 usec request. > > > You are likely interested in delays of 10's of microseconds. > > Well, it depends :-) The latencies for today's HW+SW seem to be in a range > of about 2 orders of magnitude, so giving absolute figures doesn't make > much sense IMHO. Apart from this I would rather suggest an exponential > increase in the delay value. True. 
I was really thinking of my specific problem, not his! The relevant latency range for deciding between Infiniband and switched ethernet is ~ 6 usec to ~ 100+ usec, and the bandwidth range is ~ 100 MB/sec (gigE) to ~ 700 MB/sec (I.B.). It would be really useful to be able to inject latencies in that latency range with a precision of 5 usec or so, and to dial the bandwidth with a precision of ~ 50 MB/sec. Of course, if latency really matters, one would drop TCP/IP and use an OS-bypass, like GAMMA or MVIA. > ... > > > it would be great to simulate performance of different network > > architectures on specific applications. > > Certainly ! Especially as this would provide means to justify spending > money on fast interconnect ;-) What we need is some kind corporate soul to put up a large public cluster with the lowest latency, highest bandwidth network fabric available. Then, we can add our adjustable firmware and degrade that fabric to mimic less expensive networks, and figure out what we should really buy. Works for me! Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 14 04:47:30 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 14 Feb 2004 10:47:30 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > > Ah, another 1130 veteran! Group hug! > Talking about 'mature' computer systems, I was at the ATLAS centre at RAL yesterday, where they display the console of the IBM 360 in the front hall. Plenty of blinkenlights and switches to toggle. The notice beside it said it was a 15 MIPS machine. Seems impressive for a machine of this vintage. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 14 04:43:37 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 14 Feb 2004 10:43:37 +0100 (CET) Subject: [Beowulf] problmes with MPICH In-Reply-To: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Message-ID: On Fri, 13 Feb 2004, Richard Miguel wrote: > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > ./configure -rsh=RSHCOMMAND From the configure.in: "The environment variable 'RSHCOMMAND' allows you to select an alternative remote shell command (by default, configure will use 'rsh' or 'remsh' from your 'PATH'). If your remote shell command does not support the '-l' option (some AFS versions of 'rsh' have this bug), also give the option '-rshnol'. These options are useful only when building a network version of MPICH (e.g., '--with-device=ch_p4'). The configure option '-rsh' is supported for backward compatibility." So rsh is the default behaviour. You can compile with the rsh command set to the rsh under $SGE_HOME/mpi also.
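For example, from the unpacked mpich-1.2.5.2 tree, something along the lines of ./configure --with-device=ch_p4 -rsh=rsh followed by make should give an rsh-based build (ch_p4 is the same TCP/IP device the OSCAR-installed mpich above uses).  Treat that as a sketch rather than a recipe: the PGI compiler settings and an install prefix that does not clobber the OSCAR copy still have to be chosen.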
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 14 11:31:51 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 14 Feb 2004 11:31:51 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: given the difficulty of accurately adding a small amount of latency to a message passing interface, how about this: hack the driver to artificially pre/append a constant number of bytes to each message. they will appear to take longer to process, giving high-resolution added delays. course, this will also saturate earlier, but that's only the upper knee of the curve: you can still learn what you want... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From konstantin_kudin at yahoo.com Sat Feb 14 14:28:22 2004 From: konstantin_kudin at yahoo.com (Konstantin Kudin) Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) Subject: [Beowulf] S.M.A.R.T usage in big clusters Message-ID: <20040214192822.35170.qmail@web21203.mail.yahoo.com> I am curious if anyone is using SMART monitoring of ide drives in a big cluster. Basically, the question is in what percentage of the situations when a drive fails SMART is able to give some kind of a reasonable warning beforehand, let's say more than 24 hours. And how often it does not predict failure at all? The reason I am asking is that recently I had a drive that started getting bunch of I/O errors on certain sectors, yet SMART seemed to indicate that things were fine. Thanks! Konstantin __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Sat Feb 14 18:12:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sun, 15 Feb 2004 00:12:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Sat, 14 Feb 2004, Mark Hahn wrote: > hack the driver to artificially pre/append a constant number of > bytes to each message. I thought of this as well, but I dsmissed it because: - if the higher level protocol uses fragmentation and checksums, I think that it's pretty hard for the driver to mess with the messages. - a side effect might be faster filling up of some FIFO buffers on the receiver side, which might influence in unexpected ways the latency that we want to measure. Another side effect might be on the switch (assuming a network that uses switches) where data might be kept longer in buffers or peak bandwidth might be reached for short times, but enough to make a difference... - for networks that offer a very low latency, simulating a large latency might require adding a big lot of junk data, many times larger than the original message. 
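A quick back-of-the-envelope on that last point: the padding needed to mimic a given delay grows linearly with wire speed, so on a fast fabric it rapidly dwarfs typical message sizes. A trivial sketch, using round illustrative link rates rather than measured ones:

/* Bytes of padding corresponding to a target added delay at wire speed.
 * Link rates are illustrative round numbers, not measurements. */
#include <stdio.h>

int main(void)
{
    const char *name[]  = { "Fast Ethernet", "Gigabit Ethernet", "Infiniband-class" };
    double mbytes_sec[] = { 12.5, 100.0, 700.0 };   /* approximate wire rates */
    double delay_usec   = 50.0;                     /* target artificial delay */
    int i;

    for (i = 0; i < 3; i++) {
        double pad = delay_usec * 1e-6 * mbytes_sec[i] * 1e6;
        printf("%-17s: ~%.0f bytes of padding for %.0f usec\n",
               name[i], pad, delay_usec);
    }
    return 0;
}

At 50 usec that works out to roughly 625 bytes on Fast Ethernet, 5 KB on gigabit and 35 KB on an Infiniband-class link, i.e. many times a typical small message, which is exactly the last objection above.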
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 16 09:08:54 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 16 Feb 2004 08:08:54 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <20040214192822.35170.qmail@web21203.mail.yahoo.com> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: We are using the SMART monitoring on our cluster. It depends on the drive model how much predictive power you will get. On the drives where we have had the most failures we've kept track of how well SMART predicted it pretty well.. it finds an error in advance about half the time. Steve Timm ------------------------------------------------------------------ Steven C. Timm, Ph.D (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Sat, 14 Feb 2004, Konstantin Kudin wrote: > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. > > Basically, the question is in what percentage of the > situations when a drive fails SMART is able to give > some kind of a reasonable warning beforehand, let's > say more than 24 hours. And how often it does not > predict failure at all? > > The reason I am asking is that recently I had a drive > that started getting bunch of I/O errors on certain > sectors, yet SMART seemed to indicate that things were > fine. > > Thanks! > > Konstantin > > > > __________________________________ > Do you Yahoo!? > Yahoo! Finance: Get your refund fast by filing online. > http://taxes.yahoo.com/filing.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Mon Feb 16 10:47:01 2004 From: camm at enhanced.com (Camm Maguire) Date: 16 Feb 2004 10:47:01 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <54brnzrpqi.fsf@intech19.enhanced.com> Greetings! The subject line says it all -- where can one get the most bang per watt among systems currently available? Take care, -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Mon Feb 16 11:19:53 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Mon, 16 Feb 2004 11:19:53 -0500 (EST) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <54brnzrpqi.fsf@intech19.enhanced.com> References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: On Mon, 16 Feb 2004 at 10:47am, Camm Maguire wrote > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? I have no numbers or benchmarks, but my search for a quiet but powerful set of nodes led me to buy Dell Optiplex SX270s. They've got the Intel 865G chipset (800MHz FSB, 400MHz dual channel memory), P4 HT up to 3.2GHz, onboard e1000, laptop-style HDD, a 150W power supply, and little else. They're sweet little systems. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Mon Feb 16 12:10:43 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Mon, 16 Feb 2004 17:10:43 +0000 (GMT) Subject: Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> References: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> Message-ID: > Message: 1 > Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) > From: Konstantin Kudin > To: beowulf at beowulf.org > Subject: [Beowulf] S.M.A.R.T usage in big clusters > > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. Yes. We use smartmon tools http://smartmontools.sourceforge.net/ Hard drive failures are by far the most common hardware failure we see on our systems. We've hooked smartmontools into the batch queueing system we use, so that if drives are flagged as failing, the host gets closed to new jobs. (You could extend this to do checkpoint/migration if your code supports it, ours doesn't.) Our cluster typically runs fairly short jobs (less than 1 hour or so) so jobs usually finish before the drive finally fails. I haven't collected any hard statistics on how many failures we catch before it impacts on a user's work, but my gut feeling is that it catches over 80% of the cases, and certainly enough for it to be worthwhile implementing. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 16:00:34 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 13:00:34 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> This is an exceedingly sophisticated question.. Do you count: Wall plug watts to flops? or CPU watts to flops? does the interconnect count? (just the power in the line drivers and terminations is a big power consumer for spaceflight hardware... 
why LVDS is overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. I'll bet that gigabit backplane in the switch burns a fair amount of power... does the memory count? This would drive more vs less cache decisions, which affect algorithm partitioning and data locality of reference. Is there a constraint on a "total minimum speed" or "maximum number of nodes"? The interesting tradeoff in speed of nodes vs number of nodes manifests itself in many ways: more interconnects, bigger switches, etc. More nodes means Larger physical size means longer cables means more cable capacitance to charge and discharge on each bit means more power in the line drivers. What's your message latency requirement? Can you do store and forward through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but adding some power in the CPU to shuffle messages around) Can free space optical interconnects be used? (power hungry Tx and Rx, but no cable length issues) Anyway.. this is an issue that is very near and dear to my heart (since I'm designing power constrained systems). One problem you'll find is that reliable and comparable (across processors/architectures) numbers are very hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than a 200 MIPS PowerPC 750 running at 133 MHz. Jim Lux Spacecraft Telecommunications Section Jet Propulsion Lab ----- Original Message ----- From: "Camm Maguire" To: Sent: Monday, February 16, 2004 7:47 AM Subject: [Beowulf] Max flops to watts hardware for a cluster > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? > > Take care, > -- > Camm Maguire camm at enhanced.com > ========================================================================== > "The earth is but one country, and mankind its citizens." -- Baha'u'llah > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Mon Feb 16 18:11:50 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Mon, 16 Feb 2004 23:11:50 +0000 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> Message-ID: <20040216231150.GA3060@galactic.demon.co.uk> On Mon, Feb 16, 2004 at 01:00:34PM -0800, Jim Lux wrote: > This is an exceedingly sophisticated question.. > > Do you count: > Wall plug watts to flops? or CPU watts to flops? Via Eden / Nehemiah chips at 1GHz for 7W or Acorn ARM e.g. Simtec evaluation boards ? > does the interconnect count? (just the power in the line drivers and > terminations is a big power consumer for spaceflight hardware... why LVDS is > overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into > 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. > Cheap slow ASICs and serial port type speeds? Low power Bluetooth devices? 
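For scale, the LVDS versus RS-422 point quoted above is easy to put numbers on with P = V^2/R into the same 100 ohm termination; a one-liner sketch:

/* Static power dissipated in a 100 ohm termination, P = V*V/R.
 * Voltages and termination are the figures quoted in the post above. */
#include <stdio.h>

int main(void)
{
    double r = 100.0;   /* ohms */
    printf("LVDS    0.3 V: %7.1f mW per line\n",  0.3 *  0.3 / r * 1000.0);
    printf("RS-422 12.0 V: %7.1f mW per line\n", 12.0 * 12.0 / r * 1000.0);
    printf("RS-422 15.0 V: %7.1f mW per line\n", 15.0 * 15.0 / r * 1000.0);
    return 0;
}

That is under a milliwatt per line against 1.4-2.25 W, a ratio of well over a thousand before switching losses are even considered.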
> I'll bet that gigabit backplane in the switch burns a fair amount of > power... > > does the memory count? This would drive more vs less cache decisions, which > affect algorithm partitioning and data locality of reference. > The early Seymour Cray model - minimum numbers of standard parts that are ultra fast? > Is there a constraint on a "total minimum speed" or "maximum number of > nodes"? The interesting tradeoff in speed of nodes vs number of nodes > manifests itself in many ways: more interconnects, bigger switches, etc. > Buckyball of PDA's anyone ? :) > More nodes means Larger physical size means longer cables means more cable > capacitance to charge and discharge on each bit means more power in the line > drivers. > Xilinx FPGA type architecture? Inmos transputer-style? Node on chip? AVR Atmel-type chips? > What's your message latency requirement? Can you do store and forward > through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but > adding some power in the CPU to shuffle messages around) > > Can free space optical interconnects be used? (power hungry Tx and Rx, but > no cable length issues) > ThinkGeek do an _ultra cool_ looking green pumped laser pointer which will reach low cloudbases :) > > Anyway.. this is an issue that is very near and dear to my heart (since I'm > designing power constrained systems). One problem you'll find is that > reliable and comparable (across processors/architectures) numbers are very > hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs > in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than > a 200 MIPS PowerPC 750 running at 133 MHz. > If 5W of power goes to/from Mars - then the JPL are the ones to beat on this [makes QRP radio hams look positively profligate] :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 16 20:45:49 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 17 Feb 2004 12:45:49 +1100 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> Message-ID: <200402171245.51746.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 12:22 pm, Jim Lux wrote: > For those interested, all the deep space comm stuff is documented in CCSDS > specs at http://www.ccsds.org/ Cool. http://www1.ietf.org/mail-archive/ietf-announce/Current/msg27294.html This document describes how to encapsulate Internet Protocol version 4 and version 6 packets can be encapsulated in Consultative Committee for Space Data Systems (CCSDS) Space Data Link Protocols. That's going to be one hell of a round trip time for pings.. What about distributed processing between spacecraft ? OK, maybe interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? 
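The "hell of a round trip time" is easy to quantify with nothing more than light time; a rough sketch using approximate Earth-Mars distance extremes (the only physics here is t = 2d/c):

/* Rough interplanetary ping estimate: RTT = 2 * distance / c.
 * Distances are approximate Earth-Mars extremes. */
#include <stdio.h>

int main(void)
{
    double c_km_s = 299792.458;             /* speed of light, km/s */
    double d_km[] = { 55.0e6, 400.0e6 };    /* ~closest and ~farthest approach */
    int i;

    for (i = 0; i < 2; i++) {
        double rtt = 2.0 * d_km[i] / c_km_s;
        printf("distance %4.0f Mkm: RTT ~ %4.0f s (%.0f min)\n",
               d_km[i] / 1e6, rtt, rtt / 60.0);
    }
    return 0;
}

Call it six minutes at closest approach and around three quarters of an hour near conjunction, before any store-and-forward or coding overhead, so interactive protocols are out and even keepalives need rethinking.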
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMXJNO2KABBYQAh8RAkTUAKCDfbAaswt3oWYDrEzXecdrqPfIPACff5cS UUAVTMwPAR3XA3lHjjf9lYc= =+LJH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 20:22:51 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 17:22:51 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> <20040216231150.GA3060@galactic.demon.co.uk> Message-ID: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> > > > If 5W of power goes to/from Mars - then the JPL are the ones to beat on > this [makes QRP radio hams look positively profligate] :) that 15W from Mars, on the omni antenna, only gets you 7-8 bits/second, working into a 70 meter diameter dish and a cryogenically cooled receiver front end. A bit beyond the typical ham's rig or budget. Going the other way, it's hundreds of kW into the dish. Beyond QRO. More realistically, they get a hundred kbps or so on the UHF link to the orbiter from a basically omni antenna on the rover. I can't recall what the max rate on the "direct to earth" X-band high gain antenna (which is about 20 cm in diameter) is, but it's probably in the same ballpark. That's the actual signalling rate, also... there's some coding going on as well, so the "data rate" is lower, after you take out framing, error correction etc. For those interested, all the deep space comm stuff is documented in CCSDS specs at http://www.ccsds.org/ --- Actually, the low power per function (or more accurately, low energy per function) champs are probably the cellphone folks.. Battery life is a real selling point. The little GPS receivers for cellphones are actually spec'd in milliJoules/fix, for instance. That said, I don't see anyone building a big crunching cluster out of cellphones... It's all those other issues you have to deal with.. interconnects, cluster management, memory, etc. They all require energy. Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 00:34:30 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 21:34:30 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> <200402171245.51746.csamuel@vpac.org> Message-ID: <001a01c3f517$b7bbffb0$36a8a8c0@LAPTOP152422> > > This document describes how to encapsulate Internet Protocol > version 4 and version 6 packets can be encapsulated in > Consultative Committee for Space Data Systems (CCSDS) Space > Data Link Protocols. > > That's going to be one hell of a round trip time for pings.. > > > > What about distributed processing between spacecraft ? 
OK, maybe > interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? > > > Such ideas are being contemplated, and not only by me. There are distributed computing/ cooperative robotics sorts of things, and also "formation flying" sorts of things, not to mention "sensor webs". Probably the biggest problem is not a technology one but a philosophical one. Spacecraft and mission design is exceedingly conservative, and you'd have to show that it would enable something that's needed, that can't be done by conventional approaches. It's sufficiently unusual that it doesn't fit well with the usual analysis models for spacecraft; which tend to push towards "one big X" supplied by power from "one big Y" using "one big Z" to talk to home, etc. The costing spreadsheets used in speculative mission planning don't have cells for "number of processors in cluster" and "power per node" You need a fairly straightforward model that says, in effect, you can process "x" amount of data with "y" mass and "z" watts/joules. That model must be backed up by credible analysis and experience ("heritage" in space speak). In general, the perception is that "more parts = more potential failure points = higher risk" so it's gotta be a "this is the ONLY way to make the measurement" or it's not going to fly. You're going to spend years and years getting ready to go, and you can't go fix it if it breaks. Spaceflight is a very, very, very different conceptual and planning model. (we won't even get into what you have to do if it's connected to human space flight in any way...). The time from "great idea" to "mission launch" is probably in the area of 5-7 years. The CPU flying on the Mars Rovers is a Rad6000, which is based on an old MIPS processor. Current missions in planning and development use things like PowerPC750's (derated) and Sparc7s and 8's (aka ERC32 and/or LEON) and ADSP21020 clones. Nobody is thinking about flying ARMs or Transmetas or even Pentiums. The popular scheme these days is various and sundry microcores (6502, 8051, PPC604s) in Xilinx megagate FPGAs. Actually, though, the fact that only these relatively low powered (computationally) processors are what are flying is what makes clusters attractive. If you need hundreds of megaflops to do your measurement, you're only going to get it with multiple processors. Jim Lux JPL _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Tue Feb 17 06:56:34 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 17 Feb 2004 19:56:34 +0800 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <1077018992.18450.21.camel@mikhail> Good day everyone, I have just a 5 node cluster networked together with a 100 Mbps Ethernet hub (well, not the best setup). The master acts as a NAT host for the internal hosts, and only the master node has 2 nics, one facing the internet and another facing the internal net. The master node is accessible from the internet, and I login to it to run jobs in the background (using screen). I've been reading a lot about OpenPBS and the Maui scheduler, but as mentioned in the list and also evident in the website, the OpenPBS system is not readily downloadable/distributable. Are there any alternatives to OpenPBS which does most of the same thing (batch scheduling of jobs for clusters)? 
Interfaceability using a GUI frontend (without having to make one of my own) is definitely a plus. TIA -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:47:41 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:47:41 +0100 (CET) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: On 17 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Are there any > alternatives to OpenPBS which does most of the same thing (batch > scheduling of jobs for clusters)? Interfaceability using a GUI frontend > (without having to make one of my own) is definitely a plus. Gridengine is probably a good bet for you. http://gridengine.sunsource.net The GUI is called qmon (I don't use it much) There are binaries available, and clear instructions on how to install it. If you have problems, join the Gridengine list where we'll help. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:40:46 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:40:46 +0100 (CET) Subject: [Beowulf] Linux-HA conference and tutorial, UK Message-ID: If anyone is interested in Linux-HA, the UKUUG are having a tutorial and conference in Bournemouth. The people leading the tutorial are Alan Robertson and Lars Markowsky-Bree, who head up the Linux-HA project. http://www.ukuug.org/events/winter2004/ (ps. I won't be there) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Tue Feb 17 11:41:19 2004 From: camm at enhanced.com (Camm Maguire) Date: 17 Feb 2004 11:41:19 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: References: Message-ID: <547jylk6a8.fsf@intech19.enhanced.com> Greetings, and thanks for the fascinating discussion! I'm mostly interested in dram flops, and also not the absolute maximum, mars-rover level technology, but say within 10% of the best available options on a more or less commodity basis. Take care, Mark Hahn writes: > > Greetings! The subject line says it all -- where can one get the most > > bang per watt among systems currently available? > > depends on which kind of flops: cache-friendly or dram-oriented? > > > > -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Tue Feb 17 14:59:52 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Tue, 17 Feb 2004 14:59:52 -0500 Subject: [Beowulf] ECC RAM or not? Message-ID: <20040217195952.GA50999@piskorski.com> For a low-cost cluster, would you insist on ECC RAM or not, and why? My inclination would be to always use ECC for anything, but it looks as if there is no such thing as an inexpensive motherboard which also supports ECC RAM. Either you can have a cheap motherboard (well under $100) with no ECC, or a pricey (well over $100) motherboard with ECC. Am I mistaken about this, are are there really no exceptions to this seeming "ECC motherboads are always expensive" rule? Also at least some large production clusters out there, KASY0, for example, doesn't use ECC RAM, do not use ECC - I wonder why: http://aggregate.org/KASY0/cost.html -- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 17 18:20:12 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 17 Feb 2004 18:20:12 -0500 (EST) Subject: [Beowulf] ECC RAM or not? In-Reply-To: <20040217195952.GA50999@piskorski.com> Message-ID: > For a low-cost cluster, would you insist on ECC RAM or not, and why? how low-cost, and what kind of code? technically, the chances of seeing dram corruption depends on how much ram you have, and how much you use it (as well as environmental factors, such as altitude, of course!) for a sufficiently low-cost cluster, you'd expect to have relatively little ram, and little CPU power to churn it, and therefore low rate of bit-flips. otoh, you can bet that the recent ECC upgrade of the VT cluster had a significant real cost (probably eaten by vendors for PR reasons...) some kinds of codes are "rad hard", in the sense that if a failure gives you a possibly-wront answer, you can just check the answer. that definition pretty much excludes traditional supercomputing, and certainly all physics-based simulations. searching/optimization stuff might work well in that mode, though rechecking only catches false positives, doesn't recover from false negatives. I suspect that doing ECC is cheaper than messing around with this kind of uncertainty, even for these specialized codes. > My inclination would be to always use ECC for anything, but it looks > as if there is no such thing as an inexpensive motherboard which also > supports ECC RAM. Either you can have a cheap motherboard (well under > $100) with no ECC, or a pricey (well over $100) motherboard with ECC. well, you're really pointing out the difference between desktop and workstation/server markets. for instance, there's not much physical difference between the i875 and i865 chipsets, but the former shows up in $200 boards that need a video card, and the latter in $100 ones that have integrated video. > Am I mistaken about this, are are there really no exceptions to this > seeming "ECC motherboads are always expensive" rule? it's a marketing/market-driven phenomenon. it's useful to work out the risks when you make this kind of decision. 
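One way to put numbers on that risk, under the usual constant-failure-rate assumption, is simply expected failures = node count x hours of operation / MTBF. A sketch of the arithmetic, using the power-supply figures quoted in the next paragraph as input:

/* Expected failures over an interval, assuming a constant failure rate:
 * failures = nodes * hours / MTBF. Inputs are the figures quoted below. */
#include <stdio.h>

static double expected_failures(double nodes, double hours, double mtbf_hours)
{
    return nodes * hours / mtbf_hours;
}

int main(void)
{
    /* 32 nodes, 20,000-hour power supplies, one month of wall time (~730 h) */
    printf("expected PSU replacements per month: %.2f\n",
           expected_failures(32.0, 730.0, 20000.0));
    return 0;
}

That comes out at roughly 1.2 per month, consistent with the "replacement per month" figure below; rated MTBFs tend to be optimistic, so field numbers are usually worse.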
if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll need to think about doing a replacement per month. if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked to get a couple failures a week. if 1100 nodes with 4G but no ECC see a two undetected corruptions a day, then 32 nodes with 1G will go a couple months between events... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Tue Feb 17 18:18:20 2004 From: dieter at engr.uky.edu (William Dieter) Date: Tue, 17 Feb 2004 18:18:20 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <200402171701.i1HH13h07766@NewBlue.scyld.com> Message-ID: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Try the cluster design tool at . You can enter your basic memory, memory bandwidth, etc requirements, then set the metric weighting to choose designs with the least power consumption first. For example, for the default requirements (minimal memory, disk, and network requirements, at least 50 GFLOPS, and a $10,000 budget), and weighting power consumption first then memory bandwidth, followed by GFLOPS I get the following as the best design: 23 Generic Fast Ethernet NIC $8.00 $184.00 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 23 Pentium 4 2.4GHz $166.00 $3818.00 23 Generic Socket 478 $56.00 $1288.00 69 Generic PC3200 256MB DDR $44.00 $3036.00 23 Generic Mid-Tower Case $50.00 $1150.00 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 Total $9748.00 The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 Amps (you get to convert Amps to Watts.) Everything else in the design is pretty minimal, but you can adjust the requirements on the form to get what you need (or if you can't let me know why not :-) The CGI tries all designs with the parts in its database to find the ones that meet your requirements and metric weighting. The model includes current consumption for switches and compute nodes based on the power supply. The parts database is a bit out of date right now... let me know what you think. Bill Dieter. dieter at engr.uky.edu On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > Greetings, and thanks for the fascinating discussion! > > I'm mostly interested in dram flops, and also not the absolute > maximum, mars-rover level technology, but say within 10% of the best > available options on a more or less commodity basis. > > Take care, > > Mark Hahn writes: > >>> Greetings! The subject line says it all -- where can one get the >>> most >>> bang per watt among systems currently available? >> >> depends on which kind of flops: cache-friendly or dram-oriented? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 17 21:38:39 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 18 Feb 2004 13:38:39 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> References: <1077018992.18450.21.camel@mikhail> Message-ID: <200402181338.50678.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 10:56 pm, Dean Michael C. 
Berris wrote: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. There is a forked version of OpenPBS called 'Torque' (it was called ScalablePBS, but Altair requested it changed its name) which includes a whole host of bug fixes and enhancements (including massive scalability) and is freely downloadable under an earlier, more free, OpenPBS license. It's under active development and has an active user community, though the mailing list is moderated for some bizzare reason, which means posts can take a little while to get through. The website is at: http://www.supercluster.org/projects/torque/ Good luck! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMtAvO2KABBYQAh8RAp8cAJsHNJuoCmIxYMNUWguwpoueopKUxACdHJiq p0nGW3X3ATurlzaV+Iw5jtg= =xwcU -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:20:37 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:20:37 +0800 (CST) Subject: [Beowulf] SLURM - newest (and greatest?) batch system Message-ID: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> One of the new features of SGE 6.0 is the parallelized job container (qmaster). Another batch system called SLURM (Simple Linux Utility for Resource Management) will be releasing soon. http://www.llnl.gov/linux/slurm/slurm.html - Like SGE 6.0, it also uses threads to parallelize the job container. - licensed under the GPL!! - developed by the US gov - uses Maui - designed to be simple :) - supports lots of interconnect switches. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:02:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:02:54 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> You can choose between SGE and SPBS. SGE has more features, better fault tolerance, better documentation, and better user support. http://gridengine.sunsource.net SPBS is closer to what you have now, so you and your users (BTW, are you the only one?) don't need to learn something new. http://www.supercluster.org/ Andrew. --- "Dean Michael C. Berris" ????> Good day everyone, > > I have just a 5 node cluster networked together with > a 100 Mbps Ethernet > hub (well, not the best setup). The master acts as a > NAT host for the > internal hosts, and only the master node has 2 nics, > one facing the > internet and another facing the internal net. The > master node is > accessible from the internet, and I login to it to > run jobs in the > background (using screen). 
> > I've been reading a lot about OpenPBS and the Maui > scheduler, but as > mentioned in the list and also evident in the > website, the OpenPBS > system is not readily downloadable/distributable. > Are there any > alternatives to OpenPBS which does most of the same > thing (batch > scheduling of jobs for clusters)? Interfaceability > using a GUI frontend > (without having to make one of my own) is definitely > a plus. > > TIA > > -- > Dean Michael C. Berris > http://mikhailberis.blogspot.com > mikhailberis at free.net.ph > +63 919 8720686 > GPG 08AE6EAC > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:37:00 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:37:00 -0800 Subject: [Beowulf] ECC RAM or not? References: Message-ID: <003101c3f5d8$d9b03250$36a8a8c0@LAPTOP152422> > some kinds of codes are "rad hard", in the sense that if a failure gives > you a possibly-wront answer, you can just check the answer. My practical experience with DRAM designs has been that bit errors are more likely due to noise/design issues than radiation induced single event upsets. Back in the 80's I worked on a Multibus system where we used to get double bit errors in 11/8 ecc several times a week. Everyone just said "well, that's why we have ECC" until I did some quick statistics on what the ratio between single bit (corrected but counted) and double bit errors should have been. Such high rates defied belief, and it turned out to be a bus drive problem. that definition > pretty much excludes traditional supercomputing, and certainly all > physics-based simulations. searching/optimization stuff might work well > in that mode, though rechecking only catches false positives, doesn't > recover from false negatives. I suspect that doing ECC is cheaper than > messing around with this kind of uncertainty, even for these specialized codes. There are a number of algorithms which have inherent self checking built in. In the accounting business, this is why there's double entry, and/or checksums. In the signal processing world, there are checks you can do on things like FFTs, where total power in should equal total power out. > > > if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll > need to think about doing a replacement per month. > > if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked > to get a couple failures a week. Shades of replacing tubes in Eniac or the Q-7A MIL-HDBK-217A is the "bible" on these sorts of computations. 
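That kind of quick statistics check is worth sketching, if only under the crude assumption of independent, uniformly distributed bit flips (this is an illustration, not the original calculation, and the word width, array size and error rate below are invented): the corrected single-bit count pins down a per-bit upset probability, and the expected double-bit rate then scales with its square, so it should be vanishingly small; a measured double-bit rate anywhere near the single-bit rate points at a systematic problem such as the bus-drive fault described above.

/* Sanity check: given an observed single-bit (corrected) error rate,
 * what double-bit rate would independent random flips predict?
 * Word width, array size and single-bit rate are invented examples;
 * assumes words are rewritten/scrubbed often enough that hits do not
 * accumulate across days. */
#include <stdio.h>

int main(void)
{
    double bits_per_word   = 39.0;       /* e.g. 32 data + 7 SEC-DED check bits */
    double words           = 1048576.0;  /* words in the array (example) */
    double singles_per_day = 10.0;       /* observed corrected errors per day */

    /* Per-bit upset probability per day implied by the single-bit rate. */
    double p_bit = singles_per_day / (words * bits_per_word);

    /* P(two independent hits land in the same word) ~ C(n,2) * p^2. */
    double p_word2 = 0.5 * bits_per_word * (bits_per_word - 1.0) * p_bit * p_bit;
    double doubles_per_day = p_word2 * words;

    printf("implied per-bit upsets/day:   %.3g\n", p_bit);
    printf("expected double-bit errs/day: %.3g (one every %.0f years)\n",
           doubles_per_day, 1.0 / (doubles_per_day * 365.0));
    printf("single:double ratio:          %.3g\n",
           singles_per_day / doubles_per_day);
    return 0;
}

With these made-up numbers the predicted single-to-double ratio is on the order of 10^5, so seeing double-bit errors "several times a week" against a modest corrected-error count is far outside what random upsets would explain.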
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:28:53 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:28:53 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> This kind of thing is way cool.. Have you published the algorithm behind the page in a concise form somewhere? It would be handy to be able to point mission/system planners to it. ----- Original Message ----- From: "William Dieter" To: Sent: Tuesday, February 17, 2004 3:18 PM Subject: Re: [Beowulf] Max flops to watts hardware for a cluster > Try the cluster design tool at > . You can enter your basic > memory, memory bandwidth, etc requirements, then set the metric > weighting to choose designs with the least power consumption first. > > For example, for the default requirements (minimal memory, disk, and > network requirements, at least 50 GFLOPS, and a $10,000 budget), and > weighting power consumption first then memory bandwidth, followed by > GFLOPS I get the following as the best design: > > 23 Generic Fast Ethernet NIC $8.00 $184.00 > 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 > 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 > 23 Pentium 4 2.4GHz $166.00 $3818.00 > 23 Generic Socket 478 $56.00 $1288.00 > 69 Generic PC3200 256MB DDR $44.00 $3036.00 > 23 Generic Mid-Tower Case $50.00 $1150.00 > 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 > Total $9748.00 > > The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 > Amps (you get to convert Amps to Watts.) Everything else in the design > is pretty minimal, but you can adjust the requirements on the form to > get what you need (or if you can't let me know why not :-) > > The CGI tries all designs with the parts in its database to find the > ones that meet your requirements and metric weighting. The model > includes current consumption for switches and compute nodes based on > the power supply. The parts database is a bit out of date right now... > > let me know what you think. > > Bill Dieter. > dieter at engr.uky.edu > > On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > > Greetings, and thanks for the fascinating discussion! > > > > I'm mostly interested in dram flops, and also not the absolute > > maximum, mars-rover level technology, but say within 10% of the best > > available options on a more or less commodity basis. > > > > Take care, > > > > Mark Hahn writes: > > > >>> Greetings! The subject line says it all -- where can one get the > >>> most > >>> bang per watt among systems currently available? > >> > >> depends on which kind of flops: cache-friendly or dram-oriented? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Feb 18 01:26:38 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 22:26:38 -0800 Subject: [Beowulf] ECC RAM or not? 
References: Message-ID: <000601c3f5e8$2a8430f0$36a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Mark Hahn" To: "Jim Lux" Sent: Tuesday, February 17, 2004 9:36 PM Subject: Re: [Beowulf] ECC RAM or not? > > > some kinds of codes are "rad hard", in the sense that if a failure gives > > > you a possibly-wront answer, you can just check the answer. > > > > My practical experience with DRAM designs has been that bit errors are more > > likely due to noise/design issues than radiation induced single event > > upsets. > > understood. then again, you're using deliberately selected rad-hard-ware, no? Nope... that was off the shelf DRAMs in a commercial environment (in 1980ish time frame, so they were none too dense DRAMs, either.. 256kB on a board I think, many, many, pieces.. probably 64kbit parts..) > I was mostly thinking about a talk I saw by the folks who care for ASCI-Q, > which is in Los Alamos. they say that the altitude alone is worth a 14x > increase in particle flux, and that this caused big problems for them with > a particular register on the ES40 data path that was not ecc'ed. Indeed.. ECC on memory is only part of the problem.. you really need ECC on address and data lines for full coverage (or, more properly EDAC).. The classic paper on altitude effects was done by folks at IBM, where they ran boards in NY and in Denver and, underground in Denver. Good experimental technique, etc. > > > Back in the 80's I worked on a Multibus system where we used to get > > double bit errors in 11/8 ecc several times a week. Everyone just said > > "well, that's why we have ECC" until I did some quick statistics on what the > > ratio between single bit (corrected but counted) and double bit errors > > should have been. Such high rates defied belief, and it turned out to be a > > bus drive problem. > > makes sense. to be honest, I don't see many single-bit errors even, > but today we've only < 200 GB ram online. inside a year, it'll probably > be more like 2TB, so maybe things will get more exciting ;) It's a very mixed bag, depending on what's causing the errors. If it's radiation, smaller feature sizes mean that there's a smaller target to hit, and the amount of energy transferred is less (of course, less energy is stored in the memory cell, too) > we're also pretty much at sealevel, with lots of building over us. > reactor next door, though ;) Type of particle, and it's energy, has a huge effect on the SEU effects. I would maintain, though, that run of the mill timing margin effects, particularly over temperature; and EMI/EMC effects are probably a more important source of bit hits in modern computers. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Wed Feb 18 05:25:22 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 18 Feb 2004 18:25:22 +0800 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <1077099918.4818.15.camel@mikhail> Thanks sir, and to everyone else that responded. I'm currently reading on SGE, and am going to be choosing as soon as I get the full picture. Currently my preference is still towards SPBS (Torque) mainly because it doesn't seem as complicated to set up. 
However, as a Debian user, I did an apt-cache search on batch system and a couple of packages were Queue and DQS (Distributed Queueing System). I went over to the DQS website, and I'm reading on it right now. What I'd like to know would be how different DQS (and/or Queue) is with regards to SPBS and SGE? It would seem like from what I've been reading, SGE and SPBS are really for clusters (and grids), and DQS is for a collection of computers that really don't work as a cluster (or as a parallel computer). How accurate is this assessment of mine? Are there any articles written by people in the group regarding comparisons between SGE and SPBS with regards to effectivity and reliability? Scalability is also a factor because the cluster may grow as more funding and problems get into the cluster project. I hope I never cease to get enlightened from posts in the group, and insights would be most appreciated. Thanks very much and have a nice day! :) On Wed, 2004-02-18 at 12:02, Andrew Wang wrote: > You can choose between SGE and SPBS. > > SGE has more features, better fault tolerance, better > documentation, and better user support. > > http://gridengine.sunsource.net > > SPBS is closer to what you have now, so you and your > users (BTW, are you the only one?) don't need to learn > something new. > > http://www.supercluster.org/ > > Andrew. > > > --- "Dean Michael C. Berris" > ????> Good day > everyone, > > > > I have just a 5 node cluster networked together with > > a 100 Mbps Ethernet > > hub (well, not the best setup). The master acts as a > > NAT host for the > > internal hosts, and only the master node has 2 nics, > > one facing the > > internet and another facing the internal net. The > > master node is > > accessible from the internet, and I login to it to > > run jobs in the > > background (using screen). > > > > I've been reading a lot about OpenPBS and the Maui > > scheduler, but as > > mentioned in the list and also evident in the > > website, the OpenPBS > > system is not readily downloadable/distributable. > > Are there any > > alternatives to OpenPBS which does most of the same > > thing (batch > > scheduling of jobs for clusters)? Interfaceability > > using a GUI frontend > > (without having to make one of my own) is definitely > > a plus. > > > > TIA > > > > -- > > Dean Michael C. Berris > > http://mikhailberis.blogspot.com > > mikhailberis at free.net.ph > > +63 919 8720686 > > GPG 08AE6EAC > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > ----------------------------------------------------------------- > ??? Yahoo!?? > ?????????????????????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Dean Michael C. 
Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Feb 18 05:47:27 2004 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Wed, 18 Feb 2004 10:47:27 +0000 Subject: [Beowulf] Howto setup jobs using MPI In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <1077018992.18450.21.camel@mikhail> Message-ID: <5.1.1.6.0.20040218104616.02a89e00@imap.hermes.cam.ac.uk> Hi How best to setup jobs using MPI? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mack.joseph at epa.gov Wed Feb 18 07:22:07 2004 From: mack.joseph at epa.gov (Joseph Mack) Date: Wed, 18 Feb 2004 07:22:07 -0500 Subject: [Beowulf] S.M.A.R.T usage in big clusters References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: <403358EF.7F0BDE75@epa.gov> Steven Timm wrote: > > On the drives where we have had the most failures we've kept track > of how well SMART predicted it pretty well.. it finds an error > in advance about half the time. How do you get your information out of smartd? I've found output in syslog - presumably I can grep for this. I can get e-mail if I want (from the docs). To look at the output of the long and short tests it appears that I have to interactively use smartctl. Is there anyway to have a flag that can be looked at periodically to say "this disk is about to fail"? Thanks Joe -- Joseph Mack PhD, High Performance Computing & Scientific Visualization SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Wed Feb 18 09:16:48 2004 From: timm at fnal.gov (Steven Timm) Date: Wed, 18 Feb 2004 08:16:48 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > Steven Timm wrote: > > > > > On the drives where we have had the most failures we've kept track > > of how well SMART predicted it pretty well.. it finds an error > > in advance about half the time. > > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. At the moment we are not using smartd. I was running an older version that didn't have it as part of the package. I wrote some cron scripts that do a short test every night and capture the output to a file. But we are going to transition and use smartd and use an agent we already have that is grepping /var/log/messages for other purposes. Steve Timm > > I can get e-mail if I want (from the docs). > > To look at the output of the long and short tests it appears that > I have to interactively use smartctl. > > Is there anyway to have a flag that can be looked at periodically to > say "this disk is about to fail"? 
> > Thanks Joe > -- > Joseph Mack PhD, High Performance Computing & Scientific Visualization > SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 > Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 09:35:55 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 09:35:55 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> Message-ID: On Tuesday, February 17, 2004, at 11:28 PM, Jim Lux wrote: > This kind of thing is way cool.. > Have you published the algorithm behind the page in a concise form > somewhere? It would be handy to be able to point mission/system > planners to > it. We just submitted the paper to IEEE Computer for review last week. If you want to look at the source code, it is available through . I haven't made an official tarball release yet, but you can get the latest code through CVS. If you want to make your own parts database on our website you can do that, too. It copies one of the existing databases into a new one, so if you just want to update a few prices, or add a few new parts, it doesn't take too much effort. Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hanzl at noel.feld.cvut.cz Wed Feb 18 11:28:25 2004 From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz) Date: Wed, 18 Feb 2004 17:28:25 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> <1077099918.4818.15.camel@mikhail> Message-ID: <20040218172825E.hanzl@unknown-domain> > However, as a Debian user, I did an apt-cache search on batch system and > a couple of packages were Queue and DQS (Distributed Queueing System). I > went over to the DQS website, and I'm reading on it right now. What I'd > like to know would be how different DQS (and/or Queue) is with regards > to SPBS and SGE? DQS is SGE's grandfather, the genealogy goes somehow like this: DQS(Florida State Univ.) -> CODINE(Genias) -> SGE(Sun) so you can expect DQS to be much simpler but also you can expect SGE to be much improoved. (My personal choice is SGE and I am quite happy with it.) Regards Vaclav Hanzl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Feb 18 09:35:23 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 18 Feb 2004 15:35:23 +0100 (CET) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: On Tue, 17 Feb 2004, William Dieter wrote: > 23 Generic Fast Ethernet NIC $8.00 $184.00 How much in terms of power have you assigned to this item ? If you really buy a cheap low-end FE NIC, you'll most probably end up with a RTL8139 based card. This chip by design puts quite a load on the main CPU especially if you use it in a cluster context (=lots of network activity). 
This might increase significantly the power consumption or reduce the available flops... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 10:24:31 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 10:24:31 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <8CD39EDD-6226-11D8-B4A2-000393BF25C6@engr.uky.edu> On Wednesday, February 18, 2004, at 09:35 AM, Bogdan Costescu wrote: > On Tue, 17 Feb 2004, William Dieter wrote: > >> 23 Generic Fast Ethernet NIC $8.00 $184.00 > > How much in terms of power have you assigned to this item ? The tool is not perfect. We have not broken down the power to that level of detail. There is a tradeoff between how much work you have to do for each component and how much detail the model has. > If you really buy a cheap low-end FE NIC, you'll most probably end up > with a RTL8139 based card. This chip by design puts quite a load on > the main CPU especially if you use it in a cluster context (=lots of > network activity). This might increase significantly the power > consumption or reduce the available flops... To get really accurate power consumption numbers we would have to measure for many different CPU/Motherboard/NIC combinations. OTOH, there are some really cheap cards based on the Davicom 9102 chipset, (newegg.com has at least two different brands for $4.00 to $6.00). The Davicom 9102 is enough of a tulip clone that the Ethernet HOWTO recommends trying the tulip driver before the manufacturer supplied driver... Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 18 13:27:55 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 18 Feb 2004 12:27:55 -0600 (CST) Subject: [Beowulf] Best or standard hpc kernel sysctl settings. Message-ID: As part of our standards documentation, I'd like to set a good starting point for tuning various kernel parameters for clusters on Rice's campus. We have a few sysctl settings that we do based on the requirements of certain codes, but I'd like to know how everyone else is tuning their linux systems in their clusters. Can I get from you guys the sysctl parameter, it's value, and the reason why you set it that way? Thanks, Brent Clements _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Wed Feb 18 15:54:58 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 18 Feb 2004 15:54:58 -0500 Subject: [Beowulf] 2nd call for speakers -- Bioclusters 2004 Workshop -- March 30 Boston, MA Message-ID: <4033D122.4080008@sonsorol.org> { Apologies for the cross-posting } Enclosed is a meeting announcement for a 1 day workshop we are organizing alongside the much larger 'BioITWorld Expo' in Boston, Ma. 
The goals are two-fold -- recreating the vibe from the OReilly Bioinformatics Technology conference series that was recently cancelled as well as providing a forum where folks involved at the intersection of life science research and high performance IT can come together to talk shop. Feel free to pass along the enclosed announcement as appropriate. We are actively seeking technical talks and presentations focusing on how challenging problems were solved or overcome. Regards, Chris {on behalf of the organizing committee} Email: bioclusters04 at open-bio.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bioclusters-workshop.txt URL: From nixon at nsc.liu.se Tue Feb 17 09:16:39 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 17 Feb 2004 15:16:39 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> (Dean Michael C. Berris's message of "17 Feb 2004 19:56:34 +0800") References: <1077018992.18450.21.camel@mikhail> Message-ID: "Dean Michael C. Berris" writes: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Torque (a.k.a Storm, a.k.a. Scalable PBS) is a fork of the OpenPBS source tree, with active maintenance and a reasonable license. http://www.supercluster.org/projects/torque It plays nicely with Maui. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.giesen at kodak.com Tue Feb 17 15:58:57 2004 From: david.giesen at kodak.com (David J Giesen) Date: Tue, 17 Feb 2004 15:58:57 -0500 Subject: [Beowulf] Cluster questions for Quantum Chemistry Message-ID: <40328091.A0200730@kodak.com> Hello- (Apologies to those who have seen a similar question on the CCL mailing list) We may be in the market for a new Linux cluster these days. Unfortunately, I haven't kept up on all the latest issues, and I'd appreciate any answers you all have for any of these questions. We want to run mainly QM codes such as Gaussian 98/Gaussian 03, Jaguar and PQS on these machines with linux. We'd likely be running in parallel, typically across 3-4 dual-processor nodes. 1) Xeon vs P4: [a] At the same GHz and front-side bus speed is there a difference in performance between these chips? [b] Is there a difference in reliability? 2) AMD Opteron vs Athlon: [a] Does any QM code actually take advantage of Opteron's 64-bit technology? [b] Have people moved away from Athlon boxes because of heat problems? 3) AMD vs Intel: How to compare speeds between these two different types of processors for QM codes? Does an Athlon 2800 (2.08 GHz) run more like a 2.0 GHz P4 or a 2.8 GHz P4? 3) How important is front-side bus speed these days for quantum chemistry problems? 4) How important are 100 MHz ethernet versus 1 Gb ethernet connections between the nodes for quantum chemistry problems? Thanks in advance! Dave Any questions which highlight out my extreme stupidity are a result of exactly that (my own stupidity) rather than a reflection on the positions of the Eastman Kodak Company. -- Dr. David J. 
Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 -- Dr. David J. Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 18 22:09:53 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 19 Feb 2004 11:09:53 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> Message-ID: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> --- "Dean Michael C. Berris" > I'm currently reading > on SGE, and am going to be choosing as soon as I get > the full picture. > Currently my preference is still towards SPBS > (Torque) mainly because it > doesn't seem as complicated to set up. To install SGE, you don't even need to compile the source, just download the pre-compiled binary package, or grab the rpm. And also, SGE doesn't require root access, it can untar the package in your home directory, run the install scripts, and start playing with it. > What I'd > like to know would be how different DQS (and/or > Queue) is with regards > to SPBS and SGE? Debian is planning to replace DQS with SGE, but the maintainer of DQS was gone (he left the university). DQS and SGE are very similar. And PBS and SPBS are very similar too. > It would seem like from what I've been reading, SGE > and SPBS are really > for clusters (and grids), and DQS is for a > collection of computers that > really don't work as a cluster (or as a parallel > computer). How accurate > is this assessment of mine? Are you talking about compute farms? SGE is also used in compute farms as well, where people run EDA simulations, graphic rendering jobs, BLAST jobs, etc. SGE has quite a lot of resource management features. SPBS/PBS are used in HPC clusters, since before SGE was opensource, PBS was free/opensource, so more people used it in those environments. > Are there any articles written by people in the > group regarding > comparisons between SGE and SPBS with regards to > effectivity and > reliability? SGE vs PBS on the rocks cluster mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-September/002980.html SPBS has lots of patches integrated, but still if your SPBS master node crashes, your cluster is gone. In SGE, the admin can config 1 or more shadow masters, so in theory as long as any one machine in the cluster is running, your cluster is not dead. > Scalability is also a factor because > the cluster may grow > as more funding and problems get into the cluster > project. Both SGE and SPBS can scale to thousands of nodes, the question is, do you have the funding? :-) (SGE 6.0 will scale even further) > I hope I never cease to get enlightened from posts > in the group, and > insights would be most appreciated. I think you should try to install both, it is better to feel it than to just listen to other people. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 18 22:38:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 19 Feb 2004 14:38:18 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> References: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402191438.19333.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > SPBS has lots of patches integrated, but still if your > SPBS master node crashes, your cluster is gone. Well, depends on your definition of "gone" really. People can't queue new jobs, jobs waiting to run won't be started, but as long as your filestore is elsewhere then running jobs won't be interrupted. However, if your filestore server disappears then you're stuffed. :-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q jHWTp4HmlzO8CnmObbFarWA= =PrTq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Thu Feb 19 10:41:26 2004 From: raysonlogin at yahoo.com (Rayson Ho) Date: Thu, 19 Feb 2004 07:41:26 -0800 (PST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402191438.19333.csamuel@vpac.org> Message-ID: <20040219154126.56423.qmail@web11411.mail.yahoo.com> I think it is one of the biggest problems with *PBS, especially in the compute farm environment. The more advanced batch systems (SGE and LSF) have this feature for years, not sure why *PBS still don't have it. (AFAIK, PBSPro 5.4 will include it, but isn't it late??) Rayson --- Chris Samuel wrote: > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be started, > but as long > as your filestore is elsewhere then running jobs won't be > interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.brookes at quadrics.com Thu Feb 19 10:50:43 2004 From: john.brookes at quadrics.com (john.brookes at quadrics.com) Date: Thu, 19 Feb 2004 15:50:43 -0000 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <30062B7EA51A9045B9F605FAAC1B4F6234EB15@tardis0.quadrics.com> If you keep the db on a separate filestore then - if your pbs server goes down - you can just have a failover server that 'becomes' (takes over the ipaddr and hostname - the other nodes won't even notice the difference) the original server if the original gets screwed. 
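A very rough sketch of what such a takeover can look like on the standby box, purely illustrative and not the customers' actual setup: probe the pbs_server TCP port a few times, and only if it stays unreachable bring up the service address on an interface alias and start a local server. The port number (15001 is the usual pbs_server default), the service IP, the interface and the init script path are all assumptions here.

#!/usr/bin/perl -w
# Illustrative standby watchdog, not the setup described above.
# Assumptions: pbs_server answers on TCP 15001 at the service address,
# eth0:0 is free for an alias, and a normal init script exists.
use strict;
use IO::Socket::INET;

my $service_ip = '192.168.1.10';     # address the compute nodes talk to
my $port       = 15001;              # usual pbs_server port

my $alive = 0;
for my $try (1 .. 3) {               # several probes before giving up
    my $sock = IO::Socket::INET->new(PeerAddr => $service_ip,
                                     PeerPort => $port,
                                     Proto    => 'tcp',
                                     Timeout  => 5);
    if ($sock) { $alive = 1; close $sock; last; }
    sleep 10;
}
exit 0 if $alive;                    # master is fine, do nothing

# Master looks dead: take over the service address, start a local server.
system("ifconfig eth0:0 $service_ip netmask 255.255.255.0 up") == 0
    or die "could not bring up alias: $?";
system("/etc/init.d/pbs_server start") == 0
    or die "could not start pbs_server: $?";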
We've got a couple of customers that do this, but YMMV as they use: a) a somewhat non-standard PBS; b) out-of-band management to ensure that the node isn't just temporarily unresponsive. Cheers, John Brookes Quadrics > -----Original Message----- > From: Chris Samuel [mailto:csamuel at vpac.org] > Sent: 19 February 2004 03:38 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Best Setup for Batch Systems > > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > > > SPBS has lots of patches integrated, but still if your > > SPBS master node crashes, your cluster is gone. > > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be > started, but as long > as your filestore is elsewhere then running jobs won't be interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q > jHWTp4HmlzO8CnmObbFarWA= > =PrTq > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From radams at csail.mit.edu Thu Feb 19 14:03:38 2004 From: radams at csail.mit.edu (Ryan Adams) Date: Thu, 19 Feb 2004 14:03:38 -0500 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC Message-ID: <1077217418.4982.35.camel@localhost> Please forgive the length of this email, as I'm going to try to be comprehensive: I have a problem that divides nicely (embarrassingly?) into parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to complete and requires no communication during that time. Essentially there is a piece of data, around 500KB that must be processed and a result returned. I'd like to process as many of these pieces of data as possible. I am considering building a small heterogeneous cluster to do this (at home, basically), and am trying to decide exactly how to architect the task distribution. The network will probably be Fast Ethernet. Initially there will be four machines processing the data, but I could imagine as many as ten in the near term. My current back-of-the-envelope math puts an aggregate load (assuming 2.0s per job, 500KB transferred each, with ten nodes) of 2.5MB/s on the network, so it would seem that 100BT can get the job done without introducing much delay compared to the 2.0s execution time. Perhaps I am doing this math wrong, but I was also thinking that since the download of the data is such an I/O-intensive task that it would be reasonable to place that in a separate thread from the floating point calculations. This way, I could hope to work on data while my socket read is blocking. My question is basically this: is 2-5 seconds too small of a job to justify a batching system like *PBS or Gridengine? 
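(The replies further down mostly converge on bundling: pay the scheduler overhead once per few hundred tasks instead of once per 2-5 second task. A rough sketch of such a wrapper is below; the shared directory, the bundle size and the /shared/bin/process_chunk worker are made-up placeholders, not anything from PBS or SGE itself.)

#!/usr/bin/perl -w
# Illustrative bundling wrapper: group the small tasks into bundles and
# submit one batch job per bundle, so scheduler overhead is paid per
# bundle rather than per task.  The shared directory, bundle size and
# /shared/bin/process_chunk worker are made-up placeholders.
use strict;

my $chunk = 200;                                  # tasks per submitted job
my @files = glob('/shared/incoming/*.dat');
my $n     = 0;

while (my @bundle = splice(@files, 0, $chunk)) {
    my $script = sprintf('/tmp/bundle.%d.%03d.sh', $$, ++$n);
    open my $fh, '>', $script or die "open $script: $!";
    print $fh "#!/bin/sh\n";
    print $fh "/shared/bin/process_chunk $_\n" for @bundle;  # run sequentially
    close $fh;
    system("qsub $script") == 0 or warn "qsub failed for $script: $?";
}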
It would seem that the overhead for a job that requires a few hours would be very insignificant, but what about a few seconds? Certainly, one option would be to bundle sets of these chunks together for a larger effective job. Am I wasting my time thinking about this? I've been considering rolling my own scheduling system using some kind of RPC, but I've been around software development long enough to know that it is better to use something off-the-shelf if at all possible. Thanks in advance... Ryan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 19 14:20:04 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 19 Feb 2004 14:20:04 -0500 (EST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? It would seem that > the overhead for a job that requires a few hours would be very > insignificant, but what about a few seconds? Certainly, one option > would be to bundle sets of these chunks together for a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. > > Thanks in advance... I personally think that it is too small a task to use a batching system, especially since you're likely not going to architect it as a true batching system. I think you have three primary options for ways to develop your code. Well, four if you count NFS. The SIMPLEST way is to put your data blocks in files on an NFS crossmounted filesystem, and start jobs inside e.g. a simple perl script loop that grabs "the next data file" and runs on it and writes out its results back to the NFS file system for dealing with or accruing later. You're basically using NFS as your transport mechanism. Now, NFS isn't horribly efficient relative to raw peak network speed, but neither is it completely horrible -- at 100 BT (say 9-10 MB/sec peak BW) you ought to be able to get at least half of that on an NFS read of a big file. At 5 MB/sec, your 1/2 MB file should take a 0.1 seconds to be read (plus a latency hit) which is "small" (as you note) compared to a run time of 2-5 seconds so you should be able to get nice parallel speedup on four or five hosts. You can test your combined latency and bandwidth with a simple perl script or binary that opens a dozen (different!) files inside a loop. Beware caching, which will give you insane numbers if you aren't careful (as in don't run the test twice on the files without modifying them on the server). The other three ways do it "properly" and permit you both finer control (with the NFS method you'll have to work out file locking and work distribution to make sure two nodes don't try to work on the same file at the same time) and higher BW, close to the full bandwidth of the network. They'll ALSO require more programming. a) PVM b) MPI c) raw networking. PVM is a message passing library. 
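Before the message-passing options get fleshed out, here is a minimal sketch of the NFS-crossmount worker loop described a couple of paragraphs up. It claims a data file by rename()ing it into a per-node directory, which dodges most of the file-locking headache because the rename within one NFS filesystem succeeds for exactly one claimant; the directory layout and the /shared/bin/crunch program are placeholders, not anything from the original post.

#!/usr/bin/perl -w
# Minimal sketch of one worker in the NFS-as-transport scheme above.
# Directory names and the /shared/bin/crunch command are placeholders.
# A node claims a file by renaming it into its own directory; the rename
# either succeeds for exactly one claimant or fails, so two nodes never
# end up working on the same file.
use strict;
use Sys::Hostname;

my $queue   = '/shared/queue';                   # master drops *.dat here
my $results = '/shared/results';
my $mine    = "$queue/claimed-" . hostname();    # this node's claim area
mkdir $mine unless -d $mine;

while (1) {
    my @work = glob("$queue/*.dat");
    unless (@work) { sleep 5; next; }            # nothing queued yet

    for my $file (@work) {
        my ($name) = $file =~ m{([^/]+)$};
        my $claimed = "$mine/$name";
        next unless rename $file, $claimed;      # another node got it first
        system("/shared/bin/crunch $claimed > $results/$name.out") == 0
            or warn "crunch failed on $name: $?";
        unlink $claimed;
    }
}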
There is a PVM program template on my personal GPL software website: http://www.phy.duke.edu/~rgb/General/general.php that might suffice to get you started -- it should just compile and run a simple master/slave program, and you should be able to modify it fairly simply to have the master distribute the next block of work to the first worker/slave to finish. If your CPUs are well balanced the I/O transactions will antibunch and communications will be very efficient. MPI is another message passing library. I don't have an MPI template, but there are example programs in the MPI distributions and on many websites, and there are books (on both PVM and MPI) from e.g. MIT press that are quite excellent. There is also a regular MPI column in Cluster World Magazine that has been working through intro level MPI applications, and old columns by Forrest Hoffman in Linux Magazine ditto. At least -- google is your friend. Both PVM and MPI are likely to be similar in ease of programming, hassle of setting up a parallel environment, and speed, and both of them should give you a very healthy fraction of wirespeed while shielding you from having to directly manipulate the network. Finally there are raw sockets (which it sounds like you are inclined towards). Now, I have nothing against raw socket programming (he says having spent the day on xmlsysd/wulflogger/libwulf, a raw socket-based monitoring program:-). However, it is NOT trivial -- you have to invent all sorts of wheels that are already invented for you and wrapped up in simple library calls with PVM or MPI. Its advantages are maximal speed -- you can't get faster than a point to point network connection -- the ability to thread the connection/I/O component and MAYBE take advantage of letting the NIC do some of the work via DMA while the CPU is doing other work, and complete control. The disadvantages are that you'll be responsible for determining e.g. message length, dealing with a dropped connection without crashing everything, debugging your server daemon and worker clients (or worker daemons and master client) in parallel when they are running on different machines, and so forth. I >>might<< be able to provide you with some applications that aren't exactly templates but that illustrate how to get started on this approach (and refer you to some key books) but if you really are a networking novice you'll need to want to do this as an excuse to stop being a novice by writing your own application or it isn't worth it. You'll need to be a much better and more skilled programmer altogether in order to debug everything and check for the myriad of error conditions that can occur and deal with them robustly. There are really a few other approaches -- perl now supports threads so you CAN use a perl script and ssh as a master/work distribution system -- but raw sockets aren't much easier to manage in perl than they are in C and using ssh as a transport layer adds overhead at least equal to or in excess to NFS, so you'd probably want to use NFS as transport and the perl script to just manage task distribution (for which it is ideally suited in this simple a context). I have a nice example threaded perl task distribution script (which I wrote for MY Cluster Magazine column some months ago:-) which I can put somewhere if this interests you. HTH, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dcs at et.byu.edu Thu Feb 19 15:19:56 2004 From: dcs at et.byu.edu (Dave Stirling) Date: Thu, 19 Feb 2004 13:19:56 -0700 (MST) Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Message-ID: Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 19 17:22:38 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 20 Feb 2004 09:22:38 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219154126.56423.qmail@web11411.mail.yahoo.com> References: <20040219154126.56423.qmail@web11411.mail.yahoo.com> Message-ID: <200402200922.39632.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 20 Feb 2004 02:41 am, Rayson Ho wrote: [No failover support in the pbs_server] > I think it is one of the biggest problems with *PBS, especially in the > compute farm environment. Torque (formerly SPBS) is very stable, especially since we helped the SuperCluster folks clobber the various memory leaks in the server. Our pbs_server has been running for almost a month now since I last restarted it (because I was doing a bit of system maintenance, not because of PBS problems, I think it'd been running for about 2 months before that) and it's only VSZ 3148 and RSS 2136. :-) NB: I'm still running an SPBS release from early November as that's when we fixed the last memory leak and it's worked like a dream since then. > The more advanced batch systems (SGE and LSF) have this feature for > years, not sure why *PBS still don't have it. I believe it's on the SuperCluster folks list of things to do, but they've been busy working on the stability front (as well as MAUI and Silver). CC'd to the SuperCluster folks so they can respond. > (AFAIK, PBSPro 5.4 will include it, but isn't it late??) No idea, don't use it. 
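For anyone who wants to track the pbs_server VSZ/RSS figures mentioned above without checking by hand, a few lines from cron are enough. pidof and "ps -o vsz=,rss=" are standard procps; the log path is just an assumption.

#!/usr/bin/perl -w
# Tiny cron helper: append a timestamped VSZ/RSS line for pbs_server to a
# log so slow leaks show up over weeks.  The log path is an assumption;
# pidof and "ps -o vsz=,rss=" are standard procps invocations.
use strict;

chomp(my $pids = `pidof pbs_server`);
my ($pid) = split ' ', $pids;
exit 0 unless defined $pid && $pid =~ /^\d+$/;   # server not running here
chomp(my $mem = `ps -o vsz=,rss= -p $pid`);
open my $log, '>>', '/var/log/pbs_server-mem.log' or die "open log: $!";
printf $log "%s pid=%s vsz/rss(kB)=%s\n", scalar(localtime), $pid, $mem;
close $log;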
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANTcuO2KABBYQAh8RAk8AAJ0ZGx3+qLPHWMjFkG7PGD8pPzwBWwCeKnUQ u1aXnixvHrknKTqtNVDRVhM= =28y0 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Thu Feb 19 18:13:20 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Thu, 19 Feb 2004 23:13:20 +0000 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: References: Message-ID: <200402192313.20932.daniel.kidger@quadrics.com> Dave, > While performance (latency, bandwidth) usually comes to the fore in > discussions about high performance interconnects for MPI clusters, I'm > curious as to what your experiences are from the standpoint of > manageability -- NIC's and spines and switches all fail at one time or > another, but I'd like input as to how individual products (Myrinet, > Quadrics, Infiniband, etc) handle this. In your clusters does the > hardware replacement involve simple steps (swap out the NIC, rerun some > config utilities) or something more complex (such as bringing down the > entire high speed network to reconfigure it so all the nodes can talk to > the new hardware); i.e., How painful is it to replace a single failed NIC? > > I'd imagine that most cluster admins are reluctant to interrupt running > jobs in order to re-initialize the equipment after hardware replacement. > Any information about how your clusters running high-speed interconnects > handle interconnect hardware failure/replacement would be very helpful. AFAIK all interconnects would allow the swap of a NIC without bringing down the whole network - but in all cases any parallel job running on that node would need to be aborted since in general high-speed interconect PCI cards are not hot-swappable - that node woudl need to be power-cycled. As for the cables and switches, I can't speak for other vendors - but for example a line card in a Quadrics Switch can be hot-swapped even while there are running MPI jobs that are sending data through that line card at the time - the jobs simply pause until the cables are reconnected. I would expect that other interconnects are the same in this respect? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 19 18:07:43 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 00:07:43 +0100 (CET) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > Please forgive the length of this email, as I'm going to try to be > comprehensive: > There was a discussion on the Gridengine user list recently, regarding submitting lots and lots of short jobs in a bank in London. It developed into quite an interesting discussion, and I learned lots. Sorry - I tried to find the thread, but can't quite get the correct keywords. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:04:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:04:32 +0800 (CST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: <20040220010432.88699.qmail@web16802.mail.tpe.yahoo.com> --- Ryan Adams ???? > My question is basically this: is 2-5 seconds too > small of a job to > justify a batching system like *PBS or Gridengine? Yes, 10 minutes or greater sound more reasonable. May be you can chunk 100 or more of those tasks into a job and submit it into a batch system. Also, from the "Tuning guide" HOWTO on the GridEngine website, SGE has a feature called "scheduling-on-demand" -- seems like it will help a lot since the scheduler is activated whenever a job arrives or a machine becomes available. Andrew. > Certainly, one option > would be to bundle sets of these chunks together for > a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling > system using some kind > of RPC, but I've been around software development > long enough to know > that it is better to use something off-the-shelf if > at all possible. > > Thanks in advance... > > Ryan > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:13:18 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:13:18 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402200922.39632.csamuel@vpac.org> Message-ID: <20040220011318.63921.qmail@web16804.mail.tpe.yahoo.com> --- Chris Samuel ????> ----- > Torque (formerly SPBS) is very stable, especially > since we helped the > SuperCluster folks clobber the various memory leaks > in the server. It's not whether PBS itself is stable or not. There are human errors, machine problems, network problems, etc... And besides, the master machine also needed to be taken offline for OS upgrade. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tobeveryhonest at hotmail.com Fri Feb 20 03:55:22 2004 From: tobeveryhonest at hotmail.com (Salman Guy) Date: Fri, 20 Feb 2004 08:55:22 +0000 Subject: [Beowulf] want to implement a Beowulf cluster Message-ID: hi all, I want to learn Beowulf cluster implementation practically and for this purpose i need some help from u ppl.....I need reading material and ebooks so if anyone of u has done some practical work on Beowulf clusters then plz guide me or send me information regarding this, help will be appreciated ...thanx in advance _________________________________________________________________ MSN 8 helps eliminate e-mail viruses. Get 2 months FREE*. http://join.msn.com/?page=features/virus&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 20 06:13:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 12:13:02 +0100 (CET) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, Salman Guy wrote: > hi all, > I want to learn Beowulf cluster implementation practically and for this > purpose i need some help from u ppl.....I need reading material and ebooks > so if anyone of u has done some practical work on Beowulf clusters then plz > guide me or send me information regarding this, > I think we need a FAQ here :-) Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. SO I always say: Look at Robert Browns webpages at Duke The books 'Linux Clustering' by Charles Bookman and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Feb 20 05:10:38 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 20 Feb 2004 11:10:38 +0100 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: <200402192313.20932.daniel.kidger@quadrics.com> References: <200402192313.20932.daniel.kidger@quadrics.com> Message-ID: <200402201110.38850.joachim@ccrl-nece.de> Dan Kidger: > AFAIK all interconnects would allow the swap of a NIC without bringing down > the whole network - but in all cases any parallel job running on that node > would need to be aborted since in general high-speed interconect PCI cards > are not hot-swappable - that node woudl need to be power-cycled. AFAIK, this is the same for SCI, but I would need to check this to be sure. Anyway, the application using the adapter to be swapped would have to be restarted anyway as its resources are gone. Avoiding this would be very hard, if at all possible. > As for the cables and switches, I can't speak for other vendors - but for > example a line card in a Quadrics Switch can be hot-swapped even while > there are running MPI jobs that are sending data through that line card at > the time - the jobs simply pause until the cables are reconnected. I would > expect that other interconnects are the same in this respect? SCI typically uses no external switches, and concerning the exchange of adapters or cables, there are two strategies: the application(s) has/have to wait until transfers are again successful, or the driver recognizes the problem and changes the routing. Of course, this can be combined into a two-phase strategy. I guess this is the way Scali is doing it. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Fri Feb 20 08:41:11 2004 From: lathama at yahoo.com (Andrew Latham) Date: Fri, 20 Feb 2004 05:41:11 -0800 (PST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: <20040220134111.27571.qmail@web60305.mail.yahoo.com> or download the mailing list archive for the last year! thats an ebook all to its self --- John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. > > SO I always say: > Look at Robert Browns webpages at Duke > > The books 'Linux Clustering' by Charles Bookman > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== *----------------------------------------------------------* Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM LATHAMA at LATHAMA.COM - LATHAMA at YAHOO.COM If yahoo.com is down we have bigger problems than my email! *----------------------------------------------------------* _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Fri Feb 20 13:39:49 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Fri, 20 Feb 2004 18:39:49 +0000 (GMT) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <200402201109.i1KB99h12383@NewBlue.scyld.com> References: <200402201109.i1KB99h12383@NewBlue.scyld.com> Message-ID: > > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? That workload is do-able with the right queuing system. LSF (don't know about gridengine off hand) has a concept of "job chunking" for dealing with short running jobs. The queuing system batches up a number of jobs (eg 10 or 20) and then submits them all on one go to the work host where they run sequentially. This cuts down on the scheduling overhead. We've just had a user push 250,000 short running jobs though our cluster this-afternoon using this approach. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 20 15:12:09 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 20 Feb 2004 15:12:09 -0500 (EST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) There are the old FAQ and HOWTO's (still some relevant background information): http://www.canonical.org/~kragen/beowulf-faq.txt http://yara.ecn.purdue.edu/~pplinux/PPHOWTO/pphowto.html#toc1 http://www.tldp.org/HOWTO/Beowulf-HOWTO.html There are other links at ClusterWorld.com (on the right side, scroll down) that may be useful. Now is a good time to announce my effort to update the FAQ (and possibly the HOWTO). Starting next week, I plan on updating the FAQ by using the ClusterWorld.com site as a place to collect questions and answers. Stay tuned. Of course ClusterWorld magazine is designed to provide this type of information as well. > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. 
> > SO I always say: > Look at Robert Browns webpages at Duke > and book: http://www.phy.duke.edu/brahma/Resources/beowulf_book.php > The books 'Linux Clustering' by Charles Bookman IMO, this is not a good book for HPC clusters. > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. New edition: http://www.amazon.com/exec/obidos/tg/detail/-/0262692929/102-0957058-4520116?v=glance Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Sat Feb 21 19:18:55 2004 From: mg_india at sancharnet.in (Sawan Gupta) Date: Sun, 22 Feb 2004 05:48:55 +0530 Subject: [Beowulf] Movie Editing Requirements Message-ID: <000001c3f8d9$79436f00$8bd2003d@myserver> Hello, My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT system with 512 DDRAM and a 128 MB Graphic Card. But when he perform some rendering operations, it takes nearly 10-15 minutes to complete. He wishes to upgrade his system to dual XEON with more RAM to minimize this time delay. I want to know whether this will suit his requirments or a cluster is just what he needs. Please tell me which cluster can suit his requirements i.e. Windows/Linux. I mean which cluster can best suit these requirements. Also are the softwares used by him also available for Linux or not. (If the solution suggested is in Linux) Regards, Sawan Gupta || Mg_India at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 21 20:50:54 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 21 Feb 2004 20:50:54 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. I would guess that none of this work is done by the graphics card, so that his performance is strictly dependent on the P4 and the fairly modest amount of ram he has. I would guess that most of these applications are fairly memory-intensive, and not particularly cache-friendly. I doubt HT would matter in this case, except that IIRC all HT CPUs are the 'c' model, and thus run with 6.4 GB/s of dram bandwidth. I'm sure you already know that 512M probably too low. > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. if this was linux, I'd advise you to use tools like oprofile, vmstat, etc to find out where it's spending its time. since it's only windows, you'll probably have to resort to watching the disk light, and running that nasty little windows accessory that tells you about cpu/memory usage. > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. sure. though he'd almost certainly run faster with a dual-opteron, since such systems deliver noticably more memory bandwidth and lower latency. a dual-xeon can actually be slower than a uni P4c system. it would probably make sense to talk to him about how his machine and apps are configured first. 
for instance, is he actually using HT, and does he notice any performance difference if he turns it off? is his ram dual-bank-PC3200? any sense of how much time is spent on disk IO? > I want to know whether this will suit his requirments or a cluster is > just what he needs. clusters are clearly more scalable, and are widely used in the render/effects industry. comparing a pair of P4c's to a single dual-opteron, though, I have no idea. I think it would depend on his applications, mainly. there's no clear answer to price/performance when it comes to clusters of duals vs unis. unis tend to be too large, and in most cases wind up replicating too many components, especially moving parts, to compete. I believe most clusters, in any industry, are not unis. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. windows is the right choice in exactly one situation: when the exact configuration you need is available off-the-shelf, and you already know how to use it. linux (unix in general) is far more robust, easy-to-manage, flexible, scalable, cheap, etc. all those TCO studies sponsored by msft consist of the following astonishing conclusion: if you have windows-only users and a supply of cheap msce's and are comfortable with the crappy level of support that the ms world provides, then indeed windows is cheaper. > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) only he can decide that. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sat Feb 21 23:30:13 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sat, 21 Feb 2004 22:30:13 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Actually a beowulf cluster can also run windows. There is a port of maya to clusters...There are also many other movie editing software distributions that work very well on clusters..It also doesn't matter what os a beowulf cluster runs. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Sat, 21 Feb 2004, Joel Jaeggli wrote: > Given that it sounds like you're on windows, a beowulf cluster is not > appropriate from your application... > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > Hello, > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > But when he perform some rendering operations, it takes nearly 10-15 > > minutes to complete. > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > this time delay. > > > > I want to know whether this will suit his requirments or a cluster is > > just what he needs. > > Please tell me which cluster can suit his requirements i.e. > > Windows/Linux. > > I mean which cluster can best suit these requirements. > > > > Also are the softwares used by him also available for Linux or not. 
(If > > the solution suggested is in Linux) > > > > > > Regards, > > > > Sawan Gupta > > || Mg_India at sancharnet.in || > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 21 23:18:09 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 21 Feb 2004 20:18:09 -0800 (PST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: Given that it sounds like you're on windows, a beowulf cluster is not appropriate from your application... On Sun, 22 Feb 2004, Sawan Gupta wrote: > > Hello, > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. > > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. > > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. > > I want to know whether this will suit his requirments or a cluster is > just what he needs. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. > I mean which cluster can best suit these requirements. > > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) > > > Regards, > > Sawan Gupta > || Mg_India at sancharnet.in || > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From c00jsh00 at nchc.org.tw Sun Feb 22 04:32:41 2004 From: c00jsh00 at nchc.org.tw (Jyh-Shyong Ho) Date: Sun, 22 Feb 2004 17:32:41 +0800 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <40387739.3891D96C@nchc.org.tw> Hi, We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI Workstation 5.1.3 compiler and 64-bit GOTO library. We ran all the test cases included in Gaussian 03 source code and compared the results against the reference results ran on SGI. All tests cases are successfully completed except test602 and test605 with error at the last stage when l9999 tries to close files. 
There are several files in directory bsd need some modification: machine.c (add one section to return "x86_64" as machine identification) mdutil.c (add one section for x86_64) mdutil.f (add one section for x86_64) bldg03 (modify the file so it can pick up x86_64.make as g03.make) and create a make file x86_64.make (use i386.make as a template) The compiler used is pgf90, but l906 and l609 has to be compiled with pgf77, in order to pass all the test cases. We are running more tests and comparing the performance of 64-bit version abd 32-bit version. Regards Jyh-Shyong Ho, Ph.D. Research Scientist National Center for High-Performance Computing Hsinchu, Taiwan, ROC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mirk at vsnl.com Sat Feb 21 10:31:35 2004 From: mirk at vsnl.com (Mohd Irfan R Khan) Date: Sat, 21 Feb 2004 21:01:35 +0530 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: Message-ID: hi I am one using SCI (Dolphin) cards and I think in dolphin u don't have to stop the whole cluster in case of failure. In this there is a matrix where it always has redundancy if one machine fails and the software provided by it (SCALI) will route the data to other machine and will reroute it back once it finds the line working properly. Regards. -----Original Message-----. From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Dave Stirling Sent: Friday, February 20, 2004 1:50 AM To: beowulf at beowulf.org Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Sun Feb 22 09:57:05 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sun, 22 Feb 2004 14:57:05 +0000 (UTC) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > It also doesn't matter what > os a beowulf cluster runs. ..as long as that OS conforms to the definition of free software, that is.. 
Or am I just an old fuddy-duddy, with out-of-date concepts? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Sun Feb 22 11:04:17 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Sun, 22 Feb 2004 17:04:17 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. I've done this for a while to get temperature information from a server in our small group server room (together with MRTG we have a nice history of temperature to show to the facilities people when the temperature was too high again...). The problem with greping for smartd information in the syslog file is that there is no current information after a log rotation. That's why I changed our cron jobs. Now I use a small setuid-root program which starts "smartctl -a /dev/sdX" and then greps for the temperature. - Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Feb 22 10:20:20 2004 From: agrajag at dragaera.net (Jag) Date: 22 Feb 2004 10:20:20 -0500 Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: <1077463220.2561.4.camel@loiosh> On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. By definition, a beowulf cluster uses a free/open OS. So, a beowulf cluster can't run windows. However, an HPC (High Performance Computing) cluster doesn't have that requirement. I know its kinda nitpicking to try to distinguish between Beowulf cluster and HPC cluster, but in some ways it is important. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sun Feb 22 11:30:03 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sun, 22 Feb 2004 10:30:03 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <1077463220.2561.4.camel@loiosh> References: <1077463220.2561.4.camel@loiosh> Message-ID: Please don't start a flame war guys, I just had my terms mixed up...it was 1 am in the morning when I replied. -Brent On Sun, 22 Feb 2004, Jag wrote: > On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > > Actually a beowulf cluster can also run windows. 
There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > By definition, a beowulf cluster uses a free/open OS. So, a beowulf > cluster can't run windows. However, an HPC (High Performance Computing) > cluster doesn't have that requirement. > > I know its kinda nitpicking to try to distinguish between Beowulf > cluster and HPC cluster, but in some ways it is important. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Mon Feb 23 06:37:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 23 Feb 2004 12:37:25 +0100 (CET) Subject: [Beowulf] Flashmobcomputing Message-ID: I hesitate a bit to send things seen on Slashdot to the list, but this is probably relevant: http://www.flashmobcomputing.org/ It might be worth a bit of a debate though. Given that this cluster will be composed of differing CPUs, and connected together by 100Mbps links, will it really have a chance of getting into the Top 500? The bootable CD they are using is a Knoppix variant. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at linux-mag.com Mon Feb 23 07:27:20 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Mon, 23 Feb 2004 07:27:20 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. From time to time, I think it is an important to recall the original definition of Beowulf. In the book "How to Build a Beowulf", Sterling, Salmon, Becker, and Savarese define Beowulf as: "A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology running one of several open-source Unix-like operating systems." There is often confusion as to "what is a Beowulf?" because the definition is more of a framework for building clusters and less of a recipe. I suppose one could come up with a definition of an HPC cluster which would read something like: "An HPC cluster is a collection of commodity processors interconnected by widely available networking technology running a widely available OS." Rather broad. I think the keyword in all this is "commodity", which to me means choice and implies low cost. Doug > > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > Given that it sounds like you're on windows, a beowulf cluster is not > > appropriate from your application...
> > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > Hello, > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > minutes to complete. > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > this time delay. > > > > > > I want to know whether this will suit his requirments or a cluster is > > > just what he needs. > > > Please tell me which cluster can suit his requirements i.e. > > > Windows/Linux. > > > I mean which cluster can best suit these requirements. > > > > > > Also are the softwares used by him also available for Linux or not. (If > > > the solution suggested is in Linux) > > > > > > > > > Regards, > > > > > > Sawan Gupta > > > || Mg_India at sancharnet.in || > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > > -------------------------------------------------------------------------- > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 23 09:11:00 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 23 Feb 2004 09:11:00 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sun, 22 Feb 2004, Martin WHEELER wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > It also doesn't matter what > > os a beowulf cluster runs. > > ..as long as that OS conforms to the definition of free software, that > is.. > > Or am I just an old fuddy-duddy, with out-of-date concepts? No, you're absolutely right. It's right in there in the original beowulf documents and description, IIRC. There are some excellent reasons for this, BTW, as you'll discover the first time something doesn't just work for you "out of the box". rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 23 08:36:05 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Mon, 23 Feb 2004 07:36:05 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Again, I go back to my last email concerning this..I didn't want to start people flaming me(which has now happened), I wrote my original response at 1am in the morning and was sloppy with my terms. For that I apologize. This tangent of explanations from now over 50 people can be gotten off of and people can go about there business..Nothing to see here, move along. -Brent On Mon, 23 Feb 2004, Douglas Eadline, Cluster World Magazine wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > Actually a beowulf cluster can also run windows. There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > >From time to time, I think it is a important to recall the original > definition of Beowulf. In the book "How to Build Beowulf", Sterling, > Salmon, Becker, Savarese define Beowulf as: > > "A Beowulf is a collection of personal computers (PCs) interconnected by > widely available networking technology running one of several open-source > Unix like operating systems." > > There is often confusion as to "what is a Beowulf?" because the definition > is more of a framework for building clusters and less of a recipe. > > I suppose, one could come up with definition of an HPC cluster which > would read something like" > > "An HPC cluster is collection of commodity processors interconnected by > widely available networking technology running a widely available OS." > > Rather broad. I think the keyword in all this is "commodity", which to me > means choice and implies low cost. > > Doug > > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > > > Given that it sounds like you're on windows, a beowulf cluster is not > > > appropriate from your application... > > > > > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > > > > Hello, > > > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > > minutes to complete. > > > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > > this time delay. > > > > > > > > I want to know whether this will suit his requirments or a cluster is > > > > just what he needs. > > > > Please tell me which cluster can suit his requirements i.e. > > > > Windows/Linux. > > > > I mean which cluster can best suit these requirements. > > > > > > > > Also are the softwares used by him also available for Linux or not. 
(If > > > > the solution suggested is in Linux) > > > > > > > > > > > > Regards, > > > > > > > > Sawan Gupta > > > > || Mg_India at sancharnet.in || > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf at beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > -- > > > -------------------------------------------------------------------------- > > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Cell: 610.390.7765 Redefining High Performance Computing > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Mon Feb 23 10:18:34 2004 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Mon, 23 Feb 2004 16:18:34 +0100 Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: References: Message-ID: <1077549514.31096.0.camel@qeldroma.cttc.org> El dom, 22-02-2004 a las 17:04, Felix Rauch escribi?: > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). > > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). 
> > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > On the other hand, it is possible to divert the smartd log to a specific file and check it regularly when it's updated, by adding this parameter to smartd: -l facility Of course some modification of syslog.conf will be needed to instruct syslogd to log to a specific file for the "facility" specified: facility.* /var/log/smartd.log Also, the '-M' option coupled with the 'exec' directive should work; a script could be run to update some flags, for example: -M exec /usr/bin/smartd_alert.sh -- Daniel Fernandez Centre tecnològic de transferència de calor - CTTC www.cttc.upc.edu c/ Colom nº11 UPC Campus Industrials Terrassa, Edifici TR4 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From m-valerio at onu.edu Mon Feb 23 11:24:23 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Mon, 23 Feb 2004 11:24:23 -0500 Subject: [Beowulf] Anyone use MOSIX? Message-ID: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Has anyone on this list used MOSIX before? I'm particularly interested in how it compares to other clustering software such as PVM and MPI. Any information regarding what you're using MOSIX for, recommendations about setting it up, comparisons to other software, etc, would be welcomed. Thanks! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kpodesta at redbrick.dcu.ie Mon Feb 23 13:45:44 2004 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 23 Feb 2004 18:45:44 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040223184544.GB30983@carbon.redbrick.dcu.ie> On Mon, Feb 23, 2004 at 12:37:25PM +0100, John Hearns wrote: > http://www.flashmobcomputing.org/ > > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > > The bootable CF they are using is a Knoppix variant. It seems a bit loose or unfair to suggest a project like this 'registers' for the top500 list? It's a once-off, temporary system, dedicated (seemingly) to nothing but qualification to the list. They say in the FAQ that if the system proves itself, it could potentially be used for bigger problems, which is a noble idea - but they obviously don't read the beowulf list often ("it all depends on the application", etc.) :-) Additionally, a flashmob system would have a limited shelf-life, before the owners want to take their computers home. Distributed projects like SETI at home and Folding at home etc. have been running for years... I'm not familiar with the entry rules to the top500, but to be fair to existing, dedicated installations - they would have a certain 'reliability' in terms of their existence.
If you needed to perform a serious calculation to a scale of 36 TFLOPS etc., then you know that there is a system that can do it. They might want to be critical of how sustainable the result from the Flashmob is, if they wanted to 'call' on it's power at any particular time in the future. (whoops, it was raining today, that's 10 TFLOPS down the drain...). Pardon the pun. Kp -- Karl Podesta Dublin City University, Ireland _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 23 14:11:48 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 23 Feb 2004 14:11:48 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Mon, 23 Feb 2004, John Hearns wrote: > I hesitate a bit to send things seen on Slashdot to the list, > but this is probably relevant: > > http://www.flashmobcomputing.org/ >> A Flash Mob computer, unlike an ordinary cluster, is temporary and >> organized on-the-fly for the purpose of working on a single >> problem. Flash Mob I is the first of it's kind. A bit of hype here. Flash Mob is a fun demo, but not a new system architecture. All of the software is on a live CD, which Yggdrasil pioneered back in 1993, and it's far from being the first on-the-fly cluster. One of first public demo of Scyld Beowulf was temporarily converting the email-reading machines at the ALS conference into a cluster. We did that in a few minutes, taking only a few second beyond the amount of time it took to boot the machines from floppy. Today there is the opportunity to use PXE boot, which makes configuration even easier. A key was the innovative approach of making most of the systems specialized compute slaves, with only the environment needed to support the fully-cached running application. (Note that NFS root sounds like a likely alternative, but doesn't scale and has a run-time performance impact.) > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > The bootable CF they are using is a Knoppix variant. The differing CPUs and full workstation-oriented distribution will likely pose more a problem than the switched Fast Ethernet. Unless they make significant modifications, they will run into the scalability problem that every full-installation system encounters: at every timestep a few of the machines will be paging, running cron, or doing something else that slows the machine. That would be barely noticed in a workstation environment, but is a major problem with most cluster jobs. Still, it sounds like a fun, demystifying demo that introduces people to scalable computing. 
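A quick way to make that timestep-noise point concrete is to time a tight loop of identical work on every rank and look at the spread. The sketch below is a generic MPI-1 C illustration only (it is not from Scyld or from the Flash Mob project, and the step and work counts are arbitrary placeholders); ranks that get hit by paging, cron or other per-node activity show up as occasional slow steps.

----------------------------------------------------------------
/* jitter.c - expose per-node OS noise with identical work per rank. */
#include <mpi.h>
#include <stdio.h>

#define STEPS 1000
#define WORK  200000

int main(int argc, char **argv)
{
    int rank, i, j;
    volatile double x = 0.0;
    double t0, dt, best = 0.0, worst = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < STEPS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);          /* line all ranks up */
        t0 = MPI_Wtime();
        for (j = 0; j < WORK; j++)            /* identical dummy work everywhere */
            x += j * 0.5;
        dt = MPI_Wtime() - t0;                /* this rank's time for the step */

        /* gather the per-step spread across ranks */
        MPI_Reduce(&dt, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&dt, &best,  1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        if (rank == 0 && worst > 2.0 * best)  /* one noisy node stalls the step */
            printf("step %d: best %.6f s, worst %.6f s\n", i, best, worst);
    }
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

Compile with mpicc and run one rank per node; on a quiet, stripped-down node set the best and worst times stay close together, while a full workstation install tends to produce the occasional stretched step.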
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster systems Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bernd-schubert at web.de Mon Feb 23 17:05:36 2004 From: bernd-schubert at web.de (Bernd Schubert) Date: Mon, 23 Feb 2004 23:05:36 +0100 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE In-Reply-To: <40387739.3891D96C@nchc.org.tw> References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> <40387739.3891D96C@nchc.org.tw> Message-ID: <200402232305.36711.bernd-schubert@web.de> On Sunday 22 February 2004 10:32, Jyh-Shyong Ho wrote: > Hi, > > We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 > on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI > Workstation > 5.1.3 compiler and 64-bit GOTO library. > Hello, thanks for this great information! I've forwarded it to the CCL list, since I guess on this list many people are interested in this topic. Cheers, Bernd _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 17:47:27 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 17:47:27 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: > Still, it sounds like a fun, demystifying demo that introduces people to > scalable computing. demystification is always good. IMO, the best part of this is that it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. partly the reason is hetrogeneity and other "practical" downers. but mainly, a super-computer needs a super-network. of course, in the grid nirvana, all computers would have multiple ports of infiniband, and the word would be 5 us across ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 15:52:58 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 15:52:58 -0500 (EST) Subject: [Beowulf] Anyone use MOSIX? In-Reply-To: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Message-ID: > Has anyone on this list used MOSIX before? I expect many have given it a try. > I'm particularly interested in how it compares to other clustering software > such as PVM and MPI. apples and oranges, I believe. mosix more or less tries to virtualize a cluster by making multiple machines share things like a single pid space, with forwarding of signals, etc. the idea is that the OS takes care of migrating jobs across nodes, including using proxies for resources that can't be directly moved (pages can, for instance). from the PVM/MPI perspective, the most important resource would be sockets. as far as I know, MPI-on-Mosix would use proxied sockets, and would therefore have performance problems for anything closely-coupled or high-bandwidth. in principle, Mosix could provide some sort of clusterized group-comm mechanism that wouldn't require proxies, but that would be a large effort. 
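A rough feel for what proxied sockets would cost can be had from a small round-trip test. The sketch below is plain POSIX C and deliberately generic (nothing in it is MOSIX-specific; the message size and round count are arbitrary): it forks a child that echoes small messages back over a local stream socket pair and reports the average round-trip time. Run as-is it gives a same-node baseline; if one end were migrated under MOSIX, the extra round-trip time would be the proxying overhead in question.

----------------------------------------------------------------
/* pingpong.c - average round-trip time for small messages over a
 * local stream socket pair (parent sends, forked child echoes). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>

#define ROUNDS 10000
#define MSGLEN 64

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    int sv[2], i;
    char buf[MSGLEN];
    double t0;

    memset(buf, 'x', sizeof(buf));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }
    if (fork() == 0) {                        /* child: echo every message back */
        for (i = 0; i < ROUNDS; i++) {
            read(sv[1], buf, MSGLEN);
            write(sv[1], buf, MSGLEN);
        }
        _exit(0);
    }
    t0 = now();                               /* parent: send and wait for echo */
    for (i = 0; i < ROUNDS; i++) {
        write(sv[0], buf, MSGLEN);
        read(sv[0], buf, MSGLEN);
    }
    printf("average round trip: %.1f us\n", (now() - t0) / ROUNDS * 1e6);
    return 0;
}
----------------------------------------------------------------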
in a way, it's a shame that MPI is such a fat interface, since there's a lot of really good work that could be done in this direction, but is simply too large for a typical thesis project :( regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Feb 23 20:53:18 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 23 Feb 2004 17:53:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > of course, in the grid nirvana, > all computers would have multiple ports of infiniband, > and the word would be 5 us across ;) In grid nirvana, the speed of light would rise with Moore's Law. 5 usec is a long time now, and much longer a year from now. That's not nirvana. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 24 03:22:16 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 24 Feb 2004 09:22:16 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. > An odd fact I always remember is that light travels at a foot per nanosecond. (Useful to know if you are plugging coax delay lines into trigger circuits) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 04:35:48 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 10:35:48 +0100 Subject: [Beowulf] C vs C++ challenge In-Reply-To: References: <1075512676.4915.207.camel@protein.scalableinformatics.com> Message-ID: <20040224093548.GA29776@unthought.net> On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. 
It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Tue Feb 24 04:35:54 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Tue, 24 Feb 2004 09:35:54 +0000 Subject: [Beowulf] Math Coprocessor In-Reply-To: References: Message-ID: <200402240935.54623.daniel.kidger@quadrics.com> > On Fri, 13 Feb 2004, John Hearns wrote: > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". I don't have that but I do have on my bookshelf "A Fortran Primer" by Elliot Organick, Addison-Wiley (1963) - so go on: does anyone own any even older Fortran texts ? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 05:56:45 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 11:56:45 +0100 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> References: <1077217418.4982.35.camel@localhost> Message-ID: <20040224105645.GC29776@unthought.net> On Thu, Feb 19, 2004 at 02:03:38PM -0500, Ryan Adams wrote: ... > I have a problem that divides nicely (embarrassingly?) into > parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to > complete and requires no communication during that time. Essentially > there is a piece of data, around 500KB that must be processed and a > result returned. I'd like to process as many of these pieces of data as > possible. I am considering building a small heterogeneous cluster to do > this (at home, basically), and am trying to decide exactly how to > architect the task distribution. I had the following problem; lots and lots of compile jobs. They take from a few seconds to a few minutes each. No batch scheduling system that I tried, was up to the task (simply waaay too long latency in the scheduling). ... > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. Maybe you would want to take a quick look at ANTS http://unthought.net/antsd/ ANTS was the solution I developed for the problem I had, and from the sound of it, I think your problem may be a good fit for ANTS as well. I've been updating it as of lately, but haven't put new releases on the web site. If you're interested, I can provide you with the new releases (featuring krellm2 applet! ;) - but the basic functionality is unchanged from the old release on the web site. ANTS specifically schedules jobs very quickly - but it lacks the advanced features of "real" batch systems (like accounting, gang scheduling, job restart, etc. etc.). / jakob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 08:34:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 24 Feb 2004 08:34:16 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. I'll have to think about that one. Exponential growth of the speed of light. Hmmm. Some sort of inflationary model? Space flattening towards non-relativistic classical? The physics of Nirvana would be veeeery interesting... :-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ajt at rri.sari.ac.uk Tue Feb 24 08:56:31 2004 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Tue, 24 Feb 2004 13:56:31 +0000 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B580F.1020009@rri.sari.ac.uk> Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Dennis Ritchie created 'C', not Bjarne Stroustrup: he created C++. The IEEE 'interview' is a hoax, of course! but Bjarne Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctibirna at giref.ulaval.ca Tue Feb 24 09:15:20 2004 From: ctibirna at giref.ulaval.ca (Cristian Tibirna) Date: Tue, 24 Feb 2004 09:15:20 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <200402240915.20284.ctibirna@giref.ulaval.ca> On Tuesday, 24 February 2004 08:13, Matt Valerio wrote: > > Like anything on the internet, it should be taken with a grain of salt. > Can anyone vouch for its validity, or is it a hoax to get us to all hate > C++ and stick with C? Of course it's a hoax ;o) http://www.research.att.com/~bs/bs_faq.html#IEEE And in fact the whole FAQ deserves a reading, no matter which language one preaches as being the Holy Grail. -- Cristian Tibirna (418) 656-2131 / 4340 Laval University - Québec, CAN ... http://www.giref.ulaval.ca/~ctibirna Research professional - GIREF ... ctibirna at giref.ulaval.ca Chemical Engineering PhD Student ... tibirna at gch.ulaval.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From m-valerio at onu.edu Tue Feb 24 08:13:12 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 08:13:12 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <20040224093548.GA29776@unthought.net> Message-ID: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Hello, I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 languages.
That being said, I think it would be interesting to see what the creator of both C and C++ has said about the two. I ran across this interview with Bjorn Stroustrup at http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. Like anything on the internet, it should be taken with a grain of salt. Can anyone vouch for its validity, or is it a hoax to get us to all hate C++ and stick with C? -Matt -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of Jakob Oestergaard Sent: Tuesday, February 24, 2004 4:36 AM To: Beowulf Subject: Re: [Beowulf] C vs C++ challenge On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. 
Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 09:12:49 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 09:12:49 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <403B580F.1020009@rri.sari.ac.uk> Message-ID: <200402241414.i1OEErBf096719@postoffice.onu.edu> Wow, I guess I didn't do my homework! Apologizes to everyone for the misinformation! As Tony pointed out, the real interview may be found at http://www.research.att.com/~bs/ieee_interview.html. -Matt -----Original Message----- From: Tony Travis [mailto:ajt at rri.sari.ac.uk] Sent: Tuesday, February 24, 2004 8:57 AM To: Matt Valerio Cc: beowulf at beowulf.org Subject: Re: [Beowulf] C vs C++ challenge Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 09:33:37 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 24 Feb 2004 09:33:37 -0500 (EST) Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: On Tue, 24 Feb 2004, Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? I posted that up there when I found it because it is hilarious. I assume that it is a satire (not exactly the same thing as a "hoax":-). However, as is the case with much satire, it contains a lot of little nuggets that (should) make you think... about "good practice" ways of coding in C++ if nothing else. r-still-a-C-guy-at-heart-gb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Feb 24 10:37:12 2004 From: jcownie at etnus.com (James Cownie) Date: Tue, 24 Feb 2004 15:37:12 +0000 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message from joshh@cs.earlham.edu of "Fri, 13 Feb 2004 10:25:31 EST." <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <1AvecS-6wh-00@etnus.com> > I am profiling a software package that runs over LAM-MPI on 16 node > cluster s [Details Below]. I would like to measure the effect of > increased latency on the run time of the program. > Look for "dimemas" on Google. It's a simulator from Cepba for parallel architectures which is intended to allow you to adjust exactly this kind of parameter. At one point they had it coupled up with Pallas' Vampir so that it could read Vampir trace files and then simulate the same execution with modified communication properties, or modified CPU properties. -- -- Jim -- James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Tue Feb 24 10:40:29 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Tue, 24 Feb 2004 07:40:29 -0800 Subject: [Beowulf] C vs C++ challenge Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F55AB@orsmsx402.jf.intel.com> From: Matt Valerio; Tuesday, February 24, 2004 6:13 AM > > Wow, I guess I didn't do my homework! Apologizes to everyone for the > misinformation! > > As Tony pointed out, the real interview may be found at > http://www.research.att.com/~bs/ieee_interview.html. For a Stroustrup statement that C proponents (as am I) will also agree with, see http://www.research.att.com/~bs/bs_faq.html#really-say-that FYI, the top of the FAQ has a .wav file with the proper pronunciation of his name... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Feb 24 12:15:01 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 24 Feb 2004 18:15:01 +0100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <200402241815.01355.joachim@ccrl-nece.de> Donald Becker: > On Mon, 23 Feb 2004, John Hearns wrote: > > I hesitate a bit to send things seen on Slashdot to the list, > > but this is probably relevant: > > > > > > http://www.flashmobcomputing.org/ > > A bit of hype here. [...] Exactly. It's a nice idea (although the wrong approach, as Donald elaborated - maybe they will find out), but they shouldn't seriously clame to be the first with this "revolutionary idea" (sic!). In addition to Donald's references to earlier "on-the-fly clusters", here's another one from Germany (December 1998): http://www.heise.de/ix/artikel/E/1999/01/010/ I don't know if they actually submitted results to TOP500 - I could not find a matching entry for 1999. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:15:45 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:15:45 -0800 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <20040224093548.GA29776@unthought.net> Message-ID: <5.2.0.9.2.20040224091430.017cb1d8@mailhost4.jpl.nasa.gov> At 08:13 AM 2/24/2004 -0500, Matt Valerio wrote: >Hello, > >I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 >languages. > >That being said, I think it would be interesting to see what the creator of >both C and C++ has said about the two. I ran across this interview with >Bjorn Stroustrup at >http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > >Like anything on the internet, it should be taken with a grain of salt. Can >anyone vouch for its validity, or is it a hoax to get us to all hate C++ and >stick with C? > >-Matt That's a classic hoax interview (and I think identified as such by RGB), and remarkably funny. Almost as good as Dijkstra's apocryphal comment that more brains have been ruined by BASIC than .... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:21:18 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:21:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: <5.2.0.9.2.20040224091607.0350aa38@mailhost4.jpl.nasa.gov> At 08:34 AM 2/24/2004 -0500, Robert G. 
Brown wrote: >On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > >I'll have to think about that one. > >Exponential growth of the speed of light. Hmmm. Some sort of >inflationary model? Space flattening towards non-relativistic >classical? The physics of Nirvana would be veeeery interesting... 5 usec gives you a "grid diameter" of a mile or so... (if you don't worry about pesky things like wires or fibers to carry the signals). You could fit a LOT of processors in a sphere a mile in diameter. Does bring up some interesting questions about optimum interconnection strategies. Even if you put nodes on the surface of that sphere (so you can use free space optical interconnects across the middle of the sphere, you'd have about 7.2 million square meters to fool with. Say you can fit a 100 nodes in a square meter. That's almost a billion nodes. If you need bigger, one could always use fancy stuff like quantum entanglement, about which I don't know much, but which might provide a solution to communicating across large distances very quickly (at least in one frame of reference) James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From orion at cora.nwra.com Tue Feb 24 18:17:31 2004 From: orion at cora.nwra.com (Orion Poplawski) Date: Tue, 24 Feb 2004 16:17:31 -0700 Subject: [Beowulf] G5 cluster for testing Message-ID: <403BDB8B.7060904@cora.nwra.com> Anyone (vendors?) out there have a G5 cluster available for some testing? I've been charged with putting together a small cluster and have been asked to look into G5 systems as well (I guess 64 bit powerPC really....) Thanks -- Orion Poplawski System Administrator 303-415-9701 x222 Colorado Research Associates/NWRA FAX: 303-415-9702 3380 Mitchell Lane, Boulder CO 80301 http://www.co-ra.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Tue Feb 24 17:57:07 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Tue, 24 Feb 2004 22:57:07 +0000 (UTC) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Tue, 24 Feb 2004, Robert G. Brown wrote: > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. Then you'll have to think *very* (exponentially?) fast. Just to keep up with where you were when you started... Shades of the Red Queen. :) Maybe Lewis Carroll already described the physics? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. 
- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Feb 24 09:10:31 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 24 Feb 2004 08:10:31 -0600 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B5B57.8080403@tamu.edu> Since he's now faculty here, I guess I'll walk down the hall and ask him. gerry Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? > > -Matt > > > > > > > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of > Jakob Oestergaard > Sent: Tuesday, February 24, 2004 4:36 AM > To: Beowulf > Subject: Re: [Beowulf] C vs C++ challenge > > On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > >>>I could easily optimize it more (do the work on a larger buffer at a >>>once), but I think enough waste heat has been created here. This is a >>>simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. >> >>Enough time wasted on finding different solutions to a simple problem? > > Surely > >>not. Let me toss my hat into the ring: > > ... > > Hi guys! > > Guess who's back - yes, it's your friendly neighborhood language > evangelist :) > > I said I'd be gone one week - well, I put instant coffe in the > microwave, and *wooosh* went three weeks ahead in time. > > What a fantastic thread this turned into - awk, perl, more C, java and > God knows what. I'm almost surprised I didn't see a Fortran > implementation. > > See, I was trying to follow up on the challenge, then things got > complicated (mainly by me not being able to get the performance I wanted > out of my code) - so instead of flooding your inboxes, I wrote a little > "article" on my findings. > > It's up at: > http://unthought.net/c++/c_vs_c++.html > > Highlights: > *) Benchmarks - real numbers. > *) A C++'ification of the fast C implementation (that turns out to be > negligibly faster than the C implementation although the same > algorithm and the same system calls are used), which is generalized > and generally made usable as a template library routine (for > convenient re-use in other projects - yes, this requires all that > boring non-sexy stuff like freeing up memory etc.) > *) Two new C++ implementations - another 15 liner that's "only" twice > as slow as the C code, and another longer different-algorithm C++ > implementation that is significantly faster than the fastest C > implementation (so far). > > Now, I did not include all the extra implementations presented here. I > would like to update the document with those, but I will need a little > feedback from various people. > > First; how do I compile the java implementation? 
GCC-3.3.2 gives me > ---------------------------------------------------------------- > [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java > wordcount.java: In class `wordcount': > wordcount.java: In method `wordcount.main(java.lang.String[])': > wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' > in type `java.util.regex.Pattern'. > words = p.split(s); > ^ > 1 error > ---------------------------------------------------------------- > > Second; another much faster C implementation was posted - I'd like to > test against that one as well. I'm curious as to how it was done, and > I'd like to use it as an example in the document if it turns out that it > makes sense to write a generic C++ implementation of whatever algorithm > is used there. Well, if the code is not a government secret ;) > > So, well, clearly my document isn't completely updated with all the > great things from this thread - but at least I think it is a decent > reply to the mail where the 'programming pearl' C implementation was > presented. > > I guess this could turn into a nice little reference/FAQ/fact type of > document - the oppinions stated there are biased of course, but not > completely unreasonable in my own (biased) oppinion - besides, there's > real-world numbers for solving a real-world problem, that's a pretty > good start I would say :) > > I'd love to hear what people think - if you have the time to give it a > look. > > Let me know, flame away, give me Fortran code that is faster than my > 'ego-booster' implementation at the bottom of the document! ;) > > Cheers all :) > > / jakob > > BTW: Yes, I had a great vacation; > http://unthought.net/avoriaz/p1010050.jpg ;) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Tue Feb 24 09:10:09 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Tue, 24 Feb 2004 14:10:09 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <1077631809.646.83.camel@ashley> On Mon, 2004-02-23 at 22:47, Mark Hahn wrote: > > Still, it sounds like a fun, demystifying demo that introduces people to > > scalable computing. > > demystification is always good. IMO, the best part of this is that > it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. > partly the reason is hetrogeneity and other "practical" downers. It will be interesting to see, I don't expect they are going to get much time to benchmark but it would be nice to have a plot of achieved performance against CPU count in this kind of configuration. Anybody care to predict how many CPU's you will need before wall clock performance starts dropping? > but mainly, a super-computer needs a super-network. 
That I won't dispute but does a single linpack run require a super-computer? Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raiders at phreaker.net Tue Feb 24 10:39:19 2004 From: raiders at phreaker.net (raiders at phreaker.net) Date: Tue, 24 Feb 2004 23:39:19 +0800 Subject: [Beowulf] Subclusters... Message-ID: <200402242339.19310.raiders@phreaker.net> We are on a project as described below: - IA32 linux cluster for general parallel programming - five head nodes, each head node will have about 15 compute nodes and dedicated storage - groups of cluster-users will be restricted to their own clusters normally (some exclusions may apply) - SGE/PBS, GbE etc are standard choices But the people in power want one single software or admin console (cluster toolkit?) to manage the entire cluster from one adm station (which may or may not be one of the head nodes). I looked around and could not find any suitable solution (ROCKS, oscar, etc). ROCKS, oscar etc can manage only one cluster at a time and cannot handle subclusters. (I might be wrong) I believe that only custom programming can help. Appreciate any expert opinion Thanks, Shawn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Tue Feb 24 23:38:54 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Tue, 24 Feb 2004 20:38:54 -0800 (PST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> Message-ID: I'd suggest asking your friendly IBM sales guy about ppc970 blades... joelja On Tue, 24 Feb 2004, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) > > Thanks > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 02:10:39 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Tue, 24 Feb 2004 23:10:39 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> References: <403BDB8B.7060904@cora.nwra.com> Message-ID: <20040225071039.GA29125@cse.ucdavis.edu> On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some For the most part I'm finding that cluster performance is mostly predictable by single node performance, and the scaling of the interconnect. At least as an approximation, I'm going to use to find a good place to start for my next couple cluster designs. I'm current benchmarking: Dual G5 Opteron duals (1.4, 1.8, and 2.2) Opteron quad 1.4 Itanium dual 1.4 GHz Dual P4-3.0 GHz+HT Single P4-3.0 GHz+HT Alas, my single node performance testing on the G5 has been foiled by my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem working. 
Anyone else have MPICH and shared memory working on OSX? Or maybe a dual g5 linux account for an evening of benchmarking? Normally using ch_p4 and localhost wouldn't be to big a deal, but ping localhost on OSX is something like 40 times than linux, mpich with ch_p4 on OSX is around 20 times worse than linux with shared memory. > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) Assuming all the applications and tools work under all environments your considering I'd figure out what interconnect you want to get first. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 25 03:34:38 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 25 Feb 2004 09:34:38 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <1077542843.26492.12.camel@qeldroma.cttc.org> Message-ID: On Mon, 23 Feb 2004, Daniel Fernandez wrote: [...] > On the other hand, is possible to deviate smartd log to a specific file > and check it regularly when it's updated, adding this parameter to > smartd: > > -l facility > > Of course some syslog.conf modifying will be needed to instruct syslogd > to log on a specific file from the "facility" specified. Thanks for the hints, I was not yet aware of the -l and -M flags. Still, I think directly calling "smartctl" from a cron job is the better solution. With just smartd and the flags above, you still won't get any updates if the smartd simply dies and you won't even notice, because grep simply finds the last entry in the log. Besides, you still have the problem of log rotate (except if you let grow your log file forever...). Regards, Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:18:14 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:18:14 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > A very quick answer from me is to think of the whole thing as one cluster, then install it. In SGE, it is possible to have groups of users defined, and to allow only certain groups/users access to each queue. So (say) you could have a Physics group, a Chemistry group etc. As for access to the public facing nodes, again quickly off the top of my head, you just need authentication which allows logins only from the appropriate group. 
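To make the queue restriction concrete, here is a minimal sketch of the Grid Engine side. It assumes a 5.3-era SGE and invents the user, access-list, and queue names, so treat it as a starting point rather than a recipe:
----------------------------------------------------------------
# put the physics users into an access list (created if it does not exist)
$ qconf -au alice,bob physics_users
# edit the queue so that only members of that access list may use it
$ qconf -mq physics.q
    ...
    user_lists    physics_users
----------------------------------------------------------------
Repeating the same pattern per group gives each of the five groups its own queue without carving the cluster up physically.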
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 07:01:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 13:01:02 +0100 (CET) Subject: [Beowulf] FOSDEM talk Message-ID: There is a current thread on SMART usage. There was also a thread about six months ago on lm_sensors, about the output format of sensors, and how one has to parse it. Sorry if this message is a bit of a ramble. At FOSDEM over the weekend I went to a talk by Robert Love on his work on Linux kernel and destop integration, on HAL and DBUS. One slide made me sit up and take notice, as he had an example of a kernel message saying 'overheating'. The message format was something like an SNMP OID, as I remember org.kernel.processor.overheating (or something like that). One could then think of a process listening on the netlink socket, generating (for example) an SNMP trap on receiving events of this category. A better way of doing things than running sensors periodically then parsing the output. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:55:22 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:55:22 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > We are on a project as described below: > > - IA32 linux cluster for general parallel programming > - five head nodes, each head node will have about 15 compute nodes and > dedicated storage > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or admin console (cluster > toolkit?) to manage the entire cluster from one adm station (which may or may > not be one of the head nodes). Thinking about this, the way I would architect things is to stop thinking of subclusters - yet of course give the users their allocation of resources. So, choose your cluster install method of choice. Have one admin/master node and install all 75 nodes. Have 5 public facing machines, and have logins go through a load-balancer or round robin. When a user logs in they get directed to the least loaded machine. Why? If one machine goes down (fault or upgrade) the users still have four machines. They don't "see" this as you have entries in the DNS for e.g. necromancy.hogwarts defence-darkarts.hogwarts potions.hogwarts spells.hogwarts magical-creatures.hogwarts all pointing the same way. It would be better to have 5 separate storage nodes, but the login machines in your scenario will have to do that job also. Just allocate storage per group. The 75 compute nodes are installed within the cluster. Now, at a first pass you want to 'saw things up' into 15 node lumps. This can be done easily - just put a queue or queues on each and allow only certain groups access. But I will contend this is a bad idea. Batch queueing systems have facilities to look after fair shares of resources between groups. Say you have the 5 separate groups scenario. 
Say today Professor Snape isn't doing any potions work. The 15 potions machines will lie idel, while there are plenty of jobs in necromancy just dying to run. Use the fairshare in SGE or LSF. Each group will get their allocated share of CPU. You'll also have redundancy - so that you can take machines out for maintenance/repairs without impacting any one group, ie. the load is shared across 75 machines not 5. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patricka at its.uct.ac.za Wed Feb 25 09:01:49 2004 From: patricka at its.uct.ac.za (Patrick) Date: Wed, 25 Feb 2004 16:01:49 +0200 Subject: [Beowulf] G5 cluster for testing References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <026001c3fba7$e9af5d00$a61b9e89@nawty> Has anyone here actually tried out Xgrid ? Apples grid stuff. It seems to be not so fussy in regards to the type of macs you attach and suchlike ? as well as them being configurable via Zeroconf. P ----- Original Message ----- From: "Bill Broadley" To: "Orion Poplawski" Cc: Sent: Wednesday, February 25, 2004 9:10 AM Subject: Re: [Beowulf] G5 cluster for testing > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:22:15 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:22:15 +0800 (CST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <20040225132215.21497.qmail@web16813.mail.tpe.yahoo.com> I've heard that LAM works better with OSX. Andrew. 
--- Bill Broadley ????> On > Alas, my single node performance testing on the G5 > has been foiled by > my inability to get MPICH, OSX, and ./configure > --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on > OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to > big a deal, but > ping localhost on OSX is something like 40 times > than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux > with shared memory. > > > testing? I've been charged with putting together > a small cluster and > > have been asked to look into G5 systems as well (I > guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under > all environments your > considering I'd figure out what interconnect you > want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:27:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:27:54 +0800 (CST) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: <20040225132754.14108.qmail@web16809.mail.tpe.yahoo.com> GridEngine has the concept of a CELL. It is not well documented, but it works like pointing to a different cell gives you a different configuration, ie. different subcluster. When you setup SGE, it will ask you for the name of the cell, so on the same head node, each time you run the sge install script, use a different cell name. This way you will get 5 different SGE clusters controlled by the same headnode. Better ask on the SGE mailing list since I've never played around with this too much. http://gridengine.sunsource.net/project/gridengine/maillist.html Andrew. --- raiders at phreaker.net ????> We are on a project as described below: > > - IA32 linux cluster for general parallel > programming > - five head nodes, each head node will have about 15 > compute nodes and > dedicated storage > - groups of cluster-users will be restricted to > their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or > admin console (cluster > toolkit?) to manage the entire cluster from one adm > station (which may or may > not be one of the head nodes). > > I looked around and could not find any suitable > solution (ROCKS, oscar, etc). > ROCKS, oscar etc can manage only one cluster at a > time and cannot handle > subclusters. (I might be wrong) > > I believe that only custom programming can help. 
> Appreciate any expert > opinion > > Thanks, > Shawn > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Wed Feb 25 08:03:02 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Wed, 25 Feb 2004 13:03:02 +0000 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <1077714182.656.235.camel@ashley> On Wed, 2004-02-25 at 07:10, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. There is a third issue here which you've missed which is that interconnect performance can depends on the PCI bridge that it's plugged into. It would be more correct to say that performance is predictable by dual-node performance and scaling of the interconnect. Of course this may not make a difference for Ethernet or even gig-e but it does matter at the high end. Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Feb 25 14:32:12 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 25 Feb 2004 11:32:12 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040225193212.GA14558@greglaptop.internal.keyresearch.com> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. Doesn't the G5 have exactly one chipset implementation available? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Thu Feb 26 05:38:09 2004 From: pesch at attglobal.net (pesch at attglobal.net) Date: Thu, 26 Feb 2004 02:38:09 -0800 Subject: [Beowulf] Flashmobcomputing References: Message-ID: <403DCC91.5A10A77B@attglobal.net> Nothing moves faster than the speed of light - with the exception of bad news (according to the late Douglas Adams); therefore, at the grid nirvana, bad news must get increasingly more bad. Which leads me to the hypothesis that nirvana is that locus at the irs which stores the access codes for the pentium microcode backdoors... "Robert G. 
Brown" wrote: > On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. > > Exponential growth of the speed of light. Hmmm. Some sort of > inflationary model? Space flattening towards non-relativistic > classical? The physics of Nirvana would be veeeery interesting... > > :-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 25 22:02:03 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 26 Feb 2004 14:02:03 +1100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> References: <403DCC91.5A10A77B@attglobal.net> Message-ID: <200402261402.05015.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 09:38 pm, pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); The only things known to go faster than ordinary light is monarchy, according to the philosopher Ly Tin Weedle. He reasoned like this: you can't have more than one king, and tradition demands that there is no gap between kings, so when a king dies the succession must therefore pass to the heir instantaneously. Presumably, he said, there must be some elementary particles - -- kingons, or possibly queons -- that do this job, but of course succession sometimes fails if, in mid-flight, they strike an anti-particle, or republicon. His ambitious plans to use his discovery to send messages, involving the careful torturing of a small king in order to modulate the signal, were never fully expanded because, at that point, the bar closed. 
- -- (Terry Pratchett, Mort) courtesy of: http://www.co.uk.lspace.org/books/pqf/mort.html - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAPWGrO2KABBYQAh8RAsm3AJ4zV3fEk8q/8Jm/zqY4xiBzGvKj4ACfeT+N 3NhDhvgiJyhukmnzBFHUaMQ= =NgG+ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 21:32:59 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 25 Feb 2004 18:32:59 -0800 Subject: [Beowulf] Cray buys Octigabay Message-ID: <20040226023258.GA9211@cse.ucdavis.edu> An interesting development: http://www.octigabay.com/ http://www.octigabay.com/newsEvents/cray_release.htm http://www.cray.com/ http://www.cray.com/media/2004/february/octigabay.html -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hpcatcnc at yahoo.com Thu Feb 26 01:56:09 2004 From: hpcatcnc at yahoo.com (prakash borade) Date: Wed, 25 Feb 2004 22:56:09 -0800 (PST) Subject: [Beowulf] predefined nodes for a job Message-ID: <20040226065609.19075.qmail@web21507.mail.yahoo.com> Can anybody tell me how I can allot some fixed, predefined machines from my cluster to a job? I have tried using the option -machinefile mcfile, where mcfile is a file in a local directory containing the required machine names. Also, I don't want to use the machine from which I issue the mpirun command, and MPICH is installed on this machine. Do I have any solution for this?
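One way this is commonly handled with MPICH 1.2.x and the ch_p4 device, sketched from memory with made-up node names (so check the exact option spellings against your mpirun man page): list only the compute nodes you want in the machine file, and add -nolocal so the host you launch from stays out of the job.
----------------------------------------------------------------
$ cat mcfile
node01:2
node02:2
node03:2
$ mpirun -np 6 -machinefile ./mcfile -nolocal ./myprog
----------------------------------------------------------------
Without -nolocal the ch_p4 device starts the first process on the machine mpirun is invoked from, which is exactly what the poster wants to avoid; the host:n form in the machine file is what lets each dual-CPU node take two processes.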
__________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sdutta at deas.harvard.edu Thu Feb 26 07:43:51 2004 From: sdutta at deas.harvard.edu (Suvendra Nath Dutta) Date: Thu, 26 Feb 2004 07:43:51 -0500 (EST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: I fought for a while to get an OS X cluster up, precisely to test the G5 performance. I had lots of problems with setting up NFS and setting up MPICH to use shared memory on the dual processors. I was able to take advantage of the firewire networking built into OS X. We were taking the harder route of staying away from all non-open source tools to do NFS (NFSManager) or MPI (Pooch). As was pointed out in another message, we are mostly keen on just testing performance of three applications that we will run on our cluster rather than HPL numbers. Finally we gave up the struggle. We are now working with Apple to benchmark on an existing setup instead of us trying to set everything up ourselves. Unfortunately there isn't a howto on doing this yet. I'll post numbers when we get them. Suvendra. On Tue, 24 Feb 2004, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first.
> > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Thu Feb 26 06:55:22 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu, 26 Feb 2004 03:55:22 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040226115522.GA12286@cse.ucdavis.edu> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. It would be more correct to say that performance is predictable > by dual-node performance and scaling of the interconnect. Of course > this may not make a difference for Ethernet or even gig-e but it does > matter at the high end. Take this chart for instance: http://www.myri.com/myrinet/PCIX/bus_performance.html On any decent size cluster the node performance or interconnect performance is likely to be significantly larger effects on cluster performance then any of the differences on that chart. Or maybe your talking about sticking $1200 Myrinet cards in a 133 MB/sec PCI slot? Don't forget peak bandwidth measurements assume huge (10000-64000 byte packets), latency tolerance, and zero computation. Not exactly the use I'd expect in a typical production cluster. So my suggestion is: #1 Pick your application(s), this is why your buying a cluster right? #2 For compatible nodes pick the node with the best perf or price/perf. #3 For compatible interconnects pick the one with the best scaling or price/scaling for the number of nodes you can afford/fit. #3 If you get a choice of PCI-X bridges, sure consult the URL above and pick the fastest one. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 26 11:51:42 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 26 Feb 2004 11:51:42 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> Message-ID: On Thu, 26 Feb 2004 pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); therefore, at the grid > nirvana, bad news must get increasingly more bad. Which leads me to the > hypothesis that nirvana is that locus at the irs which stores the access > codes for the pentium microcode backdoors... This is not exactly correct. Or rather, it might well be true (something mandala-like in the image of that locus:-) but isn't strictly logical or on topic for the list. The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers? 
;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From choyhauyan at yahoo.com Thu Feb 26 00:51:24 2004 From: choyhauyan at yahoo.com (choy hau yan) Date: Wed, 25 Feb 2004 21:51:24 -0800 (PST) Subject: [Beowulf] shared distributed memory ? Message-ID: <20040226055124.33033.qmail@web41313.mail.yahoo.com> I am a user for scyld beowulf cluster. I use mpi for parallel computing.I have some question: > I got 2 processors that in shared memory and then > connect with TCP/IP to another 2 processors in shared > memory. > > I use mpisend/recv for communication, but why can't I > call this shared distributed memory? > The speedup with this architecture is very low.why? > > speedup: > 2 processor: 1.61 > 3 processor: 2.31 > 4 processor: 2.30 > actually with shared memory, the speedup is more high > that distributed becasue almost no cos communication > in shared memory.right? Hope that some one can answer my question. thanks.. > __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moor007 at bellsouth.net Thu Feb 26 16:51:03 2004 From: moor007 at bellsouth.net (moor007 at bellsouth.net) Date: Thu, 26 Feb 2004 15:51:03 -0600 Subject: [Beowulf] Cluster HW Message-ID: <200402261551.03785.moor007@bellsouth.net> I apologize for having to ask in this forum...but I really do not know where to begin. I just upgraded my interconnects from the Dolphinics (SCI) and want to sell them (rarely used) because only one of the four applications I use would utilize them. Is there a forum/market, besides Ebay, for this type of specialty HW? Tim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 26 17:34:08 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 27 Feb 2004 09:34:08 +1100 Subject: [Beowulf] G5 cluster for testing In-Reply-To: References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <200402270934.10022.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 11:43 pm, Suvendra Nath Dutta wrote: > We were taking the harder route of staying away from all non-open source > tools to do NFS (NFSManager) or MPI (Pooch). There is also Black Lab Linux from Terrasoft which build clusters on YDL with BProc, MPICH, etc for Macs. No idea whether it supports G5's or how FOSS it is though.. cheers! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAPnRgO2KABBYQAh8RAq6sAJMEJwyT1vn3MV9RM/Fwpy6gs4CZAJ9QAGf2 oyEbIVcHgTfcs+Jk2xb7dg== =92C8 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Feb 26 19:01:51 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 26 Feb 2004 19:01:51 -0500 (EST) Subject: [Beowulf] shared distributed memory ? In-Reply-To: <20040226055124.33033.qmail@web41313.mail.yahoo.com> Message-ID: A few questions: What are your processor speeds? What is your interconnect? What is your application? Having to communicate with another node vs the same node is not the same thing. (ping local host and ping the other node) Obviously your application is sensitive to the interconnect (either bandwidth of latency) Really fast processors and a slow interconnect usually means poor scalability for some applications. Doug On Wed, 25 Feb 2004, choy hau yan wrote: > I am a user for scyld beowulf cluster. I use mpi for > parallel computing.I have some question: > > > I got 2 processors that in shared memory and then > > connect with TCP/IP to another 2 processors in > shared > > memory. > > > > I use mpisend/recv for communication, but why can't > I > > call this shared distributed memory? > > The speedup with this architecture is very low.why? > > > > speedup: > > 2 processor: 1.61 > > 3 processor: 2.31 > > 4 processor: 2.30 > > actually with shared memory, the speedup is more > high > > that distributed becasue almost no cos communication > > in shared memory.right? Hope that some one can > answer my question. thanks.. > > > > > __________________________________ > Do you Yahoo!? > Get better spam protection with Yahoo! Mail. > http://antispam.yahoo.com/tools > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Fri Feb 27 05:12:18 2004 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Fri, 27 Feb 2004 10:12:18 -0000 Subject: [Beowulf] Flashmobcomputing Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Hmm, presumably a 'bad' message will need to have the Evil Bit set (http://www.ietf.org/rfc/rfc3514.txt)? Graham -----Original Message----- From: Robert G. Brown [mailto:rgb at phy.duke.edu] [...] The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers? ;-) [...] 
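(A quick back-of-envelope check on the speedup numbers posted in the "shared distributed memory?" thread, 1.61 on 2 processors and 2.30 on 4: if everything that does not parallelize, serial work plus communication, is lumped into one fraction s, Amdahl's law gives

    speedup(N) = 1 / (s + (1 - s)/N)

The 2-processor figure fixes s: 1/1.61 = s + (1 - s)/2 gives s of roughly 0.24, and that same s predicts 1/(0.24 + 0.76/4), or about 2.3, on 4 processors, which is essentially what was measured. This ignores Doug's point that intra-node and inter-node messages cost very different amounts, but it does suggest about a quarter of the run is effectively serialized, so adding processors beyond 4 will buy very little.)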
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anantanagb at yahoo.com Fri Feb 27 04:49:38 2004 From: anantanagb at yahoo.com (anantanag bhat) Date: Fri, 27 Feb 2004 01:49:38 -0800 (PST) Subject: [Beowulf] P4_error: net_recv read : probable EOF on socket:1 Message-ID: <20040227094938.32769.qmail@web21322.mail.yahoo.com> Sir, I have installed MPICH on my 8 processor Cluster. Every thing was running fine for first few days. Now if I starts the run in the node4, it is getting stuck. after 2hour. the error in the .out file is as below "P4_error: net_recv read : probable EOF on socket:1" But it is not the same in first 3 nodes. In these runs are going fine. Can anybody please help me to solve this. Thanks in advance __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 27 08:11:43 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 27 Feb 2004 08:11:43 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 27 Feb 2004 graham.mullier at syngenta.com wrote: > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? You know, I just joined the ietf.org list a week or two ago to see if there was any possibility of leveraging their influence on e.g. AV vendors to get them to stop mailing bounce messages back to the "From" address on viruses, given that there hasn't been a virus that hasn't forged its From header to an innocent third party for several years now. Finding myself sucked into an endless discussion with people who want the ietf to issue an RFC to call for digitally signing all mail and using said signatures to drive all spam white/blacklisting (imagine the keyservice THAT would require and the gazillion dollar profits it would generate) I have gradually started to wonder if the ietf has degenerated into a kind of a cruel joke. This RFC, however, lifts my spirits and renews my confidence that the original luminaries that designed in the Internet have not fully stopped glowing in the chaotic darkness that surrounds them. Armed with the complete confidence that my design is based on both sound protocol and Dr. D. Adams' valuable empirical observation about bad news, I will start work on a PVM version that sets the Evil Bit right away. I fully expect to win a Nobel Prize from the proof that communications are transluminal in the resulting cluster. It must be that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- Bad News must somehow propagate backwards in time from the event. I most certainly will acknowledge all of the contributions of all you "little people" when I receive my invitation to Stockholm. I'm so happy. Sniff. rgb > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> > ;-) > [...] > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 27 12:38:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 27 Feb 2004 18:38:09 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Fri, 27 Feb 2004, Robert G. Brown wrote: > > Armed with the complete confidence that my design is based on both sound > protocol and Dr. D. Adams' valuable empirical observation about bad > news, I will start work on a PVM version that sets the Evil Bit right > away. I fully expect to win a Nobel Prize from the proof that > communications are transluminal in the resulting cluster. It must be > that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- > Bad News must somehow propagate backwards in time from the event. > Once this phase of the research has been completed, can we make an application to the NSF for an extension into using SEP fields for systems management? http://www.fact-index.com/s/so/somebody_else_s_problem_field.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Fri Feb 27 16:23:23 2004 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Fri, 27 Feb 2004 16:23:23 -0500 Subject: [Beowulf] Single Processor vs SMP In-Reply-To: <20040227092124.GA8410@blackTiger> References: <20040227092124.GA8410@blackTiger> Message-ID: <403FB54B.5030401@comcast.net> Paulo, I hoping someone will jump and say that the answer depends upon the code(s) you're running. If possible test your codes on a dual CPU box with one copy running and then two copies running (make sure one copy is on one CPU). Test this on the architectures you are interested in. If you can also test on multiple nodes with some kind of interconnect to judge how the code(s) scale with number of nodes and with interconnect. For example, at work I use a code that we tested on single and dual CPU machines. It was an older PIII/500 box that used the old Intel 440BX chipset (if I remember correctly). We found that running two copies only resulted in a 30% penalty for running duals. We also tested on a cluster with Myrinet and GigE. Myrinet only gave this code about a 2% decrease in wall clock time (we measure speed in wall clock time since that is what is important to us). Then we got quotes for machines and did the price/performance calculation and determined which cluster was the best. I highly recommend doing the same thing for your code(s). Be sure to check out Opterons since they have an interesting memory subsytem that should allow your codes to have little penalty in running on dual machines ("should" is the operative word. You should test your codes to determine if this is true). Good Luck! 
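In shell terms the one-copy versus two-copy test above is about as simple as it gets; a sketch, with ./mycode and the input files as placeholders:
----------------------------------------------------------------
# one copy alone on the dual box
$ time ./mycode < case1.in > alone.out
# two copies at once; the wall clock covers both finishing
$ time sh -c './mycode < case1.in > run1.out & ./mycode < case2.in > run2.out & wait'
----------------------------------------------------------------
If the two-copy wall time is much longer than the one-copy time, the two CPUs are fighting over memory bandwidth and duals will not look as good in practice as they do in the price list.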
Jeff >Hello, > >I'm currently working in a physics department that is in the process of >building a high performance Beowulf cluster and I have some doubts in >terms of what type of hardware to acquire. > >The programming systems that will be used are MPI and HPF. Does anyone >knows any study comparing the performance of single cpu machines vs smp >machines or even between the several cpu's available (intel p4, amd athlon, >powerpc g5, ...)? > >Thanks for any advice > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From paulojjs at bragatel.pt Fri Feb 27 04:21:24 2004 From: paulojjs at bragatel.pt (Paulo Silva) Date: Fri, 27 Feb 2004 09:21:24 +0000 Subject: [Beowulf] Single Processor vs SMP Message-ID: <20040227092124.GA8410@blackTiger> Hello, I'm currently working in a physics department that is in the process of building a high performance Beowulf cluster and I have some doubts in terms of what type of hardware to acquire. The programming systems that will be used are MPI and HPF. Does anyone knows any study comparing the performance of single cpu machines vs smp machines or even between the several cpu's available (intel p4, amd athlon, powerpc g5, ...)? Thanks for any advice -- Paulo Jorge Jesus Silva perl -we 'print "paulojjs".reverse "\ntp.letagarb@"' The best you get is an even break. -- Franklin Adams -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From ddw at dreamscape.com Sat Feb 28 01:19:37 2004 From: ddw at dreamscape.com (Daniel Williams) Date: Sat, 28 Feb 2004 01:19:37 -0500 Subject: [Beowulf] Flashmobcomputing - the evil bit References: <200402271705.i1RH5vh16216@NewBlue.scyld.com> Message-ID: <404032F8.F5DDDA59@dreamscape.com> The problem with this idea is that Linux is too good at dealing with flawed or malicious data, so even if the evil bit is set, it still would not qualify as "bad news", and thus would not travel superluminally. Consequently, I would speculate that the only system that could communicate superluminally is one running some form of Winblows, since *any* data, of *any* kind, with or without the evil bit set is bad news for MS operating systems, and likely to cause a crash. The problem with superluminal cluster computing then becomes obvious - you can't get any actual useful calculation done faster than lightspeed, because the only operating systems that work at that speed can't do any useful work. DDW > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf