From andrewxwang at yahoo.com.tw Sun Feb 1 00:39:40 2004
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Sun, 1 Feb 2004 13:39:40 +0800 (CST)
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
Message-ID: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>

http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html

2.6 looks very promising, wondering when distributions will include it.

Also ia64 performance looks bad when compared to Xeon or amd64. Intel switching to amd64 is a good choice ;->

Andrew.

-----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From amacater at galactic.demon.co.uk Sun Feb 1 05:40:34 2004
From: amacater at galactic.demon.co.uk (Andrew M.A. Cater)
Date: Sun, 1 Feb 2004 10:40:34 +0000
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID: <20040201104034.GA9280@galactic.demon.co.uk>

On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote:
> http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>
> 2.6 looks very promising, wondering when distributions
> will include it.
>
Debian unstable does today. The new installer for the next release of Debian (currently Debian testing), which is in beta test, may well include a 2.6 kernel option.

> Also ia64 performance looks bad when compared to Xeon
> or amd64. Intel switching to amd64 is a good choice
> ;->
>
Newsflash: Severe weather means Hell freezes over, preventing flying pigs from taking off :)

IIRC: Since you seem well aware of SPBS / storm - is the newest storm release fully free / GPL'd such that I can use it anywhere?

Thanks,

Andy

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From xyzzy at speakeasy.org Sun Feb 1 05:57:37 2004
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Sun, 1 Feb 2004 02:57:37 -0800 (PST)
Subject: [Beowulf] C vs C++ challenge
In-Reply-To: <1075512676.4915.207.camel@protein.scalableinformatics.com>
Message-ID:

> I could easily optimize it more (do the work on a larger buffer at
> once), but I think enough waste heat has been created here. This is a
> simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3.

Enough time wasted on finding different solutions to a simple problem? Surely not. Let me toss my hat into the ring:

                  Awk      Perl        C    My program (C)
wrnpc10.txt     1.771     1.125    0.506             0.164
shaks12.txt     3.055     1.877    0.955             0.243
big.txt        20.339    12.792    5.858             1.196
vbig.txt      101.466    63.770   29.079             5.666

All times are from a dual PIII-1GHz on a ServerWorks board with 1GB dual channel PC133 ram. Each time is the best of three runs and is wall time. The awk version is by Selva Nair, Perl by Joe Landman, C version by Robert G Brown. The Java version isn't portable enough for me to run (go Java!) and I didn't see the source for a C++/STL version. Compiler used was gcc 2.96, awk was 3.1.0, and perl was 5.6.1.
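For readers playing along at home, a minimal sketch of the kind of hash-based counter being timed here might look like the following. This is an illustration only -- it is not Trent's program and not any of the contest entries above -- and it uses one reasonable word definition (a run of 0-9, a-z, A-Z and ', lower-cased):

/* wcount.c -- toy hash-based word counter (illustrative sketch only).
 * Words are runs of [0-9A-Za-z'], lower-cased, counted in a chained
 * hash table; prints total and unique counts. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define NBUCKET 65536
#define MAXWORD 256

struct node { char *word; long count; struct node *next; };
static struct node *table[NBUCKET];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKET;
}

static void add_word(const char *w)
{
    struct node *p;
    unsigned h = hash(w);
    for (p = table[h]; p; p = p->next)
        if (strcmp(p->word, w) == 0) { p->count++; return; }
    p = malloc(sizeof *p);
    p->word = strdup(w);
    p->count = 1;
    p->next = table[h];
    table[h] = p;
}

int main(void)
{
    char word[MAXWORD];
    int c, len = 0;
    long total = 0, unique = 0;
    unsigned i;
    struct node *p;

    while ((c = getchar()) != EOF) {
        if (isalnum(c) || c == '\'') {
            if (len < MAXWORD - 1) word[len++] = tolower(c);
        } else if (len) {
            word[len] = '\0';
            add_word(word);
            total++;
            len = 0;
        }
    }
    if (len) { word[len] = '\0'; add_word(word); total++; }

    /* walk the buckets to count distinct words */
    for (i = 0; i < NBUCKET; i++)
        for (p = table[i]; p; p = p->next)
            unique++;
    printf("total %ld unique %ld\n", total, unique);
    return 0;
}

Compiled with something like "gcc -O2 wcount.c -o wcount" and fed a text file on stdin, it prints total and unique word counts in the same spirit as the numbers below; it makes no claim to match any entry's exact figures.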
The actual results for shaks12.txt, which are of course never the same:

version      total    unique
awk         902299     31384
perl                   23903
C           902299     37499
My          906912     27321
wc          901325

I considered words to be formed from 0-9, a-z, A-Z, and '. Everything is lower cased. The shaks12.txt file is complicated by its use of the single quote both for quotations and for contractions. I also have the list of words and counts, sorted no less, but do not print it. I'll give you guys a few days, and see if anyone finds a solution before I reveal my secrets.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com Sun Feb 1 06:51:22 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sun, 1 Feb 2004 12:51:22 +0100 (CET)
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID:

On Sun, 1 Feb 2004, Andrew Wang wrote:
> http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>
> 2.6 looks very promising, wondering when distributions
> will include it.
>
It's already possible to use yum to get a 2.6 kernel for Fedora. (Must start testing it myself.)

This prompted me to look at the Fedora roadmap: http://fedora.redhat.com/participate/schedule/

Looks like 2.6 will be in Fedora 2, scheduled for April. And very interestingly: "and integrating work on other architectures (at least AMD64, and possibly also SPARC)."

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From klamman.gard at telia.com Sun Feb 1 07:30:52 2004
From: klamman.gard at telia.com (Per Lindstrom)
Date: Sun, 01 Feb 2004 13:30:52 +0100
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk>
Message-ID: <401CF17C.8080706@telia.com>

I have experienced some problems compiling SMP support for the 2.6.1 kernel on my Intel Xeon based workstation:

M.B:      Intel SE7505VB2 ATX PCI, FSB 533MHz
Chipset:  Intel 7505
CPU:      2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB
RAM:      2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB
Graph:    GeForce FX 5200 128MB

The SMP support works fine all the way up to kernel 2.4.22, but after that it stops working for the Xeon.

The SMP support works fine for the Intel Tualatin workstation all the way up to kernel 2.4.24 and gives problems on 2.6.1. I have not tested a build of 2.6.0.

Please advise if someone has solved this problem.

Best regards
Per Lindstrom
.
.
Andrew M.A. Cater wrote:

>On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote:
>
>>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html
>>
>>2.6 looks very promising, wondering when distributions
>>will include it.
>>
>Debian unstable does today. The new installer for the next release
>of Debian (currently Debian testing) which is in beta test may well
>include a 2.6 kernel option.
>
>>Also ia64 performance looks bad when compared to Xeon
>>or amd64.
Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm >release fully free / GPL'd such that I can use it anywhere? > >Thanks, > >Andy >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sun Feb 1 06:44:18 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sun, 1 Feb 2004 12:44:18 +0100 (CET) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C253E.9040206@obs.unige.ch> Message-ID: On Sat, 31 Jan 2004, Pfenniger Daniel wrote: > > Note that in the responded message John was confusing N2 and NO2. Eeek! I am outed as a physicist... I've come out of the lab (closet). Guess I can now wear a slide rule with pride. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sun Feb 1 12:50:44 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> Message-ID: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB all extremely mundane and FULLY supported. > Graph: GeForce FX 5200 128MB bzzt. take it out, try again. don't even *think* about loading the binary nvidia driver. > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. needless to say, 2.6 has been extensively tested on xeons, and it works fine. your problem is specific to your config. if you want help, you'll have to start by describing how it fails. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shaeffer at neuralscape.com Sun Feb 1 13:17:15 2004 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Sun, 1 Feb 2004 10:17:15 -0800 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> <401CF17C.8080706@telia.com> Message-ID: <20040201181715.GB8159@synapse.neuralscape.com> On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > I have experienced some problems to compile SMP support for the > 2.6.1-kernel on my Intel Xeon based workstation: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > Graph: GeForce FX 5200 128MB > > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels actually run too. 
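For anyone chasing the same build problem, the SMP-related pieces of a 2.6 .config for a two-way Xeon box come down to a handful of options. The values below are illustrative assumptions for a generic dual-processor machine, not Per's or Karen's actual configurations:

CONFIG_SMP=y
# room for hyperthreading: 2 physical CPUs x 2 siblings
CONFIG_NR_CPUS=4
# processor family covering P4-class Xeons
CONFIG_MPENTIUM4=y

With those set, a plain "make" followed by "make modules_install install" builds and installs a 2.6 kernel; the separate "make dep" step from 2.4 is no longer needed.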
I'm just starting performance testing, but results are very promising. Thanks, Karen > > The SMP support works fine for the Intel Tualatin workstation all the > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > build a 2.6.0. > > Please advice if some one have solved this problem. > > Best regards > Per Lindstrom > . > . > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > > > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > >> > >>2.6 looks very promising, wondering when distributions > >>will include it. > >> > >> > >> > >Debian unstable does today. The new installer for the next release > >of Debian (currently Debian testing) which is in beta test may well > >include a 2.6 kernel option. > > > > > > > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > >> > >> > >> > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > >release fully free / GPL'd such that I can use it anywhere? > > > >Thanks, > > > >Andy > >_______________________________________________ > >Beowulf mailing list, Beowulf at beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf ---end quoted text--- -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer at neuralscape.com http://www.neuralscape.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From poobah_99 at hotmail.com Sun Feb 1 14:24:03 2004 From: poobah_99 at hotmail.com (Ryan Kastrukoff) Date: Sun, 01 Feb 2004 11:24:03 -0800 Subject: [Beowulf] unsubscribe universe beowulf@beowulf.org Message-ID: _________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Sun Feb 1 14:33:03 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 01 Feb 2004 14:33:03 -0500 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <401D546F.7090109@scalableinformatics.com> Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon >>or amd64. Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 According to weather.com, this zip code is about 27 F right now. 
As 32 F is officially "freezing over", we can with all accuracy note that indeed, Hell (MI) has frozen over. Note 2: It was quite a bit colder last week and up to yesterday where southeast Michigan was hovering in the low negative/positive single digits in degrees F. We shouldn't complain as the folks in Minnesota have not seen the high side of 0 very much recently. As for the aerodynamic porcine units, you are on your own. Joe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Sun Feb 1 10:37:37 2004 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Sun, 01 Feb 2004 16:37:37 +0100 Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C2C97.8020903@tamu.edu> References: <401BE891.708@obs.unige.ch> <401C0807.4000209@telia.com> <401C253E.9040206@obs.unige.ch> <401C2C97.8020903@tamu.edu> Message-ID: <401D1D41.8090709@moene.indiv.nluug.nl> Gerry Creager (N5JXS) wrote: > That's the end of gas exchange physiology I. There will be a short quiz > Monday. We'll continue with the next module. I encourage everyone to > have read the Pulmonary Medicine chapters in Harrison's for the next > lecture. Hmmm, I won't hold my breath on that one :-) -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 1 15:53:54 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401D1D41.8090709@moene.indiv.nluug.nl> Message-ID: On Sun, 1 Feb 2004, Toon Moene wrote: > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) Careful or I'll beat you with John's slide rule (what kinda physicist uses a slide rule for anything other than a blunt instrument?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Feb 1 21:35:43 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <20040202023543.11015.qmail@web16807.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" > IIRC: Since you seem well aware of SPBS / storm - is > the newest storm > release fully free / GPL'd such that I can use it > anywhere? 
They now call it "torque", not sure when they are going to get a new name again :(

Not sure what you mean by "use it anywhere". You can use SPBS (yes, I like this name better) in commercial environments. If you make modifications to SPBS, you need to provide the source code for download.

If you want to modify the source, and sell it as a product, you may want to use SGE.

AFAIK, SGE uses a license similar to the BSD, while OpenPBS uses a license similar to the GPL.

Andrew.

-----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From nixon at nsc.liu.se Mon Feb 2 05:19:30 2004
From: nixon at nsc.liu.se (Leif Nixon)
Date: Mon, 02 Feb 2004 11:19:30 +0100
Subject: [Beowulf] Authentication within beowulf clusters.
In-Reply-To: <1075566655.2560.8.camel@loiosh> (agrajag@dragaera.net's message of "31 Jan 2004 11:30:56 -0500")
References: <1075566655.2560.8.camel@loiosh>
Message-ID:

Jag writes:

> On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote:
>
>> NIS works fine for many purposes as well, but be warned -- in certain
>> configurations and for certain tasks it becomes a very high overhead
>> protocol. In particular, it adds an NIS hit to every file stat, for
>> example, so that it can check groups and permissions.
>
> A good way around this is to run nscd (Name Services Caching Daemon).

I'm really, really suspicious against nscd. I've more than once seen it hang on to stale information forever for no good reason at all.

--
Leif Nixon                                    Systems expert
------------------------------------------------------------
National Supercomputer Centre            Linkoping University
------------------------------------------------------------

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From bclem at rice.edu Mon Feb 2 07:45:05 2004
From: bclem at rice.edu (Brent M. Clements)
Date: Mon, 2 Feb 2004 06:45:05 -0600 (CST)
Subject: [Beowulf] Authentication within beowulf clusters.
In-Reply-To:
References: <1075566655.2560.8.camel@loiosh>
Message-ID:

Nscd is a necessary evil sometimes though.

-B

Brent Clements
Linux Technology Specialist
Information Technology
Rice University

On Mon, 2 Feb 2004, Leif Nixon wrote:

> Jag writes:
>
> > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote:
> >
> >> NIS works fine for many purposes as well, but be warned -- in certain
> >> configurations and for certain tasks it becomes a very high overhead
> >> protocol. In particular, it adds an NIS hit to every file stat, for
> >> example, so that it can check groups and permissions.
> >
> > A good way around this is to run nscd (Name Services Caching Daemon).
>
> I'm really, really suspicious against nscd. I've more than once seen
> it hang on to stale information forever for no good reason at all.
> > -- > Leif Nixon Systems expert > ------------------------------------------------------------ > National Supercomputer Centre Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 2 10:32:01 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 2 Feb 2004 09:32:01 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. To clarify, the problem is when there is some cron job (or reboot) in which a couple of hundred nodes all go after the NIS server at once. It's magnified by the fact that there's an NIS lookup done even when it's a user in the local password file such as root. The problems can be mitigated by having a lot of nodes be slaves. At one point I had all of the nodes of my cluster be slaves. But the problem with that is that the transmission protocol is not perfect and every once in a while you wind up with a slave server that is down a map or two. We've now shifted to pushing out our password files via rsync. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. Our set of kerberos 5 kdc's have thus far been able to handle the load of some 1500 nodes with more still coming. Plus then we have no real passwords in the passwd file and thus the security issues of distributing it are much less critical. Steve Timm > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). 
> > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. > > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:07:30 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:07:30 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: <1075730850.3936.19.camel@protein.scalableinformatics.com> I have tried to avoid NIS on linux, as it appears not to be as stable as needed under heavy load. I have had customers bring it crashing down when it serves login information, just by running simple scripts across the cluster. I prefer pushing name service lookups through DNS, and I tend to use dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). Setting up a full blown named/bind system for a cluster seems like significant overkill in most cases. On the authentication side, I had high hopes for LDAP, but haven't been able to easily/repeatably make a working LDAP server with databases. I am starting to think more along the lines of a simple database with pam modules on the frontend. See http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or http://sourceforge.net/projects/pam-mysql/ for examples. On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > Nscd is a necessary evil sometimes though. > > -B > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > Jag writes: > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > >> configurations and for certain tasks it becomes a very high overhead > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > >> example, so that it can check groups and permissions. > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > I'm really, really suspicious against nscd. I've more than once seen > > it hang on to stale information forever for no good reason at all. 
> > > > -- > > Leif Nixon Systems expert > > ------------------------------------------------------------ > > National Supercomputer Centre Linkoping University > > ------------------------------------------------------------ > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 09:24:25 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 08:24:25 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: We use ldap extensively here on all of our clusters that IT maintains. We like it because it allows great flexibility if we need to write web based account management systems for groups on campus. LDAP is actually very very easy to implement, especially if you use redhat as your distribution. We use redhat mostly exclusive here so our setup and configuration for ldap is pretty cookie-cutter. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. 
> > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:29:49 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:29:49 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: <1075732189.3936.28.camel@protein.scalableinformatics.com> On Mon, 2004-02-02 at 09:24, Brent M. Clements wrote: > We use ldap extensively here on all of our clusters that IT maintains. We > like it because it allows great flexibility if we need to write web > based account management systems for groups on campus. LDAP is actually > very very easy to implement, especially if you use redhat as your > distribution. We use redhat mostly exclusive here so our setup and > configuration for ldap is pretty cookie-cutter. I know the clients are rather easy, it is setting up the server that I found somewhat difficult. I did go through the howto's, used the RH packages. Had some issues I could not find resolution to. This was about a year ago. I have a nice LDAP server set up with a completely read-only database now. I haven't been able to convince it to let clients write (e.g. password and other changes). Not sure what I am doing wrong, relatively sure it is pilot error. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jonbernard at uab.edu Mon Feb 2 11:46:21 2004 From: jonbernard at uab.edu (Jon B Bernard) Date: Mon, 2 Feb 2004 10:46:21 -0600 Subject: [Beowulf] HVAC and room cooling... Message-ID: <92E49C92F9CDBF4EA106E2E7154938830202B1F3@UABEXMB1.ad.uab.edu> The American Society of Heating, Refrigerating and Air-Conditioning Engineers (www.ashrae.org) has just released "Thermal Guidelines for Data Processing Environments". It looks like there's also a summary available in the January issue of their journal, or online for $8. Jon -----Original Message----- From: Brent M. Clements [mailto:bclem at rice.edu] Sent: Friday, January 30, 2004 11:18 PM To: rossini at u.washington.edu Cc: John Bushnell; beowulf at beowulf.org Subject: Re: [Beowulf] HVAC and room cooling... I have found that the best thing to do is outsource the colocation of your equipment. The cost of installing and maintaining the proper type of cooling and ventilation for mid-large size clusters costs more than to colocate. We are currently exploring placing our larger clusters in colocation facilities right now. 
The only downside that we have is that we can't find colocation facilities that will give us 24/7 physical access to our equipment. As you all know...researchers push beowulf hardware to the limits and the meantime to failure is higher. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Fri, 30 Jan 2004, A.J. Rossini wrote: > John Bushnell writes: > > > (So many watts) times 'x' equals how many "tons" of AC. Multiply > > by at least two of course ;-) > > Or 3, sigh... > > >>Also, does anyone have any brilliant thoughts for cooling an internal > >>room that can't affordably get chilled water? (I've been suggesting > >>to people that it isn't possible, but someone brought up "portable > >>liquid nitrogen" -- for the room, NOT for overclocking -- I'm trying > >>to get stable systems, not instability :-). > > > > You can have an external heat exchanger. If you are lucky and are, > > say, on the first floor somewhere close to an external wall, it is > > pretty simple to run a small pipe between the internal AC and the > > heat exchanger outside. Don't know how far it is practical to run > > one though. We have one in our computer room, but it is only six > > feet or so from the exchanger outside. Our newer AC runs on chilled > > water which was quoted for a lot less than another inside/outside > > combo, but we already had a leftover chilled water supply in the > > computer room. > > I've looked at the chilled-water approach. They estimated between > $40k-$80k. oops (this room is REALLY in the middle of the building. > Great for other computing purposes, but not for cooling). > > I'm looking for the proverbial vent-free A/C. Sort of like > frictionless tables and similar devices I recall from undergraduate > physics... > > Thanks for the comments! > > best, > -tony > > -- > rossini at u.washington.edu http://www.analytics.washington.edu/ > Biomedical and Health Informatics University of Washington > Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center > UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable > FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email > > CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be > confidential and privileged. If you received this message in error, > please destroy it and notify the sender. Thank you. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Mon Feb 2 16:27:25 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Mon, 02 Feb 2004 16:27:25 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We have 3 - 16 hard drive file servers, 13 compute nodes and a master unit. We had to spread the load from 3 to 4 - 20 Amp circuits to keep from popping circuit breakers. We have AC coming into an interior room and experienced several problems. Problem 1: There was no adequate exhaust system. 
5 active vents in , 1 passive vent out and in the wrong location. Solution: We substituted in several grates in place of acoustic tiles. The heat is vented up into the plenum above. There are fans atop the rack venting the interior of the rack into one of the grates above. The other heat follows. Problem 2: What do you do when the AC stops? Maintenance and the occasional AC system oops can be devastating to a cluster in a small room. Solution 2a: We are tied directly into a security system. When a sensor in the room reaches a temperature level, "Security" responds dependent upon the level detected. Solution 2b: We installed a backup automated telephone dialer. Not that we don't trust "Security", but we wanted a backup to let us know what was going on. When the temperature reaches a certain level, the phone dials us with an automated message: " This is the Sensaphone 1108. The time is 1:36 AM and ... [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] Solution 2c: Install a thermal sensor into a serial or tcp/ip socket. Some vendors have software that read these sensors and will shut down the machines. We are still working on our system. Others' experiences and solutions are welcomed. We are using dual Tyan motherboards with dual AMD MP processors. Good luck!! Peter ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Feb 2 19:56:33 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 02 Feb 2004 16:56:33 -0800 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". 
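A software watchdog of the sort Peter's Solution 2c describes can sit underneath that hardware cutoff and attempt a clean halt at a lower threshold first. A toy sketch follows; the sensor file, the 35 C limit and the polling interval are made-up placeholders, since real sensors expose their readings in different ways (lm_sensors files, a serial Sensaphone-style unit, SNMP, and so on):

/* tempwatch.c -- toy thermal watchdog: poll a temperature reading and
 * halt the node if it gets too hot.  TEMP_FILE and LIMIT_C are
 * placeholder assumptions; adapt them to whatever your sensor provides. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define TEMP_FILE  "/tmp/node_temp"   /* one number, degrees C */
#define LIMIT_C    35.0
#define POLL_SECS  30

int main(void)
{
    for (;;) {
        FILE *f = fopen(TEMP_FILE, "r");
        double t;
        if (f && fscanf(f, "%lf", &t) == 1 && t > LIMIT_C) {
            fprintf(stderr, "temperature %.1f C over limit, halting\n", t);
            fclose(f);
            return system("/sbin/shutdown -h now");
        }
        if (f) fclose(f);
        sleep(POLL_SECS);
    }
}

If the room keeps heating past the point where a clean shutdown was possible, the hard-wired shunt trip is still what actually saves the hardware.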
Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Tue Feb 3 08:40:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Tue, 03 Feb 2004 07:40:25 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> References: <20040203125618.GA6026@mikee.ath.cx> Message-ID: <6.0.0.22.2.20040203073727.02614538@localhost> At 06:56 AM 2/3/2004, Mike Eggleston wrote: >This book from 2000 discusses building clusters from linux. I >bought it from a discount store not because I'm going to build >another cluster from linux, but rather because of the discussions >on cluster management. Has anyone read/implemented his approach? >What other cluster management techniques/solutions are out there? Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes chapters on cluster setup and cluster management (new in the 2nd edition). Disclaimer: I'm one of the editors of this book. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 09:05:07 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 08:05:07 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <6.0.0.22.2.20040203073727.02614538@localhost> References: <20040203125618.GA6026@mikee.ath.cx> <6.0.0.22.2.20040203073727.02614538@localhost> Message-ID: <20040203140507.GB6026@mikee.ath.cx> On Tue, 03 Feb 2004, William Gropp wrote: > At 06:56 AM 2/3/2004, Mike Eggleston wrote: > >This book from 2000 discusses building clusters from linux. I > >bought it from a discount store not because I'm going to build > >another cluster from linux, but rather because of the discussions > >on cluster management. Has anyone read/implemented his approach? > >What other cluster management techniques/solutions are out there? > > Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes > chapters on cluster setup and cluster management (new in the 2nd > edition). Disclaimer: I'm one of the editors of this book. > > Bill > > I have the 1st edition and it does have a chapter discussing some of the management. How would this method scale to managing a (not really a cluster) group of AIX servers? 
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Tue Feb 3 09:26:37 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Tue, 03 Feb 2004 09:26:37 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: Hello Jim The main goal for us is to stay up and running as long as we can. (Please read the last paragraph before responding to this one:) Most of our temperature problems have been caused by AC maintenance induced temperature spikes. Having "security" open the doors slows the room heating process. The Sensaphone call to us helps us to know that there is a problem and we can phone in to be briefed. "Do we have to come in or has the room already begun to cool?" The last of the Solutions is for just the type of incident that you describe. These are very rare but like you say, they need to be planned for. Our ideal goal would be one that signals a problem to the cluster. The cluster takes the signal and gracefully shuts down the programs and then shuts down the nodes. We did not find such a solution on the commercial market for our "came with the room" UPS. Instead we found a sensor/software combination where the sensor ties into the serial port of one of the nodes. So far we **have** been able to gracefully shut down the programs that are running. We have **not** found a way to automatically turn off the various cluster nodes. That's where we need some help/suggestions. ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 Jim Lux cc: Subject: Re: [Beowulf] Re: HVAC and room cooling... 02/02/04 07:56 PM At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. 
Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt.

The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff.

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From grid at iki.fi Tue Feb 3 09:26:53 2004
From: grid at iki.fi (Michael Kustaa Gindonis)
Date: Tue, 3 Feb 2004 16:26:53 +0200
Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset
In-Reply-To: <200402021546.i12Fk4h24131@NewBlue.scyld.com>
References: <200402021546.i12Fk4h24131@NewBlue.scyld.com>
Message-ID: <200402031626.53453.grid@iki.fi>

Hi,

I noticed in the Linux kernel configuration that there is support for LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. Do any readers of this list have any experiences in this area? Knowledge about LSI's plans to support this chipset in the future?

...
Mike

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From nixon at nsc.liu.se Tue Feb 3 04:21:24 2004
From: nixon at nsc.liu.se (Leif Nixon)
Date: Tue, 03 Feb 2004 10:21:24 +0100
Subject: [Beowulf] Re: HVAC and room cooling...
In-Reply-To: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> (Jim Lux's message of "Mon, 02 Feb 2004 16:56:33 -0800")
References: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov>
Message-ID:

Jim Lux writes:

> YOu need to seriously consider a "failsafe" totally automated shutdown
> (as in chop the power when temperature gets to, say, 40C, in the
> room)... Security might be busy (maybe there was a big problem with
> the chiller plant catching fire or the boiler exploding.. if they're
> directing fire engine traffic, the last thing they're going to be
> thinking about is going over to your machine room and shutting down
> your hardware.

Ah, that reminds me of the bad old days in industry. The A/C went belly up the night between Friday and Saturday. That triggered the alarm down at Security, who promptly called the on-duty ventilation technicians and notified us. Excellent.

Except that the A/C alarm was never reset properly, so when the A/C failed again Saturday afternoon nobody noticed. When the temperature reached 35C, the thermal kill switch triggered automatically. Pity that the electrician had never got around to actually, like, *wire* it to anything.

We arrived Monday morning to the smell of frying electronics.
Expensive weekend, that. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From verycoldpenguin at hotmail.com Tue Feb 3 11:24:18 2004 From: verycoldpenguin at hotmail.com (Gareth Glaccum) Date: Tue, 03 Feb 2004 16:24:18 +0000 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We sell solutions with automated power-off scripts upon node overheat using some of the APC products controlled from a linux master. Not that particular unit though. Gareth >From: Joshua Baker-LePain >To: Eckhoff.Peter at epamail.epa.gov >CC: beowulf at scyld.com >Subject: Re: [Beowulf] Re: HVAC and room cooling... >Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) > >On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > > > Instead we found a sensor/software combination where the sensor ties > > into the > > serial port of one of the nodes. So far we **have** been able to > > gracefully shut down the > > programs that are running. We have **not** found a way to automatically > > turn off the > > various cluster nodes. That's where we need some help/suggestions. > >Well, your high-temperature-triggered scripts should call a 'shutdown -h >now'. *If* your nodes are on motherboards that support it, and *if* the >BIOS is new enough to support it, and *if* the nodes were booted with >'apm=power-off' on the kernel command line, then they should actually >power off. > >Another option would be something like this: > >http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 > >With that (ungodly expensive) power strip, you can remotely cut the power >to selected outlets. It probably can be automated, but you'd have to >check that. > >As Jim said, though, all this is great, but there really does need to be >one final level of hardware level failsafe. It is entirely conceivable >that all your software monitoring could fail, and the temperature will >still be climbing. There needs to be a piece of hardware in the room that >literally cuts power to the whole damn room at a set temperature that is >(obviously) above the one that trips your software shutdown scripts. > >-- >Joshua Baker-LePain >Department of Biomedical Engineering >Duke University >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Tue Feb 3 12:06:50 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Tue, 3 Feb 2004 12:06:50 -0500 Subject: [Beowulf] about cluster's tunning References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> <200402031626.53453.grid@iki.fi> Message-ID: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model for climate forecast. 
My question is: how can I improve the performance of my cluster? Are there
techniques for tuning clusters through the operating system or the network
hardware?

Thanks for your answers and suggestions.

R. Miguel

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at physics.mcmaster.ca  Tue Feb  3 13:12:24 2004
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Tue, 3 Feb 2004 13:12:24 -0500 (EST)
Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset
In-Reply-To: <200402031626.53453.grid@iki.fi>
Message-ID:

> I noticed in the Linux kernel configuration that there is support for LSI's
> Fusion-MPT chipset. Also, it is possible to run MPI over this.

huh?  afaikt, it's just another overly expensive, overly complicated hw
raid controller.  I guess there must be a market for this kind of
wrongheaded crap, but I really don't understand it.  I guess it's just the
impulse to offload whatever possible from the host; that's an understandable
idea, but you really need to look at whether it makes sense, or whether it's
just a holdover from bygone days when your million-dollar mainframe was
actually compute-bound ;)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From csamuel at vpac.org  Tue Feb  3 23:01:17 2004
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 4 Feb 2004 15:01:17 +1100
Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64
In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com>
Message-ID: <200402041501.19592.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 1 Feb 2004 04:39 pm, Andrew Wang wrote:

> 2.6 looks very promising, wondering when distributions
> will include it.

Mandrake 10 will include it (beta 2 just appeared with 2.6.2rc3 - they reckon
the final 2.6.2 will make the release of Mdk10).

- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQFAIG6NO2KABBYQAh8RAlCaAJ9Y5LKBLZQjGvCJCzO7ViuwZMGFiQCePiI+
Q2x2XGPUUWKYDT2nRv/5DHI=
=S0ef
-----END PGP SIGNATURE-----

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rgb at phy.duke.edu  Wed Feb  4 08:17:30 2004
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 4 Feb 2004 08:17:30 -0500 (EST)
Subject: [Beowulf] Re: HVAC and room cooling...
In-Reply-To:
Message-ID:

On Tue, 3 Feb 2004, Leif Nixon wrote:

> Jim Lux writes:
>
> > You need to seriously consider a "failsafe" totally automated shutdown
> > (as in chop the power when temperature gets to, say, 40C, in the
> > room)... Security might be busy (maybe there was a big problem with
> > the chiller plant catching fire or the boiler exploding.. if they're
> > directing fire engine traffic, the last thing they're going to be
> > thinking about is going over to your machine room and shutting down
> > your hardware.
>
> Ah, that reminds me of the bad old days in industry.
> > The A/C went belly up the night between Friday and Saturday. That > triggered the alarm down at Security, who promptly called the on-duty > ventilation technicians and notified us. Excellent. > > Except that the A/C alarm was never reset properly, so when the A/C > failed again Saturday afternoon nobody noticed. > > When the temperature reached 35C, the thermal kill switch triggered > automatically. Pity that the electrician had never got around to > actually, like, *wire* it to anything. > > We arrived Monday morning to the smell of frying electronics. > Expensive weekend, that. Did you ever manage to track down the electrician and put bamboo slivers underneath his toenails or something? That one seems like it would be worth some sort of retaliation. A small nuclear device planted in his front lawn. An anonymous call to the IRS. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 4 17:34:21 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) Subject: [Beowulf] about cluster's tunning In-Reply-To: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Message-ID: You may want to look at the online course mentioned here: http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Doug On Tue, 3 Feb 2004, Richard Miguel wrote: > Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model > for climate forecast. My question is how i can improvement the performance > of my cluster.. there is techniques for tunning of clusters througth > operative system or network hardware?. > > thanks for yours anwers.. and suggests.. > > R. Miguel > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 4 15:08:04 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 04 Feb 2004 21:08:04 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: (Robert G. Brown's message of "Wed, 4 Feb 2004 08:17:30 -0500 (EST)") References: Message-ID: "Robert G. Brown" writes: > On Tue, 3 Feb 2004, Leif Nixon wrote: > >> When the temperature reached 35C, the thermal kill switch triggered >> automatically. Pity that the electrician had never got around to >> actually, like, *wire* it to anything. >> >> We arrived Monday morning to the smell of frying electronics. >> Expensive weekend, that. > > Did you ever manage to track down the electrician and put bamboo slivers > underneath his toenails or something? Sadly, no. And don't get me started on luser electricians. "Ooops, did that feed go to the computer room?" 
"Hmmm, what's on this circuit? Let's toggle it and see what reboots." (Yes, it really happened. I don't often shout at people, but that time...) Dropping a fine gauge wire across the main power rails was an interesting stunt, too. Too bad he didn't even get flash burns. I think the main point here is: If you get hold of a competent electrician, take *real* good care of him. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Thu Feb 5 07:41:24 2004 From: waitt at saic.com (Tim Wait) Date: Thu, 05 Feb 2004 07:41:24 -0500 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: <402239F4.8030304@saic.com> > Dropping a fine gauge wire across the main power rails was an > interesting stunt, too. Too bad he didn't even get flash burns. How about an electrician, who, while working on your building power conditioning, sends 180V through your 120V building, frying everything not on UPS? We were not amused. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Feb 5 11:23:13 2004 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 5 Feb 2004 11:23:13 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <402239F4.8030304@saic.com> Message-ID: On Thu, 5 Feb 2004, Tim Wait wrote: > > > Dropping a fine gauge wire across the main power rails was an > > interesting stunt, too. Too bad he didn't even get flash burns. > > How about an electrician, who, while working on your building > power conditioning, sends 180V through your 120V building, > frying everything not on UPS? > > We were not amused. Oh, give the guy a break: Red, Black, White...it is all very confusing! My most serious problem has been with the computer room UPS begin shutdown accidentally, dropping a half-dozen raid servers. Many TBs of data were endangered. I might be able to forgive them if it only happened once, but I've needed to force myself to stop counting events because doing so interferes with my ability to properly suppress homocidal urges. Seriously, one would think that a Darwinian effect would kick in at some point and cull the electrical service hurd. My observations (and others here as well) seem to dispute that. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.gindonis at hip.fi Thu Feb 5 12:27:20 2004 From: michael.gindonis at hip.fi (Michael Gindonis) Date: Thu, 5 Feb 2004 19:27:20 +0200 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402041702.i14H2Jh03108@NewBlue.scyld.com> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> Message-ID: <200402051927.21214.michael.gindonis@hip.fi> On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > From: Mark Hahn > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset > > > I noticed in the Linux kernel configuration that there is support for > > LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. > > huh? ?afaikt, it's just another overly expensive, overly complicated hw > raid controller. ?I guess there must be a market for this kind of > wrongheaded crap, but I really don't understand it. Hi Mark, When purchasing a cluster or cluster hardware, one can spend as little as 20 Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for Myrinet or Scali. The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. 100 Euro per node is much eaier to justify than 1000 Euro per node when the Cluster when the cluster will not be primarly running tighly coupled parallel problems. If the performance of MPI of Fusion-MPT is much better than than Ethernet with good latency, it becomes a cheap way to add flexibilty to a cluster. Here is some info about it the Chipset... http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ integrated_circuits/fusion.pdf http://www.lsilogic.com/technologies/lsi_logic_innovations/ fusion___mpt_technology.html There is also information in the in the linux kernel documentation about running MPI over this kind of interconnect. ... Mike -- Michael Kustaa Gindonis Helsinki Institute of Physics, Technology Program michael.gindonis at hip.fi http://wikihip.cern.ch/twiki/bin/view/Main/MichaelGindonis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Feb 5 21:12:58 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 5 Feb 2004 18:12:58 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: hi ya On Thu, 5 Feb 2004, Michael T. Prinkey wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! dont forget blue and green too ... - fun to disconnect the wires at the main and move wires around ... while the bldg is "lit" i think its crazy that the "nuetral" side is tied together at the panel .. but the outlets in the building seems to work .. 
c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Feb 5 22:07:18 2004 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 5 Feb 2004 19:07:18 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: <20040206030718.46964.qmail@web60310.mail.yahoo.com> Trained Electrician Here Worked at a HVAC system fab. plant. I wired large Air Make Up Units. I was trained in by a very old school guy (CS degree from 1962). I watched as turnovers in workers happened and started to notice the lower paid guys that would work on 480V extension cords while they where hot with 60hp motors drawing on them! I strayed from that path for a while until a friend, who was handed the task of managing the renovation of an old building downtown. She had questions I had time. I ended up finding a retired electrician that knew his stuff. I asked him how he kept up to date. His reply was that he is on the writing committee for the National Electric Code. Needless to say I keep in contact with him on various topics. Note: CatV + Lighting = PCs + Fire Note2: Attic Access doors do not belong in the ceiling of a wiring closet. Something about fire wanting to go upwards, maybe some of you physics guys can explain it better. --- "Michael T. Prinkey" wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > > > > Dropping a fine gauge wire across the main power rails was an > > > interesting stunt, too. Too bad he didn't even get flash burns. > > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! > > My most serious problem has been with the computer room UPS begin shutdown > accidentally, dropping a half-dozen raid servers. Many TBs of data were > endangered. I might be able to forgive them if it only happened once, but > I've needed to force myself to stop counting events because doing so > interferes with my ability to properly suppress homocidal urges. > > Seriously, one would think that a Darwinian effect would kick in at some > point and cull the electrical service hurd. My observations (and others > here as well) seem to dispute that. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /------------------------------------------------------------\ Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a superbeing or the future with which religions are mainly concerned with. Or, if not impossible, at least impossible at the present time. lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 23:15:52 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 23:15:52 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... 
wires In-Reply-To: Message-ID: On Thu, 5 Feb 2004, Alvin Oga wrote: > i think its crazy that the "nuetral" side is tied together > at the panel .. but the outlets in the building seems to work .. That's not crazy, that's actually rather sane. What would be crazy would be grounding the neutrals and/or ground wire in different places. Can you say "ground loop"? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 5 23:26:37 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 5 Feb 2004 23:26:37 -0500 (EST) Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> Message-ID: > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects or less, actually. you seem to be thinking of gigabit, which is indeed a very attractive cluster interconnect. otoh, there are lots of even more loosely-coupled, non-IO-intensive apps that run just fine on 100bT. > to more than 1000 Euro per node for > Myrinet or Scali. or IB. > The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. yes, obviously. I'd probably rather have another gigabit port or two; bear in mind that some very elegant things can be done when each node has multiple network connections... really, the chipset isn't the point; it's just a $5 coprocessor. what counts is coming up with a physical layer, including affordable switches, and somehow getting millions of people to make/buy them. > 100 > Euro per node is much eaier to justify than 1000 Euro per node when the > Cluster when the cluster will not be primarly running tighly coupled parallel > problems. hmm, we've already established that gigabit is much cheaper, and for loose-coupled systems, chances are good that even 100bT will suffice. > If the performance of MPI of Fusion-MPT is much better than than > Ethernet with good latency, but does it even exist? so far, all I can find is two lines on a marketing glossy... > it becomes a cheap way to add flexibilty to a > cluster. many things could happen; I'm not optimistic about this Fusion-MPT thing. it seems to fly in the face of "do one thing, well". > Here is some info about it the Chipset... > > http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ > integrated_circuits/fusion.pdf that's the vapid marketing glossy. > http://www.lsilogic.com/technologies/lsi_logic_innovations/ > fusion___mpt_technology.html that is even worse. > There is also information in the in the linux kernel documentation about > running MPI over this kind of interconnect. I'm not sure what "kind" here means, do you mean over scsi? the traditional problem with *-over-scsi (and there have been more than a couple) has been that scsi interfaces aren't optimized for low-latency. the bandwidth isn't that hard, really - 320 MB/s is around Myrinet speed, and significantly slower than IB. OK, how about FC? it's obviously got an advantage over U320 in that FC switches exist (oops, expensive) but it's really just a 1-2 Gb network protocol with 2k packets. 
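(A rough way to sanity-check the "good latency" claim for any of these
fabrics, before buying anything exotic, is a simple ping-pong test; the
sketch below uses plain TCP sockets rather than MPI and is not from the
original post -- the port, message size, and repetition count are arbitrary.)

# Rough ping-pong latency sketch over plain TCP (not MPI, not from the
# original post). Run "python pingpong.py server" on one node and
# "python pingpong.py <serverhost>" on another.
import socket
import sys
import time

PORT = 5678
MSG = b"x" * 8          # tiny message: measures latency, not bandwidth
REPS = 10000

def recv_exact(sock, n):
    # collect exactly n bytes (TCP is allowed to deliver short reads)
    buf = b""
    while len(buf) < n:
        buf += sock.recv(n - len(buf))
    return buf

def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", PORT))
    s.listen(1)
    conn, _ = s.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for _ in range(REPS):
        conn.sendall(recv_exact(conn, len(MSG)))   # echo straight back
    conn.close()

def client(host):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, PORT))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    t0 = time.time()
    for _ in range(REPS):
        s.sendall(MSG)
        recv_exact(s, len(MSG))
    t1 = time.time()
    print("avg round trip: %.1f microseconds" % ((t1 - t0) / REPS * 1e6))

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[1])

Divide the round trip by two for the usual one-way figure; the same loop
around an MPI send/receive pair gives the number vendors actually quote.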
as for the "high performance ARM-based architecture" part, well, I must admit that I don't associate ARM with high performance of the gigabyte-per-second sort. personally, I'd love to see sort of the network equivalent of the old smart-frame-buffer idea. practically, though, it really boils down to the gritty details like availability of switches, choosing a physical-layer standard, etc. gigabit is the obvious winner there, but IB is trying hard to get over that bump... (Myri seems not to be very ambitious, and 10G eth seems to be straying into a morass of tcp-offload and the like...) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 11:36:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 11:36:39 -0500 (EST) Subject: [Beowulf] wulflogger, wulfstat's dumber cousin... Message-ID: On request I've got a second xmlsysd client going called "wulflogger". Wulflogger is just wulfstat with the ncurses stuff stripped off so that it manages connections to the xmlsysd's on a cluster, reads them at some input frequency, and writes selected status data to stdout in a simple table. The advantage of this tool is that it makes it really easy to write web or script or report applications, and it also makes it very easy to maintain a dynamic logfile of selected statistics for the entire cluster. This is and will likely remain a very simple tool. The only fanciness I envision for the future is an output descriptor format of some sort that could be input at run time, so that a user could select output fields and formats instead of getting the collections I've prebuilt. That's pretty complex (especially since wulflogger/wulfstat throttle xmlsysd to return only the collective stats it needs) so it won't be anytime soon. Only -t 1 is probably "finished" as output format goes, although -t 0 will probably get mostly cosmetic changes at this point as well. Anyway, any wulfstat/xmlsysd users might want to grab it and give it a try. It makes it pretty simple to write a perl script to generate e.g. rrd images or other graphical representations of the cluster -- in a future release I'll provide sample perl scripts for parsing out fields and doing stuff with it. It is for the moment only available from my personal website: http://www.phy.duke.edu/~rgb/Beowulf/wulflogger.php rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 3 10:38:56 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 3 Feb 2004 16:38:56 +0100 (CET) Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> Message-ID: On Tue, 3 Feb 2004, Mike Eggleston wrote: > This book from 2000 discusses building clusters from linux. I > bought it from a discount store not because I'm going to build > another cluster from linux, but rather because of the discussions Mike, I bought this book almost when it came out. 
Its easy to do injustice to someone with a quick email, especially as David Spector put a lot of effort into the book, and I haven't. However, this OReilly is reckoned not to be one of the best. I always recommend 'Linux Clustering' by Charles Bookman, and 'Beowulf Cluster Computing with Linux' edited by Thomas Sterling. Online, there is the book by Bob Brown http://www.phy.duke.edu/brahma/Resources/beowulf_book.php For cluster management specifically, google for Rocks and Oscar, and there are lots of other pages. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 07:56:18 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 06:56:18 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ Message-ID: <20040203125618.GA6026@mikee.ath.cx> This book from 2000 discusses building clusters from linux. I bought it from a discount store not because I'm going to build another cluster from linux, but rather because of the discussions on cluster management. Has anyone read/implemented his approach? What other cluster management techniques/solutions are out there? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Feb 3 10:11:32 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > Instead we found a sensor/software combination where the sensor ties > into the > serial port of one of the nodes. So far we **have** been able to > gracefully shut down the > programs that are running. We have **not** found a way to automatically > turn off the > various cluster nodes. That's where we need some help/suggestions. Well, your high-temperature-triggered scripts should call a 'shutdown -h now'. *If* your nodes are on motherboards that support it, and *if* the BIOS is new enough to support it, and *if* the nodes were booted with 'apm=power-off' on the kernel command line, then they should actually power off. Another option would be something like this: http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 With that (ungodly expensive) power strip, you can remotely cut the power to selected outlets. It probably can be automated, but you'd have to check that. As Jim said, though, all this is great, but there really does need to be one final level of hardware level failsafe. It is entirely conceivable that all your software monitoring could fail, and the temperature will still be climbing. There needs to be a piece of hardware in the room that literally cuts power to the whole damn room at a set temperature that is (obviously) above the one that trips your software shutdown scripts. 
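(A minimal sketch of the kind of high-temperature shutdown monitor described
above, not from the original message: the sensor file, node names, and
threshold are placeholders for whatever your serial-port sensor and cluster
actually look like.)

# Minimal temperature-triggered shutdown sketch. Assumes some sensor daemon
# writes the room temperature (deg C) to a file; node names and threshold
# are made-up examples.
import os
import time

NODES = ["node%02d" % i for i in range(1, 28)]   # hypothetical node names
SOFT_LIMIT_C = 35.0     # gracefully halt the nodes above this
CHECK_INTERVAL = 60     # seconds between readings

def read_room_temp():
    # Placeholder: replace with parsing of your serial sensor or lm_sensors
    # output; here we assume the reading is dropped into this file.
    with open("/var/run/roomtemp") as f:
        return float(f.read().strip())

def halt_node(node):
    # shutdown -h only powers the box off if the BIOS/APM supports it and
    # the kernel was booted with apm=power-off, as noted above.
    os.system("ssh %s 'shutdown -h now'" % node)

if __name__ == "__main__":
    while True:
        if read_room_temp() >= SOFT_LIMIT_C:
            for node in NODES:
                halt_node(node)
            break
        time.sleep(CHECK_INTERVAL)

This is only the soft layer, of course; the independent hardware cutoff for
the whole room still has to be there as the last resort.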
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 4 08:26:55 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 4 Feb 2004 14:26:55 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Ah, that reminds me of the bad old days in industry. That in turn reminds me of recent construction work around here... Since the building with our offices and our small server room had to be renovated, the water-based cooling system for the server room had to be temporarily replaced with a mobile unit that pumps the heat into the hallway. The company responsible had no better idea than to replace the cooling system on friday afternoon -- of course without telling anybody. As the mobile unit was much too small, the server room had turned into sauna until monday when we discovered the problem. Ups. Luckily no hardware was damaged, even though the sensors in the hard-disk drives of our server measured a maximum of 47C. Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Fri Feb 6 04:17:43 2004 From: patrick at myri.com (Patrick Geoffray) Date: Fri, 06 Feb 2004 04:17:43 -0500 Subject: [Beowulf] Ambition In-Reply-To: References: Message-ID: <40235BB7.6010802@myri.com> Ah Mark, I could not resist. Actually I could, but the list has been a little boring lately, so... :-) Mark Hahn wrote: > (Myri seems not to be very ambitious, and 10G eth seems to be straying into > a morass of tcp-offload and the like...) Myri is very ambitious, but you can be carefully ambitious or marketingly ambitious. Nobody buys an interconnect looking only at the specs. People try, benchmark, run their code, rationalize and buy what they need at the right price. If you look at what people are doing, there is a lot of Ethernet (Fast and GigE) because thats good enough for many, many codes. Then there is a smaller market for more demanding needs, either in term of performance or scalability, where you want to find the sweet spot in the performance/price curve. Does it make sense to have 10Gb now ? I don't think so, and for several reasons: * PCI-Express is not here yet: It's coming, yes, but it's not available in volume. Today, PCI-X supports 1 GB bidirectional, which is 4 Gb link speed. It's clearly the bottleneck right now. HyperTransport looks attractive, but there is no connector defined yet and vendors should be able to see a potential for volume before to commit resources for a native HT interface. * 10 Gb optics are still expensive: price is going down, but there is not enough volume yet to drive the price down faster. Copper ? I still have nightmares about copper. 10 GigE will drive the technology price down as the 10 GigE market blossoms. * 10 GigE is not attractive enough yet because there is no clear improvement at the application level. 
Running a naive IP stack at 10 Gb requires a lot of resources on the host. RDMA is just a buzword, it's not The Solution. Storage may leverage RDMA, but not IP and certainly not MPI. That's why people are working to put processing on the data path, but it is far from obvious so it takes some time. Gigabit is the clear winner today and 10 GigE will be the clear winner tomorrow, because Ethernet is the de facto Standard. Everybody else are parasites, either breading on niches or marketing poop... Patrick _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sigut at id.ethz.ch Fri Feb 6 06:31:51 2004 From: sigut at id.ethz.ch (G.M.Sigut) Date: Fri, 6 Feb 2004 12:31:51 +0100 (MET) Subject: [Beowulf] about cluster's tunning Message-ID: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> > Date: Thu, 5 Feb 2004 12:04:04 -0500 > Subject: Beowulf digest, Vol 1 #1657 - 4 msgs ... > --__--__-- > > Message: 1 > Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) > From: "Douglas Eadline, Cluster World Magazine" > Subject: Re: [Beowulf] about cluster's tunning > > You may want to look at the online course mentioned here: > > http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Oh yeah. Very nice. Especially after you register (for the course) and are told your browser is no good. There is a page which helps you to select an approved browser - and that says: Unable to detect your operating system. Please select your operating system: -> Windows operating system -> Mac operating system. What a pity that I am working on a Sun. (and Linux) ... George :-( (is there a smiley for "I'm going to puke"?) >>>>>>>>>>>>>>>>>>>>>>>>> George M. Sigut <<<<<<<<<<<<<<<<<<<<<<<<<<< ETH Zurich, Informatikdienste, Sektion Systemdienste, CH-8092 Zurich Swiss Federal Inst. of Technology, Computing Services, System Services e-mail: sigut at id.ethz.ch, Phone: +41 1 632 5763, Fax: +41 1 632 1022 >>>> if my regular address does not work, try "sigut at pop.agri.ch" <<<< _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Fri Feb 6 08:52:20 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Fri, 6 Feb 2004 07:52:20 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> > That's not crazy, that's actually rather sane. What would be crazy > would be grounding the neutrals and/or ground wire in > different places. > Can you say "ground loop"? > Grounding loops.. truly a bane. I remember one instance where someone wired a telecommunications switch to two different grounds. The -48v DC power had it's own ground, and someone had grounded the chassis to a different feed. I little lesser know fact was the lightning rod on the tower next to the building was linked to the same ground as the power. When lightning did strike, nothing but smoke as the charge rolled from one ground to the other on each bay. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 6 09:30:10 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Fri, 6 Feb 2004 09:30:10 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > That's not crazy, that's actually rather sane. What would be crazy > > would be grounding the neutrals and/or ground wire in > > different places. > > Can you say "ground loop"? > > > > > Grounding loops.. truly a bane. I remember one instance where someone > wired a telecommunications switch to two different grounds. The -48v DC > power had it's own ground, and someone had grounded the chassis to a > different feed. I little lesser know fact was the lightning rod on the > tower next to the building was linked to the same ground as the power. > When lightning did strike, nothing but smoke as the charge rolled from > one ground to the other on each bay. There is also a memorable instance of powered racks with incoming two phase power split into two circuits having a polarity reversal so its neutral wire on one circuit was 120V above chassic ground and the neutral on the other circuit. When somebody plugged a single unit with components on both lines -- I think it was more like "meltdown and fire". Not really a ground loop, of course... ...but plenty of people have been electrocuted or fires started because there was a lot more resistance on the neutral line to a remote "ground" than there was to a nice, local, piece of metal. Basically, AFAICT there is really nothing in the NEC or CEC that is "stupid". In fact, I think that most of the code has undergone a near-Darwinian selection process, as in electricians who fail to wire to code (and often their clients) not infrequently fail to reproduce. I don't think code is conservative ENOUGH, if anything, and like to overwire for any given situation. 12-2 is just as easy and cheap to work with as 14-2, for example. 10-2 unfortunately is not, but it gives me comfort to use it whereever I can. And I kinda wish that all circuit breakers were GFCI by code as well, not just ones servicing lines near water and pipes. However, these are still available as user choices -- code permits you to go over, just not under. Anybody curious about wiring should definitely google for the electrical wiring FAQ site. It explains wiring in relatively simple terms. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 6 09:23:48 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 6 Feb 2004 15:23:48 +0100 (CET) Subject: [Beowulf] about cluster's tunning In-Reply-To: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> Message-ID: It just worked fine for me. 
Mozilla 1.4.1 running on Fedora _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Feb 6 03:52:50 2004 From: sp at scali.com (Steffen Persvold) Date: Fri, 06 Feb 2004 09:52:50 +0100 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> <200402051927.21214.michael.gindonis@hip.fi> Message-ID: <402355E2.2040909@scali.com> Michael Gindonis wrote: > On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > >>From: Mark Hahn >>To: beowulf at beowulf.org >>Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset >> >> >>>I noticed in the Linux kernel configuration that there is support for >>>LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. >> >>huh? afaikt, it's just another overly expensive, overly complicated hw >>raid controller. I guess there must be a market for this kind of >>wrongheaded crap, but I really don't understand it. > > > Hi Mark, > > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for > Myrinet or Scali. > Michael, I'm not entirely sure what you mean by "Scali" here. Scali is a _software_ vendor and our MPI can use all of the interconnects that are popular within HPC today (GbE, Myrinet, InfiniBand and SCI). Best regards, -- Steffen Persvold Senior Software Engineer mob. +47 92 48 45 11 tel. +47 22 62 89 50 fax. +47 22 62 89 51 Scali - http://www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Fri Feb 6 11:11:43 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Fri, 6 Feb 2004 16:11:43 +0000 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: References: Message-ID: <200402061611.43045.daniel.kidger@quadrics.com> On Friday 06 February 2004 4:26 am, Mark Hahn added: >> When purchasing a cluster or cluster hardware, one can spend as little as 20 >> Euro ( ~30 CAD) per node on interconnects >> to more than 1000 Euro per node for >> Myrinet or Scali. > > or IB. I guess you should add QsNet II to that list too (except that our cards are under e1000 - not counting switches) Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Feb 6 13:12:49 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 06 Feb 2004 10:12:49 -0800 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> At 09:30 AM 2/6/2004 -0500, Robert G. 
Brown wrote: >On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > > > That's not crazy, that's actually rather sane. What would be crazy > > > would be grounding the neutrals and/or ground wire in > > > different places. > > > Can you say "ground loop"? > > > > > > > > > Grounding loops.. truly a bane. I remember one instance where someone > > wired a telecommunications switch to two different grounds. The -48v DC > > power had it's own ground, and someone had grounded the chassis to a > > different feed. I little lesser know fact was the lightning rod on the > > tower next to the building was linked to the same ground as the power. > > When lightning did strike, nothing but smoke as the charge rolled from > > one ground to the other on each bay. > >There is also a memorable instance of powered racks with incoming two >phase power split into two circuits having a polarity reversal so its >neutral wire on one circuit was 120V above chassic ground and the >neutral on the other circuit. When somebody plugged a single unit with >components on both lines -- I think it was more like "meltdown and >fire". Not really a ground loop, of course... The classic error is wiring two sets of receptacles (e.g two racks full of gear) on the two sides of the 220, with neutrals properly connected, then having the neutral conductor fail, so the two 110V loads are in series across 110V. Works fine as long as the loads are balanced, but when you start to turn off the loads on one side, the voltages don't balance any more. >...but plenty of people have been electrocuted or fires started >because there was a lot more resistance on the neutral line to a remote >"ground" than there was to a nice, local, piece of metal. The notorious MGM Grand fire in Las Vegas, for instance, was caused by a ground/neutral/resistance thing. > Basically, >AFAICT there is really nothing in the NEC or CEC that is "stupid". In >fact, I think that most of the code has undergone a near-Darwinian >selection process, as in electricians who fail to wire to code (and >often their clients) not infrequently fail to reproduce. > >I don't think code is conservative ENOUGH, if anything, and like to >overwire for any given situation. 12-2 is just as easy and cheap to >work with as 14-2, for example. Not if you buy your wire in traincarload lots when wiring a subdivision. That extra copper adds up, not only in copper cost, but shipping, etc. Consider that the wiring harness in an automobile weighs on the order of 50-100kg, and you see why they're interested in going to multiplex buses and 42V systems. Ballparking for my house, which is, give or take 50 feet long, 20 feet wide, and 20 feet high, I'd say there are wiring runs comparable to, say, 3000 feet. That's 9000 total feet of conductors (Black,White, Ground). 12AWG is 19.8 lb/1000 ft, 14 is 12.4 lb/1000ft. Using AWG14 instead of AWG12 saves the contractor 70 pounds of copper. Copper, in huge quantities, is about $0.70/lb, so by the time it gets to the wire maker, it's probably a dollar a pound, so it saves the contractor $70 (not counting any shipping costs, etc. which could be another $0.10/lb or so) $70/house is a bunch o' bux to a builder putting up 500 homes in a tract. They make a profit by watching a thousand little details, each of which is some tiny fraction of the overall price ($70 on a 2000 ft house is 0.035/square foot, compared to $70-100/ft construction cost). 
It's much like automotive applications, or mass market consumer electronics, where they obssess about BOM (bill of materials) cost changes of pennies. (Do you really, really need that bypass capacitor? Does it have to be that big? How many product returns will we get if we leave it out?) This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at 8.96, even after you factor in the fact that you might need more aluminum (because it's lower conductivity), it's still better than 2:1 weight difference. (Aluminum and Copper are about the same price these days, but copper has bigger fluctuations... back in the 70's copper was expensive and aluminum cheap (about 2:1)) So, 2:1 mass, 2:1 price.. changes the cost of the wire alone from $200/house down to $50/house... Consider an office building with 20-30 floors, of 10,000 square feet each. AWG12 vs AWG14 can be a BIG deal. There was a lot of arguing about the heavier neutral wire needed in light industrial office 208Y/120 wiring with all the poor power factor loads (i.e. computers with lightly loaded switching power supplies). > 10-2 unfortunately is not, but it >gives me comfort to use it whereever I can. And I kinda wish that all >circuit breakers were GFCI by code as well, not just ones servicing >lines near water and pipes. However, these are still available as user >choices -- code permits you to go over, just not under. > >Anybody curious about wiring should definitely google for the electrical >wiring FAQ site. It explains wiring in relatively simple terms. > > rgb > >-- >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fant at pobox.com Fri Feb 6 13:59:24 2004 From: fant at pobox.com (Andrew Fant) Date: Fri, 6 Feb 2004 13:59:24 -0500 (EST) Subject: [Beowulf] Gentoo for Science and Engineering Message-ID: Hello, I am sending this out to let people know about a new mailing-list/IRC channel which is being organized for people interested in the use of Gentoo Linux in Computational Science and Engineering applications. At this point we are just getting started, but hopefully we will grow into an organization which presents a one-stop resource about applying Gentoo to CS&E applications from the desktop to HPC clusters and grids. In addition, we will be working closely with Gentoo developers and the Core Gentoo management to provide feedback and guidance in how it can most closely meet the needs of technical end-users. Anyone who has an interest in computational science and engineering and who is interested in learning more about Gentoo or making it a better CS&E platform is most cordially invited to join About Gentoo Linux: Gentoo Linux is a source-based distribution that makes the assumption that the end-user or administrator knows more about what the system is supposed to do than the distribution developers. 
At the core of this is a package system known as Portage, which is similar in form to the BSD ports system. It uses the rough equivalent of an RPM spec file (called an ebuild within Gentoo) to automatically download source, compile the package (and any prerequisites) with appropriate optimizations and options as defined by the user, and install it in such a way that it can be removed or upgraded at a later time. Sometimes referred to as a meta-distribution by the developers, Gentoo initially installs a minimal environment and doesn't force the end-user to install packages and services that are unwanted or unnecessary. Also, no network daemons are started on a system unless an administrator expressly starts them. Gentoo Linux is developed by a community of developers, much as Fedora and Debian are. At present, there are over 6000 different ebuilds for different system utilities and applications in Portage. Of these, more than 100 are classified as scientific applications, including bioperl, octave, spice, and gromacs. In addition, many common scientific libraries and HPC tools are present, including Atlas, FFTW, gmp, LAM/MPI and openpbs. The main website can be found at http://www.gentoo.org. Contact information: The mailing-list is only starting now, and is rather quiet, though I hope to change that over the next couple of weeks. To subscribe, send a blank email to gentoo-science-subscribe at gentoo.org. You will get a confirmation message back. For those who want to just ask questions or find out more in a real-time setting, we are on IRC at irc.freenode.org in #gentoo-science. Of course, questions may also be directed to me at afant at geekmail.cc. Thank you for your time. Please feel free to forward this information to other groups that you feel would be interested. I apologize to anyone who considered this an off-topic post. Andy Fant Andrew Fant | This | "If I could walk THAT way... Molecular Geek | Space | I wouldn't need the talcum powder!" fant at pobox.com | For | G. Marx (apropos of Aerosmith) Boston, MA USA | Hire | http://www.pharmawulf.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Feb 6 22:19:58 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 11:19:58 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: Message-ID: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Can you add GridEngine (SGE) and Torque (SPBS)? The problem with OpenPBS is not only it is broken, it is not under development these days, but also I found that Altair is not allowing new users to download OpenPBS. I went to its homepage today but it only leads me to the PBSPro page. SGE already has a FreeBSD-style "port", so adding a port for Gentoo Linux should also be easy. And I think SGE is more popluar these days too. SPBS is basically PBS, but with lots of problems fixed, and better Maui scheduler support. Also, please support the mpiexec parallel job starter, as it allows OpenPBS and SPBS to control slave MPI tasks. SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/torque/ mpiexec: http://www.osc.edu/~pw/mpiexec/ Thx :-> Andrew. --- Andrew Fant ???? > In addition, many > common scientific libraries > and HPC tools are present, including Atlas, FFTW, > gmp, LAM/MPI and > openpbs. 
The main website can be found at > http://www.gentoo.org. > > Contact information: > > The mailing-list is only starting now, and is rather > quiet, though I hope > to change that over the next couple of weeks. To > subscribe, send a blank > email to gentoo-science-subscribe at gentoo.org. You > will get a confirmation > message back. For those who want to just ask > questions or find out more > in a real-time setting, we are on IRC at > irc.freenode.org in > #gentoo-science. Of course, questions may also be > directed to me at > afant at geekmail.cc. > > Thank you for your time. Please feel free to > forward this information to > other groups that you feel would be interested. I > apologize to anyone who > considered this an off-topic post. > > Andy Fant > > Andrew Fant | This | "If I could walk > THAT way... > Molecular Geek | Space | I wouldn't need > the talcum powder!" > fant at pobox.com | For | G. Marx > (apropos of Aerosmith) > Boston, MA USA | Hire | > http://www.pharmawulf.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 7 03:18:47 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 7 Feb 2004 09:18:47 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> Message-ID: On Fri, 6 Feb 2004, Jim Lux wrote: > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > 8.96, even after you factor in the fact that you might need more aluminum > (because it's lower conductivity), it's still better than 2:1 weight Oh yes. Lots of telephone circuits were wired in aluminium in the 1960's in the UK. Corrosion now means these customers have difficulty getting ADSL. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sat Feb 7 06:21:19 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sat, 7 Feb 2004 11:21:19 +0000 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <20040207112119.GA5120@galactic.demon.co.uk> On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > Can you add GridEngine (SGE) and Torque (SPBS)? > > The problem with OpenPBS is not only it is broken, it > is not under development these days, but also I found > that Altair is not allowing new users to download > OpenPBS. I went to its homepage today but it only > leads me to the PBSPro page. > To clarify things a bit, I hope. In the beginning was PBS - developed in house at NASA by engineers who needed a Portable Batch System. If you understand Cray NQS syntax and concepts it's familiar :) They left / sold to Veridian who in turn sold to Altair. 
The original PBS was GPL or a close equivalent, if I understand correctly. Altair are marketing a proprietary development of PBS as PBSPro. OpenPBS remains available, though you have to register with Altair for download. What they have done very recently, which is rather sneaky, is for the site to oblige you to register for an evaluation copy of PBSPro and potentially answer a questionnaire prior to providing the link to allow you to download OpenPBS. OpenPBS is not under active development and PBSPro may have stalled. Certainly the price per node that Altair are quoting has apparently dropped significantly - though their salesmen are still persistent :) The academic community and the active users forked OpenPBS to create Scalable PBS [SPBS], which is the name most widely known. They've added patches, fixes and features, though there is still an Altair licence for OpenPBS in there. In the last couple of months, SPBS changed its name initially to StORM and then to Torque. HTH other relative newbies who may be confused when trying to find the product :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Feb 7 09:19:21 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 22:19:21 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040207141921.68156.qmail@web16810.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" wrote: > Certainly the price per node that Altair are quoting > has apparently > dropped significantly - though their salesmen are > still persistent :) Both LSF and PBSPro have dropped their prices significantly. LSF used to be US$1000 per CPU and is now $50, and PBSPro used to be a few hundred dollars and is now under $30. SGE (GridEngine) 6.0 has a lot of new enhancements and the SGE mailing lists are very popular; SPBS is gaining a lot of acceptance among OpenPBS users; and Condor is adding another set of new features and will then go open source in the next few months. We'll see whether LSF and PBSPro drop their prices again in the very near future. BTW, it is just like Linux vs M$: at the beginning Linux wasn't there, so M$ could charge as much as it wanted; then Linux slowly arrived, and M$ found it harder and harder to compete. Linux won't kill M$, and SGE/SPBS/Condor won't kill LSF or PBSPro, not in the next few years. What we will see, however, is lower cost, more features, and better support from Platform Computing (LSF) and Altair (PBSPro) as they fight back, so users win. Andrew. > The academic community and the active users forked > OpenPBS to create > Scalable PBS [SPBS] which is the name most widely > known. They've added > patches, fixes and features, though there is still > an Altair licence for > OpenPBS in there. In the last couple of months, > SPBS changed its name > initially to StORM and then to Torque. > > HTH other relative newbies who may be confused by > trying to find the > product :) > > Andy > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -----------------------------------------------------------------
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sat Feb 7 13:48:06 2004 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 7 Feb 2004 12:48:06 -0600 (CST) Subject: [Beowulf] Cluster applications. In-Reply-To: <40235BB7.6010802@myri.com> Message-ID: Hi, I am looking for a real high performance computing application to evaluate the performance of a 2-node cluster running RH9.0, connected back to back by 1GbE. Here are some characteristics of the application I am looking for: 1 Communication intensive, should not be embarassingly parallel. 2 Should be able to stress the network to the maximum. 3 Should not be a benchmark, a real application. 4 Tunable message sizes. 5 Preferably MPI 6 Free (am I greedy?). Can someone point out one/some application(s) with at least first 3 features in the above list? Thank you very much. Regards, Balaji. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Sat Feb 7 10:11:55 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Sat, 07 Feb 2004 09:11:55 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <4025003B.3020105@tamu.edu> Should we mention the problems in household wiring caused by use of aluminum wiring, then using breakers, outlets and fixtures designed for copper? I almost lost a house in Houston to that once. I spent the 8 hours after the fire department left retightening all the connections throughout. John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at >>8.96, even after you factor in the fact that you might need more aluminum >>(because it's lower conductivity), it's still better than 2:1 weight > > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Fri Feb 6 20:51:51 2004 From: clwang at csis.hku.hk (Cho Li Wang) Date: Sat, 07 Feb 2004 09:51:51 +0800 Subject: [Beowulf] CFP: 2004 IFIP International Conference on Network and Parallel Computing (NPC2004) Message-ID: <402444B7.5E50DC8B@csis.hku.hk> NPC2004 IFIP International Conference on Network and Parallel Computing October 18-20, 2004 Wuhan, China http://grid.hust.edu.cn/npc04 **************************************************************** Call For Papers The goal of IFIP International Conference on Network and Parallel Computing (NPC 2004) is to establish an international forum for engineers and scientists to present their excellent ideas and experiences in all system fields of network and parallel computing. NPC 2004, hosted by the Huazhong University of Science and Technology, will be held in the city of Wuhan, China - the "Homeland of White Clouds and the Yellow Crane." Topics of interest include, but are not limited to: -Grid-based Computing -Cluster-based Computing -Peer-to-peer Computing -Network Security -Ubiquitous Computing -Network Architectures -Advanced Web and Proxy Services -Mobile Agents -Network Storage -Multimedia Streaming Services -Middleware Frameworks and Toolkits -Parallel & Distributed Architectures and Algorithms -Performance Modeling/ Evaluation -Programming Environments and Tools for Parallel and Distributed Platforms Submitted papers may not have appeared in or be considered for another conference. Papers must be written in English and must be in PDF format. Detailed electronic submission instructions will be posted on the conference web site. The conference proceedings will be published by Springer Verlag in the Lecture Notes in Computer Science (LNCS) Series (pending). ************************************************************************** Committee General Co-Chairs: H. J. Siegel Colorado State University, USA Guo-jie Li Chinese Academy of Sciences, China Steering Committee Chair: Kemal Ebcioglu IBM T.J. Watson Research Center, USA Program Co-Chairs: Guang-rong Gao University of Delaware, USA Zhi-wei Xu Chinese Academy of Sciences, China Program Vice-Chairs: Victor K. Prasanna University of Southern California, USA Albert Y. Zomaya University of Sydney, Australia Hai Jin Huazhong University of Science and Technology, China Local Arrangement Chair: Song Wu Huazhong University of Science and Technology, China *************************************************************************** Important Dates Paper Submission March 15, 2004 Author Notification May 1, 2004 Final Camera Ready Manuscript June 1, 2004 *************************************************************************** For more information, please contact the program vice-chair at the address below: Dr. 
Hai Jin, Professor Director, Cluster and Grid Computing Lab Vice-Dean, School of Computer Huazhong University of Science and Technology Wuhan, 430074, China Tel: +86-27-87543529 Fax: +86-27-87557354 e-fax: +1-425-920-8937 e-mail: hjin at hust.edu.cn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 7 14:40:29 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 7 Feb 2004 11:40:29 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sat, 7 Feb 2004, John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > 8.96, even after you factor in the fact that you might need more aluminum > > (because it's lower conductivity), it's still better than 2:1 weight > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. yeah but that's 24-26awg twisted pair for phone a 14 12 10 or 8 awg cable for power have substantialy less surface area relative to it's volume. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Feb 7 17:21:38 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 7 Feb 2004 14:21:38 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires - al In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: hi ya On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. people install wires with al or steel cores in the wire cause its way way cheaper than copper ... copper is only needed for good conduction on the outside of the wire al corrosion ... coat it with stuff :-) or wrap it w/ copper but now you have to worry about copper corrosion - house or building wiring is different animals than high voltage transmission lines too aluminum "pixie" dust does whacky things .. c ya alvin - i've always wondered why people put massive heatsinks on top of the cpu ... air will have a harder time to cool a big mass of metal as opposed to cooling a smaller piece of metal or cooling it some other way .. - problems of getting the heat out of the cpu ( 0.25"sq metal lid) - problems of getting the heat out of the cpu heatsink - blowing air down onto the heatsink is silly too .. 
left over from the 20-30 yr old ideas i guess > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Sat Feb 7 21:36:50 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Sat, 7 Feb 2004 21:36:50 -0500 (EST) Subject: [Beowulf] Cluster applications. In-Reply-To: Message-ID: Check out: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236&mode=thread&tid=8 Also, the "Right Stuff" Column in ClusterWorld addresses some of these issues. To see the a small summary of the columns look at: http://www.clusterworld.com/issues.shtml Doug On Sat, 7 Feb 2004, Balaji Rangasamy wrote: > Hi, > I am looking for a real high performance computing application to evaluate > the performance of a 2-node cluster running RH9.0, connected back to back > by 1GbE. Here are some characteristics of the application I am looking > for: > 1 Communication intensive, should not be embarassingly parallel. > 2 Should be able to stress the network to the maximum. > 3 Should not be a benchmark, a real application. > 4 Tunable message sizes. > 5 Preferably MPI > 6 Free (am I greedy?). > Can someone point out one/some application(s) with at least first 3 > features in the above list? Thank you very much. > Regards, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From klamman.gard at telia.com Sun Feb 8 03:50:22 2004 From: klamman.gard at telia.com (Per Lindstrom) Date: Sun, 08 Feb 2004 09:50:22 +0100 Subject: [Beowulf] Cluster applications. In-Reply-To: References: Message-ID: <4025F84E.4040706@telia.com> Hi Balaji, May I suggest the use of the GNU FEA-software CALCULIX, http://calculix.de/ When will it be up to you to decide how demanding problem your cluster have to solve. Best regards Per Lindstrom Balaji Rangasamy wrote: >Hi, >I am looking for a real high performance computing application to evaluate >the performance of a 2-node cluster running RH9.0, connected back to back >by 1GbE. Here are some characteristics of the application I am looking >for: >1 Communication intensive, should not be embarassingly parallel. >2 Should be able to stress the network to the maximum. >3 Should not be a benchmark, a real application. >4 Tunable message sizes. >5 Preferably MPI >6 Free (am I greedy?). >Can someone point out one/some application(s) with at least first 3 >features in the above list? Thank you very much. >Regards, >Balaji. 
> > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 8 10:52:44 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 8 Feb 2004 10:52:44 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. You mean the part where aluminum turns out to burn like magnesium, incredibly hot and impossible to quench? I would under no circumstances put aluminum wiring in, well, anything. Certainly not anything where a serious overload or arcing situation could occur, which is nearly anything. I seem to remember the government finding out about aluminum the hard way with some of their armored fighting vehicles a decade or two ago. When struck with a hot enough round, the armor itself just burned right up. rgb > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > Oh yes. > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 14:10:43 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 13:10:43 -0600 Subject: [Beowulf] DC Powered Chassis Message-ID: <402689B3.9070104@iwantka.com> With all the power talk on the 'HVAC and Room Cooling' subject. I've been looking for 1 or 2u chassis that support -48v DC as the main power source. Does anyone know of someone that manufactures these? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Feb 9 00:24:28 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 21:24:28 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Robert G. 
Brown wrote: > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > Should we mention the problems in household wiring caused by use of > > aluminum wiring, then using breakers, outlets and fixtures designed for > > copper? I almost lost a house in Houston to that once. I spent the 8 > > hours after the fire department left retightening all the connections > > throughout. > > I seem to remember the government finding out about aluminum the hard > way with some of their armored fighting vehicles a decade or two ago. > When struck with a hot enough round, the armor itself just burned right > up. Armor is supposed to burn. Several armor designs, including that of the American Abrams battle tank, are designed to ablate under pressure from kinetic energy weapons. British Chobham-type composite armor, boron carbide, or aluminum, or some combination of those and others, protects larger armored vehicles from depleted uranium and tungsten sabot munitions. Depleted uranium has similar or better pyrophoric properties (igniting at 500 C and burning at 2000 C) and the added nastiness of being a toxic heavy metal... in general, taking a 10 kg uranium slug, accelerating it to 15,000fps and slamming it into another object will cause a fire. It has been used in both armor and projectiles for more or less the same reasons. > rgb > > > > > John Hearns wrote: > > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > >>8.96, even after you factor in the fact that you might need more aluminum > > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > > > > Oh yes. > > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sun Feb 8 17:49:18 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 14:49:18 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: http://www.rackmountpro.com/productsearch.cfm?catid=118 On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these?
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 13:13:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 13:13:16 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Joel Jaeggli wrote: > On Sun, 8 Feb 2004, Robert G. Brown wrote: > > > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Should we mention the problems in household wiring caused by use of > > > aluminum wiring, then using breakers, outlets and fixtures designed for > > > copper? I almost lost a house in Houston to that once. I spent the 8 > > > hours after the fire department left retightening all the connections > > > throughout. > > > > I seem to remember the government finding out about aluminum the hard > > way with some of their armored fighting vehicles a decade or two ago. > > When struck with a hot enough round, the armor itself just burned right > > up. > > armor is supposed to burn. several armor desgins including that of the "supposed to burn"? Where to "burn" is to release additional heat energy into an already hot environment in a self-sustaining way? Ouch. Supposed to ablate and dissipate energy (hopefully in non-destructive ways on the outside of the vehicle) sure, but naive aluminum designs can be deadly and at various points in the past have been seriously mistrusted by the military personnel supposedly being protected. See e.g. http://www.g2mil.com/aluminum.htm where they recall the early bradley flaws, and argue that the HMS Sheffield (sunk by a single exocet missle in the falklands war) went down in large measure because it was an aluminum ship, where steel ships have been hit by more than one exocet and survived. The site also presents a counterpoint that argues that aluminum isn't THAT bad a choice (as near as I can make out) provided that all one wishes to stop is "small arms fire". It very quickly loses out to steel, though, in a variety of measures when faced with RPG's or things that actually cause fires, as it is a good conductor of heat and quickly spreads a fire and structurally collapses at a relatively low temperature. The aluminum Bradley did tolerably in the first gulf war, losing only 3 to enemy fire (compared to 17 lost to friendly fire from Abrams tanks) but it does have provisions for additional armor plates of steel to be added on outside and I imagine that it used them. Most of what it faced in the gulf war OTHER than our Abrams was its forte -- small arms fire. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sun Feb 8 17:09:36 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sun, 8 Feb 2004 14:09:36 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: hi ya nathan On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? some collection of "these" http://www.Linux-1U.net/PowerSupp/DC http://www.Linux-1U.net/PowerSupp/12v problem with +12v or -48v dc inputs is you need to provide enough current to these "dc power supply" - at 12v .. we were calculating about 400A ... since we estimate 4A per mb and 100 mb per rack and double it or 50% for keeping the powersupply reasonably within its normal lifespan ( mtbf ) fun stuff... alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 17:31:40 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 16:31:40 -0600 Subject: [Beowulf] DC Powered Chassis In-Reply-To: References: Message-ID: <4026B8CC.7050102@iwantka.com> Thanks! Alvin Oga wrote: >hi ya nathan > >On Sun, 8 Feb 2004, Nathan Littlepage wrote: > > > >>With all the power talk on the 'HVAC and Room Cooling' subject. I've >>been looking for 1 or 2u chassis that support -48v DC as the main power >>source. Does anyone know of someone that manufactures these? >> >> > >some collection of "these" > >http://www.Linux-1U.net/PowerSupp/DC >http://www.Linux-1U.net/PowerSupp/12v > > >problem with +12v or -48v dc inputs is you need to provide >enough current to these "dc power supply" > - at 12v .. we were calculating about 400A ... > since we estimate 4A per mb and 100 mb per rack > and double it or 50% for keeping the powersupply > reasonably within its normal lifespan ( mtbf ) > >fun stuff... >alvin > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sat Feb 7 20:24:01 2004 From: clwang at csis.hku.hk (clwang at csis.hku.hk) Date: Sun, 8 Feb 2004 09:24:01 +0800 Subject: [Beowulf] CFP: GCC2004 (3rd International Conference on Grid and Cooperative Computing) Message-ID: <1076203441.40258fb1bf1c0@intranet.csis.hku.hk> ---------------------------------------------------------------- Call for Papers 3rd International Conference on Grid and Cooperative Computing (http://grid.hust.edu.cn/gcc2004) October 21-24 2004, Wuhan, China ----------------------------------------------------------------- The Third International Conference on Grid and Cooperative Computing (GCC 2004) will be held from Oct. 21 to 24, 2004 in Wuhan. It will serve as a forum to present current work by researchers in the grid computing and cooperative computing area. GCC 2004 is the follow-up of the highly successful GCC 2003 in Shanghai, China, and GCC 2002 in Sanya, China. 
Wuhan is rich in culture and history. Its civilization began about 3,500 years ago, and is of great importance in Chinese culture, economy and politics. It shares the same culture of Chu, formed since the ancient Kingdom of Chu more than 2,000 years ago. Numerous natural and artificial attractions and scenic spots are scattered around. Famous scenic spots in Wuhan include Yellow Crane Tower, Guiyuan Temple, East Lake, and Hubei Provincial Museum with the famous chimes playing the music of different styles. GCC 2004 will emphasize the design and analysis of grid computing and cooperative computing and their scientific, engineering, and commercial applications. In addition to technical sessions of contributed paper presentations, the conference will have several workshops, a poster session, tutorials, and vendor exhibitions. GCC 2004 invites the submission of papers in grid computing, Web services and cooperative computing, including theory and applications. The conference is soliciting only original high quality research papers on all above aspects. The main topics of interest include, but not limited to: -Resource Grid and Service Grid - Information Grid and Knowledge Grid - Grid Monitoring, Management and Organization Tools - Grid Portal - Grid Service, Web Service and their QoS - Service Orchestration - Grid Middleware and Toolkits - Grid Security - Innovative Grid Applications - Advanced Resource Reservation and Scheduling - Performance Evaluation and Modeling - Computer-Supported Cooperative Work - P2P Computing, automatic computing and so on - Meta-information Management - Software glue Technologies PAPER SUBMISSION Paper submissions must present original, unpublished research or experiences. Late-breaking advances and work-in-progress reports from ongoing research are also encouraged to be submitted to GCC 2004. All papers submitted to this conference will be peer-reviewed and accepted on the basis of their scientific merit and relevance to the conference topics. Accepted papers will be published as conference proceedings, published by Springer-Verlag in the Lecture Notes in Computer Science (LNCS) Series (Pending). It is also planned that a selection of papers from GCC 2004 proceedings will be extended and published in international journals. WORKSHOPS Proposals are solicited for workshops to be held in conjunction with the main conference. Interested individuals should submit a proposal by March 1, 2004 to the Workshop Chair. TUTORIALS Proposals are solicited for tutorials to be held at the conference. Interested individuals should submit a proposal by May 30,2004. The proposal should include a brief description of the intended audience, a lecture outline, and a vita for each lecturer. EXHIBITION/VENDOR PRESENTATIONS Companies and R&D laboratories are encouraged to present their exhibits at the conference. In addition, a full day of vendor presentations is planned. IMPORTANT DATES March 1, 2004 Workshop Proposal Due May 1, 2004 Conference Paper Due May 30, 2004 Tutorial Proposal Due June 1, 2004 Notification of Acceptance/Rejection June 30, 2004 Camera-Ready Paper Due ORGANIZATION CONFERENCE Co-CHAIRS Xicheng Lu, National University of Defense Technology, China Andrew A. Chien, University of California at San Diego, USA. PROGRAM Co-CHAIRS Hai Jin, Huazhong University of Science and Technology, China. hjin at hust.edu.cn Yi Pan, Georgia State University, USA. pan at cs.gsu.edu WORKSHOP CHAIR Nong Xiao, National University of Defense Technology, China. 
xiao-n at vip.sina.com, Xiao_n at sina.com. Publicity Chair Minglu Li, Shanghai Jiao Tong University, China. li-ml at cs.sjtu.edu.cn Tutorial Chair Dan Meng, Institute of Computing Technology, Chinese Academy of Sciences, China. md at ncic.ac.cn Poster Chair Song Wu, Huazhong University of Science and Technology, China. wusong at mail.hust.edu.cn LOCAL ARRANGEMENT CHAIR Pingpeng Yuan, Huazhong University of Science and Technology, China. ppyuan at mail.hust.edu.cn. Program Committee Members (more to be added) Mark Baker (University of Portsmouth, UK) Rajkumar Buyya (The University of Melbourne, Australia) Wentong Cai (Nanyang Technological University, Singapore) Jiannong Cao (Hong Kong Polytechnic University, Hong Kong) Guihai Chen (Nanjing University, China) Minyi Guo (University of Aizu, Japan) Chun-Hsi Huang (University of Connecticut, USA) Weijia Jia (City University of Hong Kong, Hong Kong) Francis Lau (The University of Hong Kong, Hong Kong) Keqin Li (State University of New York, USA) Qing Li (City University of Hong Kong, Hong Kong) Lionel Ni (Hong Kong University of Science and Technology, Hong Kong) Hong Shen (Japan Advanced Institute of Science and Technology, Japan) Yuzhong Sun (Institute of Computing Technology, CAS, China) Huaglory Tianfield (Glasgow Caledonian University, UK) Cho-Li Wang (The University of Hong Kong, Hong Kong) Jie Wu (Florida Atlantic University, USA) Cheng-Zhong Xu (Wayne State University, USA) Laurence Tianruo Yang (St. Francis Xavier University, Canada) Qiang Yang (Hong Kong University of Science & Technology, Hong Kong) Yao Zheng (Zhejiang University, China) Wanlei Zhou (Deakin University, Australia) Jianping Zhu (The University of Akron, USA) For more information, please visit the conference web site at: http://grid.hust.edu.cn/gcc2004. ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgoornaden at intnet.mu Mon Feb 9 10:34:28 2004 From: rgoornaden at intnet.mu (roudy) Date: Mon, 9 Feb 2004 19:34:28 +0400 Subject: [Beowulf] parallel program References: <200402081701.i18H1qh28395@NewBlue.scyld.com> Message-ID: <001601c3ef22$35dd7be0$ab007bca@roudy> Hello Beowulf people, I have finished building my cluster and have already run Linpack on it; its performance is fine. Can someone help me by giving me some very big programs to run on my cluster to compare the performance with a stand-alone computer.
Thanks Roudy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcookeman at yahoo.com Mon Feb 9 18:00:47 2004 From: jcookeman at yahoo.com (Justin Cook) Date: Mon, 9 Feb 2004 15:00:47 -0800 (PST) Subject: [Beowulf] parallel program In-Reply-To: <001601c3ef22$35dd7be0$ab007bca@roudy> Message-ID: <20040209230047.89106.qmail@web60510.mail.yahoo.com> http://www.mpa-garching.mpg.de/galform/gadget/index.shtml There is a serial and parallel version. Have fun... Justin --- roudy wrote: > Hello Beowulf people, > I have completed to build my cluster. I have have > already run linpack on my > cluster and it's performance is fine. > Can someone help me by giving me some very big > programs to run on my cluster > to compare the performance with a stand-alone > computer. > Thanks > Roudy > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 9 17:50:45 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 9 Feb 2004 17:50:45 -0500 (EST) Subject: [Beowulf] BWBUG meeting Tuesday Feb 10 at 3:00, Platform Computing Message-ID: --- Note that this meeting is in VA not Maryland! -- Date: February 10, 2004 Time: 3:00 PM (doors open at 2:30) Location: Northrop Grumman IT, McLean Virginia The folks from Platform Computing will be speaking about their LSF scheduler and Grid Computing for Beowulf. This event is sponsored by the Baltimore-Washington Beowulf Users Group (BWBUG) and will be held at Northrop Grumman Information Technology 7575 Colshire Drive, 2nd floor, McLean Virginia. Please register on line at http://bwbug.org As usual there will be door prizes, food and refreshments. Need to be a member?: No ( guests are welcome ) Parking: Free T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell Email michael.fitzmaurice at ngc.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 9 18:25:55 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 10 Feb 2004 10:25:55 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402101025.57234.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > argue that the HMS Sheffield (sunk by a single exocet missle in the > falklands war) went down in large measure because it was an aluminum ship A quick correction, the Sheffield was an all steel ship, as I believe were all the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was not brought under control because the exocet (which failed to explode) took out a large chunk of the fire fighting system. 
She finally sank under tow on May 10th 1982, six days after being hit. The sci.military.naval FAQ has an excellent section on the role of aluminium in the loss of warships which looks at this urban legend, and gives real examples when aluminium did cause the loss, at: http://www.hazegray.org/faq/smn6.htm#F7 as well as a section on the Type 42's at: http://www.hazegray.org/navhist/rn/destroyers/type42/ cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAKBcDO2KABBYQAh8RAq2vAJdRfrlHek12hced85HGV0z1nWbYAJ9GJegr FBxjHUczDti0OXNKX5VoKA== =PA8t -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 18:31:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 18:31:18 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > > > argue that the HMS Sheffield (sunk by a single exocet missle in the > > falklands war) went down in large measure because it was an aluminum ship > > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. I stand corrected. Obviously one can't believe everything one googles...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Feb 9 22:42:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 10 Feb 2004 11:42:32 +0800 (CST) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> >From comp.arch: "One of the things that the version 8.0 of the Intel compiler included was an "Intel-specific" flag." But looks like the purpose is to slow down AMD: http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com If intel releases 64-bit x86 CPUs and compilers, then AMD may get even better benchmarks results. Again, no matter how pretty the benchmarks results look, in the end we still need to run on the real system. So, what's the point of having benchmarks? Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
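For readers who want to see concretely what kind of runtime check is being described in that comp.arch thread, below is a minimal, hypothetical sketch -- it is not Intel's actual dispatch code. It reads the CPUID vendor string and, separately, the SSE/SSE2 feature bits, to show that capability can be detected without ever looking at the vendor. It assumes an x86 machine and GCC-style inline assembly; the file name and messages are made up for illustration.

/* cpuid_check.c -- hypothetical sketch, not Intel's library code.
 * Prints the CPUID vendor string and the SSE/SSE2 feature bits.
 * Build (x86, gcc): gcc -O2 -o cpuid_check cpuid_check.c
 */
#include <stdio.h>
#include <string.h>

static void cpuid(unsigned int leaf, unsigned int *a, unsigned int *b,
                  unsigned int *c, unsigned int *d)
{
    __asm__ volatile("cpuid"
                     : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                     : "a"(leaf));
}

int main(void)
{
    unsigned int a, b, c, d;
    char vendor[13];

    /* Leaf 0: the vendor string comes back in EBX, EDX, ECX, in that order. */
    cpuid(0, &a, &b, &c, &d);
    memcpy(vendor + 0, &b, 4);
    memcpy(vendor + 4, &d, 4);
    memcpy(vendor + 8, &c, 4);
    vendor[12] = '\0';

    /* Leaf 1: feature flags; EDX bit 25 = SSE, bit 26 = SSE2. */
    cpuid(1, &a, &b, &c, &d);

    printf("vendor string : %s\n", vendor);
    printf("SSE supported : %s\n", (d & (1u << 25)) ? "yes" : "no");
    printf("SSE2 supported: %s\n", (d & (1u << 26)) ? "yes" : "no");

    /* A dispatcher keyed on the vendor string -- the behaviour complained
     * about above -- would ignore those feature bits on anything that is
     * not "GenuineIntel" and fall back to a generic code path. */
    if (strcmp(vendor, "GenuineIntel") != 0)
        printf("a vendor-keyed dispatcher would take the slow path here\n");

    return 0;
}

Testing the feature bits is the portable way to decide whether SSE/SSE2 code paths can be used; keying the decision on the vendor string is exactly what the posters above object to.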
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 10 03:10:39 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 10 Feb 2004 09:10:39 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. Steering the argument back to computers :-) I saw a documentary about the Sheffield once. Two ships were sent out as 'goalkeepers', the Sheffield and the smaller Broadsword. The Sheffield had a longer range missile system, the Broadsword a short range one (or other way around). During a period of vulnerability (can;t remember the exact reason) the Broadsword had to reboot its ageing fire control computer. I think build by Ferranti. (No slur intended on their fine engineers, but the thing was old at the time). _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wardwe at navseadn.navy.mil Tue Feb 10 16:21:01 2004 From: wardwe at navseadn.navy.mil (Ward William E DLDN) Date: Tue, 10 Feb 2004 16:21:01 -0500 Subject: [Beowulf] Intel Compiler cheating against non-Intel CPUs? Message-ID: Has anyone seen this yet? Any comments or discussion? >From the message, it looks like the Intel Compilers are cheating against SSE and SSE2 capable non-Intel CPUs (ie, A64 especially). http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=a13e403a.040 2091438.14018f5a%40posting.google.com&rnum=1 R/William Ward _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mbanck at gmx.net Tue Feb 10 13:01:16 2004 From: mbanck at gmx.net (Michael Banck) Date: Tue, 10 Feb 2004 19:01:16 +0100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040210180116.GA27872@blackbird.oase.mhn.de> On Sat, Feb 07, 2004 at 11:21:19AM +0000, Andrew M.A. Cater wrote: > On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > > Can you add GridEngine (SGE) and Torque (SPBS)? > > > > The problem with OpenPBS is not only it is broken, it > > is not under development these days, but also I found > > that Altair is not allowing new users to download > > OpenPBS. I went to its homepage today but it only > > leads me to the PBSPro page. > > > To clarify things a bit, I hope. > > In the beginning was PBS - developed in house at NASA by engineers > who needed a Portable Batch System. If you understand Cray NQS syntax > and concepts it's familiar :) They left / sold to Veridian who in turn > sold to Altair. 
The original PBS was GPL or a close equivalent, if I > understand correctly. > > Altair are marketing a propietary development of PBS as PBSPro. OpenPBS > remains available, though you have to register with Altair for download. > What they have done very recently, which is rather sneaky, is for the > site to oblige you to register for an evaluation copy of PBSPro and > potentially answer a questionnaire prior to providing the link to allow > you to download OpenPBS. > > OpenPBS is not under active development and PBSPro may have stalled. > Certainly the price per node that Altair are quoting has apparently > dropped significantly - though their salesmen are still persistent :) > > The academic community and the active users forked OpenPBS to create > Scalable PBS [SPBS] which is the name most widely known. They've added > patches, fixes and features, though there is still an Altair licence for > OpenPBS in there. In the last couple of months, SPBS changed its name > initially to StORM and then to Torque. Thanks for the clarification. Does anybody know whether Torque is considered to be conforming to the Open Source Definition[1]? In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software License', which seems to prohibit commercial distribution, making it non-free unfortunately. Is there some other fork of PBS with a true Open Source license perhaps? thanks, Michael [1] http://www.opensource.org/docs/definition.php _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 18:00:05 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 10:00:05 +1100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040210180116.GA27872@blackbird.oase.mhn.de> References: <20040207112119.GA5120@galactic.demon.co.uk> <20040210180116.GA27872@blackbird.oase.mhn.de> Message-ID: <200402111000.08919.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > Thanks for the clarification. Does anybody know whether Torque is > considered to be conforming to the Open Source Definition[1]? > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > License', which seems to prohibit commercial distribution, making it > non-free unfortunately. Is there some other fork of PBS with a true Open > Source license perhaps? My understanding is that they cannot alter the license as they have inherited that from the original OpenPBS sources, and as they do not hold all the copyrights to the code it cannot be changed unless Altair can be persuaded. My understanding is that the SuperCluster people picked the 2.3.12 version as a starting point as that was the most recent with the most liberal license (i.e. others could fork development from it). I've CC'd this to the SuperCluster folks so they can comment and correct. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU QwJlxOBwfLiUT7Y543RwiIY= =xTbA -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 17:44:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 09:44:17 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402110944.21802.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 07:10 pm, John Hearns wrote: > During a period of vulnerability (can;t remember the exact reason) > the Broadsword had to reboot its ageing fire control computer. > I think build by Ferranti. (No slur intended on their fine engineers, > but the thing was old at the time). I'm not aware of that one, but on a similar vein there was the widespread failure of the Patriot systems during the first Gulf War, including the attack on the barracks at Dhahran where 28 were killed. This was caused by the system truncating the values of the clock when written to memory, which over a long period of operation resulted in the system dismissing incoming missiles as false alarms. http://shelley.toich.net/projects/CS201/patriot.html However, this is starting to sound more like the RISKS digest than Beowulf, so I'll leave it there. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKV7EO2KABBYQAh8RApXjAJ9Gil07Z/XekN3XDSturEu2KihedQCfXBA7 aUUMVqTZuHfQ5RKsKGwnuNw= =+9RK -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Feb 10 18:52:41 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 10 Feb 2004 15:52:41 -0800 Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com>; from andrewxwang@yahoo.com.tw on Tue, Feb 10, 2004 at 11:42:32AM +0800 References: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <20040210155241.A29026@fileserver.internal.keyresearch.com> On Tue, Feb 10, 2004 at 11:42:32AM +0800, Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? There isn't much point at staring at a benchmark that isn't at all relevant to how you're using the system -- for example, a SPECcpu score with the Intel compiler in 32-bit mode isn't going to tell you much about an AMD64 app in 64-bit mode. If I remember correctly, a guy at Intel published a paper about a feedback optimization technique related to irregular strides that got a 22% improvement in mcf. When I get back to the office in a couple of days, I'll post a reference. 
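As a rough, vendor-neutral illustration of the kind of loop such irregular-stride profile feedback targets, here is a small self-contained sketch; the file name, the index-scrambling constant, and the exact GCC flags shown are illustrative assumptions, not a reproduction of the paper's experiment.

/* irregular.c -- hypothetical sketch of an irregular-stride (indirectly
 * indexed) loop of the sort profile feedback can help with.  A generic,
 * vendor-neutral GCC feedback build of that era looks like:
 *   gcc -O2 -fprofile-arcs -o irregular irregular.c && ./irregular
 *   gcc -O2 -fbranch-probabilities -o irregular irregular.c
 */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
    double *data  = malloc(N * sizeof *data);
    int    *index = malloc(N * sizeof *index);
    double  sum = 0.0;
    int     i;

    if (data == NULL || index == NULL)
        return 1;

    /* Fill the data and build a scrambled index table so the loop below
     * walks memory with no regular stride. */
    for (i = 0; i < N; i++) {
        data[i]  = (double)i;
        index[i] = (int)((i * 2654435761u) % N);
    }

    /* The irregular-stride access: data[index[i]] defeats the usual
     * hardware and compiler prefetch heuristics, which is where
     * profile-guided feedback has the most to offer. */
    for (i = 0; i < N; i++)
        sum += data[index[i]];

    printf("sum = %f\n", sum);
    free(data);
    free(index);
    return 0;
}

Compiled once with -fprofile-arcs, run to collect counts, and recompiled with -fbranch-probabilities, the compiler gets real branch and trip-count data for the irregular loop; feedback of this general kind is what the mcf result refers to, and it is available in ordinary GCC, not just the Intel compiler.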
And no, it's not at all Intel-specific. -- greg (posting from Paris. I should be asleep!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 19:32:47 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 11:32:47 +1100 Subject: Fwd: Re: [Beowulf] Gentoo for Science and Engineering Message-ID: <200402111132.49119.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Forwarded at the request of SuperCluster.org - ---------- Forwarded Message ---------- Subject: Re: [Beowulf] Gentoo for Science and Engineering Date: Wed, 11 Feb 2004 11:55 am From: help at supercluster.org To: Chris Samuel Cc: beowulf at beowulf.org Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have > inherited that from the original OpenPBS sources, and as they do not hold > all the copyrights to the code it cannot be changed unless Altair can be > persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version > as a starting point as that was the most recent with the most liberal > license (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. 
> > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- - -- - -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation - ------------------------------------------------------- - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKXgvO2KABBYQAh8RAok7AKCABbnmwiYvRf4BxeFoY+Jp9F/W1gCfReKD dKc1islXxQLdTrabQglX1MU= =xfyh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From yduan at albert.chem.udel.edu Tue Feb 10 10:37:49 2004 From: yduan at albert.chem.udel.edu (Dr. Yong Duan) Date: Tue, 10 Feb 2004 10:37:49 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > A guidelines, I guess. A lot of CPUs (including some rather expensive ones and often call them HPC CPUs) perform at less than half the speed of consumer grade CPUs. You'd definitely avoid those, for instance. Also, you can look at the performance in each area and figure out the relative performance expected to your own code. In the end, the most reliable benchmark is always on your own code, of course. Whether Intel compiler has been tuned for SPEC2K is probably an open question. I tried various compilers on our code and found it is also tuned for it :), consistently 10-20% faster than others. This included performance on Opterons, strangely enough. yong _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From help at supercluster.org Tue Feb 10 19:55:22 2004 From: help at supercluster.org (help at supercluster.org) Date: Tue, 10 Feb 2004 17:55:22 -0700 (MST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <200402111000.08919.csamuel@vpac.org> Message-ID: Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. 
The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have inherited > that from the original OpenPBS sources, and as they do not hold all the > copyrights to the code it cannot be changed unless Altair can be persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version as > a starting point as that was the most recent with the most liberal license > (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. > > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- > -- -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bwegner at ekt.tu-darmstadt.de Wed Feb 11 05:02:23 2004 From: bwegner at ekt.tu-darmstadt.de (Bernhard Wegner) Date: Wed, 11 Feb 2004 11:02:23 +0100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages Message-ID: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Hello, I have a really small "cluster" of 4 PC's which are connected by a normal Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board I thought I might be able to improve performance by connecting the machines via a Gigabit switch (which are really cheap nowadays). Everything seemed to work fine. The switch indicates 1000Mbit connections to the PC's and transfer rate for scp-ing large files is significantly higher now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than with the 100 Mbit switch. I wasn't able to actually track down the problem, but it seems that there is a problem with small messages. 
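For anyone who wants to measure this directly, a minimal MPI ping-pong along the lines below is one way to time small messages between two nodes. It is a sketch only, assuming the usual mpicc/mpirun wrappers from the MPICH installation in question; the message sizes and repetition count are arbitrary illustrative choices, not figures taken from this report.

/* pingpong.c - minimal MPI ping-pong latency sketch (illustrative only).
 * Build: mpicc -O2 -o pingpong pingpong.c
 * Run:   mpirun -np 2 ./pingpong   (one process on each node)
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank, size, len, r;
    unsigned int i;
    char buf[4096];
    int lengths[5] = { 0, 4, 64, 1024, 4096 };   /* message sizes to try */
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    memset(buf, 0, sizeof(buf));

    for (i = 0; i < 5; i++) {
        len = lengths[i];
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < REPS; r++) {
            if (rank == 0) {            /* rank 0 sends, waits for the echo */
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else {                    /* rank 1 echoes everything back */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)                  /* one-way latency = half the round trip */
            printf("%5d bytes: %8.2f us one-way\n",
                   len, (t1 - t0) / (2.0 * REPS) * 1.0e6);
    }
    MPI_Finalize();
    return 0;
}

Averaging over many round trips and halving the result takes the resolution of MPI_Wtime out of the picture; for the 0-byte case the number printed is essentially the per-message overhead of the interconnect plus the MPI stack, which is what appears to have blown up here.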
When I run the performance test provided with mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 byte message length, while for larger messages everything looks fine (linear dependency of transfer time on message length, everything below 300 us). I have also tried mpich2 which shows exactly the same behavior. Does anyone have any idea? Here are the details of my system: - Suse Linux 9.0 (kernel 2.4.21) - mpich-1.2.5.2 - motherboard ASUS P4P800 - LAN (10/100/1000) on board (3COM 3C940 chipset) - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + 8x88E1111-BAB, AT89C2051-24PI) -- Mit besten Grüßen -- Best regards, Bernhard Wegner _______________________________________________________ ======================================================= Dipl.-Ing. Bernhard Wegner Fachgebiet Energie- und Kraftwerkstechnik Technische Universität Darmstadt Petersenstr. 30 64287 Darmstadt Germany phone: +49-6151-162357 fax: +49-6151-166555 e-mail: bwegner at ekt.tu-darmstadt.de _______________________________________________________ ======================================================= _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moloned at tcd.ie Wed Feb 11 12:44:59 2004 From: moloned at tcd.ie (david moloney) Date: Wed, 11 Feb 2004 17:44:59 +0000 Subject: [Beowulf] Profiling floating-point performance Message-ID: <402A6A1B.2070805@tcd.ie> I have an application written in C++ which compiles under both MSVC++ 6.0 and gcc 2.9.6 that I would like to profile in terms of floating point performance.
Let's try these three: * microbenchmarks * comparative benchmarks * application benchmarks Microbenchmarks measure very specific, highly targeted areas of system functionality. By their very nature they are "simple", not complex -- often the pseudocode is as simple as start_timer(); loop lotsatimes{ do_something_simple = dumb*operation; } stop_timer(); compute_speed(); print_result(); (To compute "how fast a multiply occurs"). Simple can also describe atomicity -- benchmarking "a single operation" where the operation might be complex but is a standard unitary building block of complex code. Microbenchmarks are undeniably not only useful, they are essential to anyone who takes systems/cluster/programming engineering seriously. Examples of microbenchmark suites that are in more or less common use are: lmbench (very full featured suite; one infamous user: Linux Torvalds:-) stream (very commonly cited on the list) cpu_rate (not so common -- wraps e.g. stream and other tests so variations with vector size can be explored) rand_rate (almost unknown, but it DOES benchmark all the gsl rands:-) netpipes (measure network speeds) netperf (ditto, but alas no longer maintained) I (and many others) USE these tools (I wrote two of them SO I could use them) to study systems that we are thinking of buying and using for a cluster, to study the kernel and see if the latest change made some critical operation faster or slower, to figure out if the NIC/switch combo we are using is why PVM code is moving like molasses. They are LESS commonly poached by vendors, fortunately - Larry Macvoy has lmbench bristling with anti-vendor-cooking requirements at the license level. The benchmarks are simple, but because one needs a lot of them to get an accurate picture of overall performance they tend to be too complex for typical mindless consumers... Comparative benchmarks are what I think you're really referring to. They aren't completely useless, but they do often become pissing contests (such as the top 500 list) and there are famous stories of Evil by corporations seeking to cook up good results on one or another (sometimes at the expense of overall system balance and performance!). Most of the Evil in these benchmarks arise because people end up using them as a naive basis for purchase decisions. "Ooo, that system has a linpork of 4 Gigacowflops so it must be better than that one which only gets 2.7 Gcf, so I'll buy 250 of them for my next cluster and be able to brag about my 1 Teracowflop supercomputer and make the top third of the top 500 list, which will impress my granting agencies and tenure board, who are just as ignorant as I am about meaningful measures of systems performance..." Never mind that your application is totally non-linpack-like, that the bus performance on the systems you got sucks, and that the 2.7 Gcf systems you rejected cost 1/2 the 4 Gcf systems you got so you could have had 500 at 2.7 Gcf for a net of 1.35 Tcf and balanced memory and bus performance (and run your application faster per dollar) if you'd bothered to do a cost benefit analysis. The bleed of dollars attracts the vendor sharks, who often can rattle off the aggregate specmarks and so forth for their most expensive boxes. 
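To make the timing-loop pseudocode above concrete, a compilable C version might look like the following. It is only a sketch: the multiply kernel, the repetition count and the gettimeofday()-based timer are arbitrary choices, and a real suite (stream, cpu_rate and friends) takes far more care over cache state, vector size and timer overhead.

/* microbench.c - bare-bones version of the timing-loop pseudocode above.
 * Illustrative sketch only; no attempt is made to control cache state or
 * to separate loop overhead from the cost of the multiply itself.
 */
#include <stdio.h>
#include <sys/time.h>

#define REPS 100000000UL

static double now(void)                   /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(void)
{
    volatile double x = 1.000000001;      /* volatile keeps the loop honest */
    double result = 1.0;
    double t0, t1;
    unsigned long i;

    t0 = now();                           /* start_timer() */
    for (i = 0; i < REPS; i++)            /* loop lotsatimes */
        result = result * x;              /* do_something_simple */
    t1 = now();                           /* stop_timer() */

    /* compute_speed() and print_result() */
    printf("%lu multiplies in %.3f s => %.1f Mmul/s (result %g)\n",
           REPS, t1 - t0, (double)REPS / (t1 - t0) / 1.0e6, result);
    return 0;
}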
However, they CAN be actually useful, if one of the tests in the SPEC suite happens to correspond to your application, if you bother to read all the component results in the SPECmarks, if you bother to check the compiler used and flags and system architecture in some detail to see if they appear cooked (hand tuned or optimized, based on a compiler that is lovely but very expensive and has to be factored into your CBA). Finally, there are application benchmarks. These tend to be "atomic" but at a very high level (an application is generally very complex). These are also subject to the Evil of comparative benchmarks (in fact some of comparative benchmark suites, especially in the WinX world, are a collection of application benchmarks). They also have some evil of their own when the application in question is commercial and not open source -- you have effectively no control over how it was built and tuned for your architecture, for example, and may not even have meaningful version information. However, they are also undeniably useful. Especially when the application being benchmarked is YOUR application and under your complete control. So the answer to your question appears to be: * Microbenchmarks berry berry good. Useful. Essential. Fundamental. * Comparative benchmarks sometimes good. Sometimes a force for Evil. * Application benchmark ideal if it is your application or very similar and under your control. Pissing contests in general are not useful, and even a useful higher level benchmark divorced from an associated CBA is like shopping in a store that has no price tags -- a thing of use only to those so rich that they don't have to ask. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 13:55:01 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 12:55:01 -0600 Subject: [Beowulf] how are people doing this? Message-ID: <20040211185501.GA31590@mikee.ath.cx> I feel that in a proper cluster that the nodes are all (basically) identical. I 'own' a server environment of 20+ servers that are all dedicated to specific applications and this is not a cluster. However, I would like to manage config files (/etc/resolv.conf, etc), user accounts, patches, etc., as I would in a clustered environment. I have read the papers at infrastructures.org and agree with the principles mentioned there. I have looked extensively at cfengine, though I prefer the solution be in PERL as all my servers have PERL already (the manufacturer installs PERL as default on the boxes). How is everyone managing their cluster or what are suggestions on how I can manage my server environment. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:08:41 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed, 11 Feb 2004 13:08:41 -0500 (EST) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > I have an application written in C++ which compiles under both MSVC++ > 6.0 and gcc 2.9.6 that I would like to profile in terms of floating > point performance. > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. I'm not sure how accurate it is overall, but see "man gprof" and compile with the -g -p flag. This will give you at least some useful times and so forth. It will NOT give you (AFAIR) "histogram of actual x86 floats etc". I don't know of anything that will -- to get them you have to instrument your code, probably so horribly that a la heisenberg your measurements would bear little resemblance to actual performance (especially if your code wants to be doing all sorts of smooth vector things in cache and register memory and you keep calling instrumentation subroutines to try to measure times that wreck state). Consider that with my best, on CPU, raw assembler based timing clock (using the onboard cycle counter) I still find the overhead of reading that clock to be in the tens of clock cycles. To microtime a single multiply is thus all but impossible -- the clock itself takes 10-40 times as long to execute as a multiply might take, depending on where the data to be multiplied is when one starts. So timing per-instruction is effectively out. Similarly, to instrument and count floating point operations requires something to "watch the machine instructions" as they stream through the CPU. Unfortunately, the only thing available to watch the instructions is the CPU itself, so you have to damn near write an assembler-interpreter to instrument this. Which in turn would be slow as molasses -- an easy 10x slower than the native code in overhead alone plus it would utterly wreck just about every code optimization known to man. Finally, there is the question of "what's a flop". The answer is, not much that's useful or consistent -- the number of floating point operations that a system does per second varies wildly depending in a complex way on system state, cache locality, whether the variable is general or register, whether the instruction is part of a complex/advanced instruction (e.g. add/multiply) or an instruction that has to be done partly in software (divide), whether or not the instruction is part of a stream of vectorized instructions, and more. That's why microbenchmarks are useful. You may not be able to extract meaningful results from your code with a simple tool (although it isn't terribly difficult to instrument major blocks or subroutines with timers and counters, which is more or less with -p and gprof do) but you can learn at least some things about how your system executes core operations in various contexts to learn how to optimize one's code with a good microbenchmark. Just sweeping stream across vector sizes from 1 to 10^8 or so teaches you a whole lot about the system's performance in different contexts, as does doing a stream-like benchmark but working through the vector in a random order (i.e. deliberately defeating any sort of vector optimization and cache benefit). 
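A bare-bones version of that size sweep, for anyone who wants to try it: the sketch below runs one trivial dot-product kernel at several vector lengths and reports an effective memory bandwidth. The sizes, repetition counts and kernel are arbitrary illustrative choices, and it is nowhere near as careful as stream or cpu_rate about what exactly it is measuring.

/* sweep.c - the same trivial kernel run at several working-set sizes, to
 * show how the apparent "result" changes as the data falls out of cache.
 * Illustrative sketch only; sizes and repetition counts are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now(void)                   /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(void)
{
    size_t sizes[5] = { 1<<10, 1<<13, 1<<16, 1<<19, 1<<22 };  /* in doubles */
    size_t n, i, s;
    long rep, reps;
    double *a, *b, t0, t1, sum;

    for (s = 0; s < 5; s++) {
        n = sizes[s];
        reps = (long)((64UL << 20) / n);  /* keep total work roughly constant */
        a = malloc(n * sizeof(double));
        b = malloc(n * sizeof(double));
        if (!a || !b) { fprintf(stderr, "malloc failed\n"); return 1; }
        for (i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

        sum = 0.0;
        t0 = now();
        for (rep = 0; rep < reps; rep++)
            for (i = 0; i < n; i++)
                sum += a[i] * b[i];       /* trivial dot-product kernel */
        t1 = now();

        printf("n = %8lu doubles: %8.1f MB/s effective (sum %g)\n",
               (unsigned long)n,
               2.0 * n * reps * sizeof(double) / (t1 - t0) / 1.0e6, sum);
        free(a);
        free(b);
    }
    return 0;
}

On most machines the reported rate drops in visible steps as the working set falls out of L1 and then L2, which is exactly the context dependence being described; walking the vectors in a random order instead of sequentially would drop it a good deal further.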
Good luck, rgb > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:58:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 13:58:39 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. Mike, this is nearly a FAQ -- the list archives should have a discussion (one of many) only a few weeks old on this very subject. There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions possible, and more. Oh, and dhcp actually pushes lots of stuff out all by itself these days -- it should handle the stuff in resolv.conf for example, and you should be using dhcp anyway for scalability reasons. rgb > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 11 14:44:02 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 11 Feb 2004 13:44:02 -0600 (CST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: Mike, we use systemimager, systemconfigurator and a custom utility called "cupdate" to maintain our clusters. In our case it works beautifully and easilly. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. 
> all dedicated to specific applications and this is not a cluster.
> However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scyld at jasons.us Wed Feb 11 13:01:21 2004 From: scyld at jasons.us (scyld at jasons.us) Date: Wed, 11 Feb 2004 13:01:21 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> References: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: <20040211125741.A5961@torgo.bigbroncos.org> On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. Have you tried setting the speed and duplex of the gig NICs to 1000/full on both the system side and switch side? I've found that autonegotiate rarely does especially with 3com gear. I'm guessing, based on its size, your switch isn't managed so you may have to stick to locking it on the systems and watching the behavior to see if the switch gets the negotiation right. (if traffic is bursty you have a speed mismatch and if you get loads of errors it's more likely to be duplex problem) FWIW I have the same mobo at home but haven't hooked it to gigabit yet so I'm quite curious to see how this works out. -Jason ----- Jason K. Schechner - check out www.cauce.org and help ban spam-mail. "All HELL would break loose if time got hacked." - Bill Kearney 02-04-03 ---There is no TRUTH. There is no REALITY. There is no CONSISTENCY.--- ---There are no ABSOLUTE STATEMENTS I'm very probably wrong.--- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 15:00:15 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 14:00:15 -0600 Subject: [Beowulf] how are people doing this? In-Reply-To: References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: <20040211200015.GE31590@mikee.ath.cx> On Wed, 11 Feb 2004, Robert G. Brown wrote: > On Wed, 11 Feb 2004, Mike Eggleston wrote: > > > I feel that in a proper cluster that the nodes are all (basically) > > identical. 
I 'own' a server environment of 20+ servers that are > > all dedicated to specific applications and this is not a cluster. > > However, I would like to manage config files (/etc/resolv.conf, etc), > > user accounts, patches, etc., as I would in a clustered environment. > > I have read the papers at infrastructures.org and agree with the > > principles mentioned there. I have looked extensively at cfengine, > > though I prefer the solution be in PERL as all my servers have > > PERL already (the manufacturer installs PERL as default on the boxes). > > > > How is everyone managing their cluster or what are suggestions > > on how I can manage my server environment. > > Mike, this is nearly a FAQ -- the list archives should have a discussion > (one of many) only a few weeks old on this very subject. > > There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions > possible, and more. Oh, and dhcp actually pushes lots of stuff out all > by itself these days -- it should handle the stuff in resolv.conf for > example, and you should be using dhcp anyway for scalability reasons. I know it's been discussed and I apologize for asking it again. I've just not found the way that seems to fit with the picture I'm trying to reach. What I'm thinking of doing is writing a perl script that can be placed into CVS. On each server a cron process checks out the current CVS repository of server (AIX) config data and script. Then the perl script starts to check permissions, update resolv.conf, hosts, login, passwd, etc., and to check that specific packages are installed or that the packages need updating. I like a lot of what cfengine did, but I really want a script that can be maintained in CVS. For installing packages I plan for the script to mount an NFS export for pulling the packages. # mkdir /tmp/nfs.$$ # mount admin:/opt/packages /tmp/nfs.$$ # installp -d /tmp/nfs.$$ package # umount /tmp/nfs.$$ # rmdir /tmp/nfs.$$ For the account management I'm thinking of something on my admin server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating a local file with new users and their passwords. Then this file is checked into CVS for distribution to other nodes/servers. Using another file to list the users that are authorized access to the local node/server keeps my user-space to a minimum. Is that any more clear what I'm trying to do? I don't have a cluster, but I want to manage all nodes as identically as I can. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 14:35:13 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 14:35:13 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. 
I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. You might look into yum. You'd have to learn python, but yum already does most of what you want for rpm packages and could likely be hacked. In fact, yum would do what you want for all the config files if you roll them into an rpm package right now -- it already has precisely what it needs to install and update according to a revision number. You can run yum update as often as you wish. It will run from NFS and can be secured a variety of ways. rgb > > For installing packages I plan for the script to mount an NFS export > for pulling the packages. > > # mkdir /tmp/nfs.$$ > # mount admin:/opt/packages /tmp/nfs.$$ > # installp -d /tmp/nfs.$$ package > # umount /tmp/nfs.$$ > # rmdir /tmp/nfs.$$ > > For the account management I'm thinking of something on my admin > server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating > a local file with new users and their passwords. Then this file > is checked into CVS for distribution to other nodes/servers. Using > another file to list the users that are authorized access to the > local node/server keeps my user-space to a minimum. > > Is that any more clear what I'm trying to do? I don't have a cluster, > but I want to manage all nodes as identically as I can. > > Mike > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Wed Feb 11 17:05:26 2004 From: canon at nersc.gov (canon at nersc.gov) Date: Wed, 11 Feb 2004 14:05:26 -0800 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: Message from david moloney of "Wed, 11 Feb 2004 17:44:59 GMT." <402A6A1B.2070805@tcd.ie> Message-ID: <200402112205.i1BM5QwA011397@pookie.nersc.gov> David, You may want to look into PAPI and perfctr. It allows you query the performance counters built into most processors. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 17:13:32 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 17:13:32 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > > They also have some evil of > > their own when the application in question is commercial and not open > > source -- you have effectively no control over how it was built and > > tuned for your architecture, for example, and may not even have > > meaningful version information. > > Let's be fair here. An ISV application is not the definition of evil. 
I did not mean to imply that they were wholly evil or even evil in intent. > Clearly, "you have effectively no control over how an application was > built and tuned for your architecture" has no direct correspondence to > performance. I would have to respectfully and vehemently disagree. It has all sorts of direct correspondances. Let us make a short tally of ways that a closed source, binary only application used as a benchmark can mislead me with regard to the performance of a system. * I don't control the compiler choice. Your compiler and mine might result in me getting a very different performance even if your application "resembles" mine (AFAICT given that I cannot read the source). * I don't control the libraries. Your application is (probably) static linked in various places and might even use private libraries that are hand-optimized. My application would likely be linked dynamically with completely different libraries. Your libraries might be out of date. My libraries might be out of date. * I don't have any way of knowing whether your "canned" (say) Monte Carlo benchmark is relevant to my Monte Carlo application. Maybe your code is structured to be strictly vectorized and local, but mine requires random site access. Yours might be CPU bound. Mine might be memory bound. Since I can't see the source, I'll never know. * I have to pay money for the application to use as a benchmark before I even look at hardware. If I'm an honest soul, I probably have to buy a separate license for every platform I plan to test even before I buy the test platform OR run afoul of the Dumb Mutha Copyright Act (aka known as the "Intellectual Straightjacket Act"). Or maybe I can rely on vendor reports of the results. This adds costs to the engineering process. * Even leaving side the additional costs, there is the issue of whether the application I'm using is tuned for the hardware I'm running on. strict i386 code will not run as fast as strict i586 code will not run as fast as i686 code will not run optimally on an Athlon will not run optimally on an Opteron. Yet the Opteron will likely RUN i386 code. I just won't know whether the result is at all relevant to how the Opteron runs Opteron code. (These effects are not necessarily small.) * And if I thought about it hard, I could likely come up with a few more negatives...such as the entire raft of reasons that closed source software is a Bad Thing to encourage on general principles. The principles built right into the original beowulf mission statement (which IIRC has a very clear open source requirement for engineering reasons). The point being that while closed source commercial applications don't necessarily make "evil" benchmarks in the sense that there is any intent to hide or alter performance characteristics of a given architecture, they add a number of sources of noise to an already arcane and uncertain process. They are less reliable, more likely to mislead you (quite possibly through nobody's fault or intention), less likely to accurately predict the performance of the architecture on your application suite. And they are ultimately black boxes that you have to pay people to use. I personally am a strong proponent (in case you can't tell:-) of open source (ideally GPL) software and tools, ESPECIALLY for benchmarking. I even tried to talk Larry McVoy into GPL-ing lmbench back when it had a fairly curmudgeonly license, even though it the source itself was open enough. 
Note, BTW, that all of the observations above are irrelevant if the application being used as a benchmark is the application you intend to use in the form you intend to use it, purchased or not. As in: > > However, they are also undeniably useful. Especially when the > > application being benchmarked is YOUR application and under your > > complete control. > > Regardless of ownership or control, they're especially useful when > you're looking at an application being used in the way you intend on > using it. Many industrial users buy systems to run a specific list of > ISV applications. In this instance, the application benchmark can be > the most valid benchmark, as it can model the system in the way it will > be used -- and that's the most important issue. Sure. Absolutely. I'd even say that your application(s) is(are) ALWAYS the best benchmark for many or even most purposes, with the minor caveat that the microbenchmarks have a slightly different purpose and are best for the purpose for which they are intended. I doubt that Linus runs a scripted set of userspace Gnome applications to test the performance of kernel subsystems... > I'm not disagreeing with your message. I too try to make sure that > people use the right benchmarks for the right purpose; I've seen way too > many people jump to absurd conclusions based on a single data point or > completely unrelated information. I'm just trying to sharpen your > message by pointing out some too broad brush strokes... > > Well, maybe I don't put as much faith in micro benchmarks unless in the > hands of a skilled interpreter, such as yourself. My preference is for > whatever benchmarks most closely describe your use of the system. Microbenchmarks are not intended to be predictors of performance in macro-applications, although a suite of results such as lmbench can give an expert a surprisingly accurate idea of what to expect there. They are more to help you understand systems performance in certain atomic operations that are important components of many applications. A networking benchmark can easily reveal problems with your network, for example, that help you understand why this application which ran just peachy keen at one scale as a "benchmark" suddenly turns into a pig at another scale. A good CPU/memory benchmark can do the same thing wrt the memory subsystem. This is yet another major problem with an naive application benchmark or comparative benchmark (and even with microbenchmarks) -- they are OFTEN run at a single scale or with a single set of parameters. On system A, that scale might be one that lets the application remain L2-local. On system B it might not be. You might then conclude that B is much slower. On the scale that you intend to run it, both might be L2-local or both might be running out of memory. B might have a faster processor, or a better overall balance of performance and might actually be faster at that scale. I don't put much faith in benchmarks, period. With the exception of your application(s), of course. Faith isn't the point -- they are just rulers, stopwatches, measuring tools. Some of them measure "leagues per candle", or "furlongs per semester" and aren't terribly useful. Others are just what you need to make sense of a system. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 11 16:26:44 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 11 Feb 2004 22:26:44 +0100 Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> (Mike Eggleston's message of "Wed, 11 Feb 2004 14:00:15 -0600") References: <20040211185501.GA31590@mikee.ath.cx> <20040211200015.GE31590@mikee.ath.cx> Message-ID: Mike Eggleston writes: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. Well, if it comes to that, surely you can place cfengine's configuration files in CVS and let cron run a script that updates the config files from CVS and then launches cfengine? You don't have to run cfd, you know; you can start cfengine any way you want. I'd really think twice before starting to re-implement cfengine's existing functionality. cfengine helped me keep my sanity in an earlier life while single-handedly adminning a heterogenous Unix environment ranging from SunOS 4.1.3_u1 through Solaris 7, diverse Tru64:s and a hodge-podge of Linuxen. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 16:58:48 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 13:58:48 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 8:44 AM [deletia] > > Finally, there are application benchmarks. These tend to be "atomic" > but at a very high level (an application is generally very complex). > These are also subject to the Evil of comparative benchmarks (in fact > some of comparative benchmark suites, especially in the WinX world, are > a collection of application benchmarks). True. I cringe to think how many systems were bought for scientific and technical computations based on UT2003 "benchmarks". > They also have some evil of > their own when the application in question is commercial and not open > source -- you have effectively no control over how it was built and > tuned for your architecture, for example, and may not even have > meaningful version information. Let's be fair here. An ISV application is not the definition of evil. 
Clearly, "you have effectively no control over how an application was built and tuned for your architecture" has no direct correspondence to performance. Having been on the ISV side of the fence, and spent a tremendous amount of energy making sure that each port of the application performed as well as it could, I'm quite confident in saying we generally succeeded in maximizing performance. Realize that we had day after day to spend on performance, usually with the attention of one or more experts from the platform vendor at our beck and call -- and those experts would spend even more time on even more narrow aspects of performance. Having said that, there are some notable ISV applications that simply do not perform as well as they should. This can occur for a host of reasons, such as they, did not care, didn't know how, could/would not to make the effort, didn't have the time, were ignored by the vendor, &etc -- basically the very same reasons that some people who don't work for ISVs fail to make their own applications perform as well as they could. > However, they are also undeniably useful. Especially when the > application being benchmarked is YOUR application and under your > complete control. Regardless of ownership or control, they're especially useful when you're looking at an application being used in the way you intend on using it. Many industrial users buy systems to run a specific list of ISV applications. In this instance, the application benchmark can be the most valid benchmark, as it can model the system in the way it will be used -- and that's the most important issue. I'm not disagreeing with your message. I too try to make sure that people use the right benchmarks for the right purpose; I've seen way too many people jump to absurd conclusions based on a single data point or completely unrelated information. I'm just trying to sharpen your message by pointing out some too broad brush strokes... Well, maybe I don't put as much faith in micro benchmarks unless in the hands of a skilled interpreter, such as yourself. My preference is for whatever benchmarks most closely describe your use of the system. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 18:16:22 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 15:16:22 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 2:14 PM > On Wed, 11 Feb 2004, Lombard, David N wrote: > > > > They also have some evil of > > > their own when the application in question is commercial and not open > > > source -- you have effectively no control over how it was built and > > > tuned for your architecture, for example, and may not even have > > > meaningful version information. > > > > Let's be fair here. An ISV application is not the definition of evil. > > I did not mean to imply that they were wholly evil or even evil in > intent. > > > Clearly, "you have effectively no control over how an application was > > built and tuned for your architecture" has no direct correspondence to > > performance. 
> > I would have to respectfully and vehemently disagree. It has all sorts > of direct correspondances. Let us make a short tally of ways that a > closed source, binary only application used as a benchmark can mislead > me with regard to the performance of a system. > [deletia] > > Note, BTW, that all of the observations above are irrelevant if the > application being used as a benchmark is the application you intend to > use in the form you intend to use it, purchased or not. OK. So there's our difference. I only consider an application benchmark useful in this scenario. I can't imagine using an application benchmark of any sort if it isn't; you enumerated all the reasons for this in the bits I just snipped. We agree completely on this. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 18:07:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 10:07:18 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402121007.30002.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 09:13 am, Robert G. Brown wrote: > * Even leaving side the additional costs, there is the issue of > whether the application I'm using is tuned for the hardware I'm running > on. Such as ISV's including IA32 executables as part of their IA64 version. It wasn't all IA32, just bits. Very odd. We only spotted it when it failed to work on Rocks 3.1.0, which doesn't supply the IA32 compatability libraries (which Rocks 3.0.0 did). No, I'm not going to name names, but the "file" and "ldd" are your friends. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKrWmO2KABBYQAh8RAu9JAJ41djUEj+6zEZYrY9IuPG4E9s9qugCeKhJd 2pf/pnDftPMs0zCLYb7IaRM= =t/c6 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:34:06 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:34:06 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> Message-ID: On Thu, 12 Feb 2004, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ...and with that, I'm going to quit for the day and take my nameless friends out for a beer somewhere... (Sorry, revenge for the lies, damned lies...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 17:23:12 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 09:23:12 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402120923.19328.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 03:44 am, Robert G. Brown wrote: > There are really two kinds of benchmarks. Maybe even > three. Lies, damn lies and statistics ? - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKqtTO2KABBYQAh8RAg+TAJ4uLkrC7zOUDlK8OYVxBuwKY/GXuQCeJFvj vd9nT5nkEuUY/3Myv0IROaU= =8pIh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:32:28 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:32:28 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > OK. So there's our difference. I only consider an application benchmark > useful in this scenario. I can't imagine using an application benchmark > of any sort if it isn't; you enumerated all the reasons for this in the > bits I just snipped. > > We agree completely on this. I figured that we did -- I'm getting verbose on it because I think it is an important issue to be precise on. "What's a FLOP?" is a perfectly reasonable question with a perfectly unintelligible and meaningless answer, in spite of it being cited again and again over decades to sell systems. At the same time, benchmarks are certainly useful. I think the confusion is probably my fault -- my age/history showing again. I can remember fairly clearly when awk was cited as a benchmark. Quake too, and not for people who were USING awk or necessarily going to play quake. This is what I meant by an "application benchmark" -- some sort of application that somebody thinks is a good measure of general systems performance and manage to get people to take seriously. Stuff like this is still fairly commonly used in many WinXX "benchmarks" that you'll see "published" both on the web and in real paper magazine articles. How fast can Excel update a spreadsheet that computes lunar orbital trajectories, that sort of thing. Sometimes they are almost a joke -- applications that do a lot of disk I/O (apparently, who knows) are used as a "disk performance benchmark". I won't even get started on this sort of thing and the number of variables left completely uncontrolled (for example, the disk caching subsystems both hardware and software) compared to, say, bonnie or lmbench. 
I also won't comment on just how much crap there is out there with stuff like this in it, sometimes from supposedly "reputable" testing companies that ought to know better or be more honest. That's why I "trust" GPL/Open microbenchmarks the most, because I can look at their sources, understand just what they are doing and how it compares to what I want to do, maybe even hack them if I need to because it isn't QUITE right, and get numbers with some meaning. Stuff like SPEC and linpack (where linpack should probably be considered micro) isn't horrible but (in the case of SPEC) isn't GPL or terribly straightforward to understand microscopically or macroscopically -- it takes experience to know how the profile it generates compares to features in your own code. Great for sales-speak, though -- "Our system gets 2301.124 specoloids/second, while THEIR system is a laughable 1721.564." Quake isn't a useful benchmark -- it is a game, and one that generally runs as fast as it needs to whereever it runs...but it is a GREAT benchmark for how a system plays quake:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 11 19:31:43 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 11 Feb 2004 19:31:43 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. > > I wasn't able to actually track down the problem, but it seems that there is > a problem with small messages. When I run the performance test provided with > mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > byte message length, while for larger messages everything looks fine (linear > dependancy of transfer time on message length, everything below 300 us). I > have also tried mpich2 which shows exactly the same behavior. > > Does anyone have any idea? First, I assume you were running the 100BT through the same onboard NICs and got reasonable performance. So some possible things: - the switch is a dog or it is broken - your cables may be old or bad (but worked fine for 100BT) - negotiation problem Some things to try: Use a cross over cable (cat5e) and see if you get the same problem. You might try using a lower level benchmark (of the micro variety) like netperf and netpipe. The Beowulf Performance Suite: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 has these tests. Also, the December and January issues of ClusterWorld show how to test a network connection using netpipe. 
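In the same lower-level spirit, a raw TCP ping-pong (one level below MPI) can help show whether the small-message problem lives in the network stack and NICs or in the MPI layer above them. The sketch below is illustrative only: the port number, payload size and repetition count are arbitrary, and error handling is minimal.

/* tcp_pingpong.c - raw TCP round-trip latency sketch, one level below MPI.
 * Server:  ./tcp_pingpong server
 * Client:  ./tcp_pingpong client <server-ip>
 * Illustrative only; port, payload and repetition count are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#define PORT 5001
#define REPS 1000

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(int argc, char **argv)
{
    int s, c, one = 1, i;
    char byte = 'x';
    struct sockaddr_in addr;
    double t0, t1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);

    if (argc >= 2 && strcmp(argv[1], "server") == 0) {
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind"); return 1;
        }
        listen(s, 1);
        c = accept(s, NULL, NULL);
        setsockopt(c, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        for (i = 0; i < REPS; i++) {          /* echo each byte straight back */
            if (recv(c, &byte, 1, 0) != 1) break;
            send(c, &byte, 1, 0);
        }
        close(c); close(s);
    } else if (argc >= 3 && strcmp(argv[1], "client") == 0) {
        addr.sin_addr.s_addr = inet_addr(argv[2]);
        c = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect"); return 1;
        }
        setsockopt(c, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        t0 = now();
        for (i = 0; i < REPS; i++) {          /* one byte out, wait for the echo */
            send(c, &byte, 1, 0);
            if (recv(c, &byte, 1, 0) != 1) break;
        }
        t1 = now();
        printf("%d round trips: %.1f us each, ~%.1f us one-way\n",
               REPS, (t1 - t0) / REPS * 1.0e6, (t1 - t0) / (2.0 * REPS) * 1.0e6);
        close(c);
    } else {
        fprintf(stderr, "usage: %s server | client <server-ip>\n", argv[0]);
        return 1;
    }
    return 0;
}

TCP_NODELAY matters for a test like this: with Nagle's algorithm left on, tiny back-to-back messages can be delayed by the stack itself, which is one classic way for small-message latency to look far worse than the wire.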
At some point this content will be showing up on the web-page. Also, the MPI Link-checker from Microway (www.microway.com) http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 May help. Doug > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Feb 11 20:19:26 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 11 Feb 2004 20:19:26 -0500 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> References: <200402121007.30002.csamuel@vpac.org> Message-ID: <1076548766.3950.91.camel@protein.scalableinformatics.com> On Wed, 2004-02-11 at 18:07, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ... and strace. Amazing how useful that one is. -- Joe Landman Scalable Informatics LLC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Thu Feb 12 03:44:12 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Thu, 12 Feb 2004 09:44:12 +0100 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> References: <402A6A1B.2070805@tcd.ie> Message-ID: <200402120944.12719.joachim@ccrl-nece.de> david moloney: > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. Try PAPI: http://icl.cs.utk.edu/papi/ It offers you all information that the CPU has to offer for this. It depends on you how to gather them. However, for an instruction-level histogramm, a simulator will probaby be more useful. And you should think about if you *really* need this - if the information you get is worth the effort. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:31:21 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:31:21 +0100 (CET) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. 
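PAPI, suggested a little earlier in the thread, will at least provide the aggregate counts, even though it cannot produce a per-instruction histogram. The following is a rough sketch using the high-level counter calls; it assumes a PAPI 3 style interface and that the PAPI_FP_INS and PAPI_TOT_CYC presets are actually implemented on the CPU in question:

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int events[2] = { PAPI_FP_INS, PAPI_TOT_CYC };
        long long counts[2];
        double a = 1.0, b = 1.000001;
        int i;

        if (PAPI_start_counters(events, 2) != PAPI_OK)
            return 1;

        for (i = 0; i < 1000000; i++)      /* stand-in for the real kernel */
            a *= b;

        if (PAPI_stop_counters(counts, 2) != PAPI_OK)
            return 1;

        printf("FP instructions: %lld  cycles: %lld  (a=%g)\n",
               counts[0], counts[1], a);
        return 0;
    }

Link with -lpapi; which presets exist, and how exactly a "floating point instruction" is counted, depends on the processor's counter hardware.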
> Can't help directly, but you could look at Oprofile http://oprofile.sourceforge.net/about/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:17:29 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:17:29 +0100 (CET) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > Does anyone have any idea? > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) You might look at the P4_GLOBMEMSIZE parameter in the MPI job. export P4_GLOBMEMSIZE=20194344 (say) Try stepping through various values for this parameter, and run the Pallas benchmark. Let us know what the results are! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:24:10 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:24:10 +0100 (CET) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). Alternatives you might look at are: LCFG http://www.lcfg.org/ The European Datagrid people have the Quattor project http://quattor.org/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Thu Feb 12 05:37:54 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 12 Feb 2004 18:37:54 +0800 Subject: [Beowulf] Master-Slave Problems Message-ID: <1076582124.5002.20.camel@mikhail> Good day everyone, I've just finished implementing and testing a master-slave prime number finder as a test problem for my thesis on heterogeneous cluster load balancing for parallel applications. Test results show anomalies which may be tied to work chunk size allocations to the slaves, but to test whether it will hold true for other applications and is not directly tied to the parallel prime number finder, I am in need of other problems that may be solved using a master-slave architecture. 
Sure it is easy to come up with just any problem and implement a solution in a master-slave model, but I'm looking for computationally intensive problems wherein the computation necessary for parts of the problem are not equal. What I mean by this is similar to the case of the parallel number finder, seeing whether 11 is prime requires less computation compared to seeing whether 9999991 is prime. Any insights or pointers to documentations or papers that have had similar problems are most welcome. TIA PS: Are there any cluster admins there willing to spare some cycles and cluster time for a cluster needy BS Undergraduate student in the Philippines? :D -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From meetsunil80x86 at yahoo.co.in Thu Feb 12 06:58:41 2004 From: meetsunil80x86 at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=) Date: Thu, 12 Feb 2004 11:58:41 +0000 (GMT) Subject: [Beowulf] Math Coprocessor Message-ID: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Hello everybody, I am a newbie in the Linux world. I would like to know how to... 1) program the 80x87 using C/C++/Fortran95 in linux platform. 2) program the 80x86 using C/C++/Fortran95 in linux platform. 3) link a C function into a fortran95 program or vice versa. Thanks in advance, sunil ________________________________________________________________________ Yahoo! India Education Special: Study in the UK now. Go to http://in.specials.yahoo.com/index1.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From andrewxwang at yahoo.com.tw Thu Feb 12 09:25:30 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 12 Feb 2004 22:25:30 +0800 (CST) Subject: [Beowulf] IA64 & AMD64 binary SPBS and SGE download Message-ID: <20040212142530.32328.qmail@web16807.mail.tpe.yahoo.com> Just FYI only. AMD64 binary from official GridEngine homepage: http://gridengine.sunsource.net/project/gridengine/download.html (IA64 is supported but you need to build from source) IA64 and AMD64 binary rpm for Torque: http://www-user.tu-chemnitz.de/~kapet/torque/ Andrew. ----------------------------------------------------------------- http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Thu Feb 12 09:18:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:18:31 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime.
an easy if hackneyed one is a mandelbrot-family fractal zoomer. depending on what chunk of the space you look at, I'd guess you could find pretty much any distribution of work-per-point. if your master-slave model does smart domain decomp, this might be just the thing. true, some people will roll their eyes when they find out you're doing fractals. I certainly did, when someone here used them. but they do have nice properties, and nice pictures always help ;) > PS: Are ther any cluster admins there willing to spare some cycles and > cluster time for a cluster needy BS Undergraduate student in the > Philippines? :D send me some email. regards, mark hahn. hahn at sharcnet.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 09:35:33 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 09:35:33 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: On 12 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > I've just finished implementing and testing a master-slave prime number > finder as a test problem for my thesis on heterogeneous cluster load > balancing for parallel applications. Test results show anomalies which > may be tied to work chunk size allocations to the slaves, but to test > whether it will hold true for other applications and is not directly > tied to the parallel prime number finder, I am in need of other problems > that may be solved using a master-slave architecture. > > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. > > Any insights or pointers to documentations or papers that have had > similar problems are most welcome. Two remarks. One, lots of problems (e.g. descent into a Mandelbrot set) have widely variable compute times for chunks of work divvied out in a master/slave model with very short and uniform messages distributing the work. Two, why not just simulate work? You're studying something in computer science, not trying to compute prime numbers or random numbers or mandelbrot sets or julia sets. Set up your master to distribute times for slaves to sleep and then reply. Select the times to distribute from the distribution (random or otherwise) of your choice, and scale a return "results" packet accordingly. This yields you complete control over the statistics of the "work" distribution and network load and lets you explore distributions that you might not easily find in the real world. It also lets you CONNECT the results of your simulations with "known" distributions to the results you obtain with real problems, which may help you identify or even categorically classify problems in terms of work-load complexity. This would doubtless make your thesis still more powerful. 
This is what I've been doing in my Cluster World column -- simulating work (or nearly so) with a trivial master-slave computation of random numbers (the return) accompanied by an adjustable "sleep time" that permits me to effectively sweep the granularity of the computation to demonstrate at least simple Amdahlian scaling properties of this sort of computation. In fact, I can likely give you a PVM program to do this that could easily be hacked into precisely what you'd need to implement this with little effort (a few days, INCLUDING learning how to generate distributions with e.g. the GSL). Let me know. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Feb 12 10:25:03 2004 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 12 Feb 2004 10:25:03 -0500 Subject: [Beowulf] Virginia Tech upgrade In-Reply-To: References: Message-ID: <402B9ACF.2040502@lmco.com> In case anyone hasn't read slashdot in the last few hours, http://apple.slashdot.org/apple/04/02/12/0613255.shtml?tid=107&tid=126&tid=181&tid=187 Now, everyone face Doug's house and say, "Doug is always right. Doug is always right" :) Jeff > > The first thought I had was "what will they do with all the old systems?" > > Then it hit me. They put a fancy sticker on each box that says > "This machine was part of the third fastest supercomputer on the planet > Nov. 2003" or something similar. Also put a serial number on the tag and > provide a "certificate of authenticity" from VT. My guess is they can > make > a little bit on the whole deal. I wager they would sell rather quickly. > Alumni eat this kind of thing up. > > For those interested, my old www.cluster-rant.com site has morphed into > the new www.clusterworld.com site. You can check out issue contents, > submit stories, check out the polls, and rant about clusters. > > Doug > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:04:29 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:04:29 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux Why? 
As one of the relatively few humans on the planet to ever actually write 8087 code (back when it was the ONLY way to use the coprocessor with the various compilers available at the time) I can authoritatively say that it isn't horribly difficult -- the x87 is sort of a RPN HPC calculator for your PC with its own stack and internal floating point commands -- but all the compilers available already use it when they can and it is appropriate, and in MANY cases their code will be as or more efficient and robust than what you could hand code. There are doubtless exceptions, but are they worth the considerable amount of work required to realize them? Are you planning to join the GCC project or something? > platform. > 2) program the 80x86 using C/C++/Fortran95 in linux > platform. This is straightfoward. But I'm not going to explain inlining of assembler here (I can give you an example/code fragment of inlined code if you want it, though). Instead... ...Google is your friend. Try e.g. "86 assembler reference gnu" http://linux.maruhn.com/cat/Development/Languages.html http://www.redhat.com/docs/manuals/enterprise/ RHEL-3-Manual/pdf/rhel-as-en.pdf http://www.linuxgazette.com/issue94/ramankutty.html or "gnu assembler manual" http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_toc.html ... > 3) link a C function into a fortran95 program or > vice or "gnu fortran manual" http://gcc.gnu.org/onlinedocs/g77/ (and that's just the beginning!) Try other search strings. Consider buying a book or two if you're unfamiliar with assembler altogether -- I don't think it is taught much anymore in CPS departments unless you are a really serious major and select the right courses. And they still have somebody who can teach them -- one thing about upper level languages is that they make assembler level programming so difficult by comparison that it has become a vanishing and highly arcane art. Well, not really vanishing, but I'll bet that no more than 10% of all programmers have a clue about what registers are and how to manipulate them with assembler commands...maybe more like 1-2%. And mostly Old Guys at that. And the serious, I mean really serious, programmers and hackers. Basically, all of this is throroughly documented ag gnu.org, and much of it is REdocumented, explained, tutorialized, and hashed over many times many other places, all on the web. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:28:37 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:28:37 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux > > platform. > 2) program the 80x86 using C/C++/Fortran95 in linux > platform. > 3) link a C function into a fortran95 program or > vice One last reference: man as86 (it even has a list of the supported x86 and x87 instructions at the bottom, although it does NOT teach you to program in assembler in the first place). rgb > versa. 
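The kind of inlined fragment rgb mentions a little way back usually amounts to the standard gcc idiom for reading the Pentium time stamp counter. The sketch below is a generic illustration (gcc on x86 assumed), not the code he was offering; the x87 loop is only there to give the counter something to measure:

    #include <stdio.h>

    /* gcc inline assembler: read the time stamp counter (x86, Pentium or later). */
    static inline unsigned long long rdtsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long) hi << 32) | lo;
    }

    int main(void)
    {
        unsigned long long t0, t1;
        double x = 0.0;
        int i;

        t0 = rdtsc();
        for (i = 0; i < 1000000; i++)
            x += 1.0 / (i + 1);
        t1 = rdtsc();

        printf("sum = %g, cycles = %llu\n", x, t1 - t0);
        return 0;
    }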
> > Thanks in advance, > sunil > > ________________________________________________________________________ > Yahoo! India Education Special: Study in the UK now. > Go to http://in.specials.yahoo.com/index1.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:12:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:12:31 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <1076548766.3950.91.camel@protein.scalableinformatics.com> Message-ID: > ... and strace. Amazing how useful that one is. true, but I've also fallen in love with ltrace, which does both syscalls and lib calls. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ao8215 at wayne.edu Thu Feb 12 08:13:48 2004 From: ao8215 at wayne.edu (Robson Pablo Sobradiel Peguin) Date: Thu, 12 Feb 2004 08:13:48 -0500 Subject: [Beowulf] Message Error Message-ID: <813143f0.10fc5818.81a9100@mirapointms3.wayne.edu> Hi I would like to know the meanings of these errors during the compilation with MPICH in the cluster: [root at master source]# make beowulf-WSU-INTEL cp /usr/local/mpich/mpich-1.2.5_intel/include/mpif.h mpif.h make FC=/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 FFLAGS="-O3 -tpp7 -xW -axW -c"\ CPFLAGS="-DSTRESS -D'POINTER=integer'" \ make LD="/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 -tpp7 -xW -axW -o" \ FFLAGS="-O3 -tpp7 -xW -axW -c" \ CPFLAGS="-DSTRESS -DMPI -D'POINTER=integer'" \ EX=DLPOLY.MBE BINROOT=../execute 3pt make[1]: Entering directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make[1]: *** No rule to make target `make'. Stop. make[1]: Leaving directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make: *** [beowulf-WSU-INTEL] Error 2 Thank you very much ________________________________________________________ Robson P. S. Peguin, Graduate Student Wayne State University Department of Chemical Engineering and Materials Science 4815 Fourth Street, 2015 MBE,Detroit - MI 48201 phone: (313)577-1416 fax: (313)577-3810 e-mail: robson_peguin at wayne.edu http://chem1.eng.wayne.edu/~sdr/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Feb 11 23:13:12 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 11 Feb 2004 22:13:12 -0600 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: Message-ID: <402AFD58.9060402@tamu.edu> Realize that not all switches are created equal when working with small (and, overall, 0-byte == small) packets. A number of otherwise decent network switches are less than stellar performers with small packets. 
We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test system running under the RFC-2544 testing suite... There are switches that perform well with small packets, but it's been our experience that most switches, especially your lower cost switches (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some others I can't recall right now) didn't perform well with smaller packets but did fine when the packet size was about 1500 bytes. Going with cheap switches is usually not a good way to improve performance. gerry Douglas Eadline, Cluster World Magazine wrote: > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > >>Hello, >> >>I have a really small "cluster" of 4 PC's which are connected by a normal >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board >>I thought I might be able to improve performance by connecting the machines >>via a Gigabit switch (which are really cheap nowadays). >> >>Everything seemed to work fine. The switch indicates 1000Mbit connections to >>the PC's and transfer rate for scp-ing large files is significantly higher >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than >>with the 100 Mbit switch. >> >>I wasn't able to actually track down the problem, but it seems that there is >>a problem with small messages. When I run the performance test provided with >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 >>byte message length, while for larger messages everything looks fine (linear >>dependancy of transfer time on message length, everything below 300 us). I >>have also tried mpich2 which shows exactly the same behavior. >> >>Does anyone have any idea? > > > First, I assume you were running the 100BT through the same > onboard NICs and got reasonable performance. So some possible > things: > > - the switch is a dog or it is broken > - your cables may be old or bad (but worked fine for 100BT) > - negotiation problem > > Some things to try: > > Use a cross over cable (cat5e) and see if you get the same problem. > You might try using a lower level benchmark (of the micro variety) > like netperf and netpipe. > > The Beowulf Performance Suite: > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > has these tests. Also, the December and January issues of ClusterWorld > show how to test a network connection using netpipe. At some point this > content will be showing up on the web-page. > > Also, the MPI Link-checker from Microway (www.microway.com) > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > May help. > > > Doug > > >>Here are the details of my system: >> - Suse Linux 9.0 (kernel 2.4.21) >> - mpich-1.2.5.2 >> - motherboard ASUS P4P800 >> - LAN (10/100/1000) on board (3COM 3C940 chipset) >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > + > >> 8x88E1111-BAB, AT89C2051-24PI) >> >> > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Thu Feb 12 22:22:02 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Thu, 12 Feb 2004 21:22:02 -0600 (CST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> References: <402AFD58.9060402@tamu.edu> Message-ID: The best switch that we have found both in price and speed are the GigE Switches from Dell. We use them in a few of our test clusters and smaller clusters. They are actually pretty good performers and top even some of the cisco switches. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. > >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. 
> > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 12 22:35:51 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 13 Feb 2004 14:35:51 +1100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: <402AFD58.9060402@tamu.edu> Message-ID: <200402131435.54453.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged Cisco switches. Even had to do the usual "PortFast" routine in IOS to get PXE booting to work. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFALEYXO2KABBYQAh8RAm81AJoDHOfMZ+hrIyLVoBIr1lsESi70KACfcnYu C1JcJ3iYX22Tm99gTvKlfOs= =XWYZ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:17:26 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:17:26 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: on varius revs of their code I've had regular (once a week) managment stack crash on our dell switches which doesn't make it easy to collect statistics, but they continue to forward packets just fine... the switches are actually made by accton and they are also sold by smc... depending one who has better deals the dell 5212/5224 or smc 8612t/8624t may be cheaper at any given time... the cisco cat-ios style cli and ssh support are a plus. On Thu, 12 Feb 2004, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. 
> > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > Realize that not all switches are created equal when working with small > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > network switches are less than stellar performers with small packets. > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > system running under the RFC-2544 testing suite... > > > > There are switches that perform well with small packets, but it's been > > our experience that most switches, especially your lower cost switches > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > others I can't recall right now) didn't perform well with smaller > > packets but did fine when the packet size was about 1500 bytes. > > > > Going with cheap switches is usually not a good way to improve performance. > > > > gerry > > > > Douglas Eadline, Cluster World Magazine wrote: > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > >>Hello, > > >> > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > >>I thought I might be able to improve performance by connecting the machines > > >>via a Gigabit switch (which are really cheap nowadays). > > >> > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > >>with the 100 Mbit switch. > > >> > > >>I wasn't able to actually track down the problem, but it seems that there is > > >>a problem with small messages. When I run the performance test provided with > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > >>byte message length, while for larger messages everything looks fine (linear > > >>dependancy of transfer time on message length, everything below 300 us). I > > >>have also tried mpich2 which shows exactly the same behavior. > > >> > > >>Does anyone have any idea? > > > > > > > > > First, I assume you were running the 100BT through the same > > > onboard NICs and got reasonable performance. So some possible > > > things: > > > > > > - the switch is a dog or it is broken > > > - your cables may be old or bad (but worked fine for 100BT) > > > - negotiation problem > > > > > > Some things to try: > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > You might try using a lower level benchmark (of the micro variety) > > > like netperf and netpipe. > > > > > > The Beowulf Performance Suite: > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > show how to test a network connection using netpipe. At some point this > > > content will be showing up on the web-page. > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > May help. 
> > > > > > > > > Doug > > > > > > > > >>Here are the details of my system: > > >> - Suse Linux 9.0 (kernel 2.4.21) > > >> - mpich-1.2.5.2 > > >> - motherboard ASUS P4P800 > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > + > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > >> > > >> > > > > > > > > > > -- > > Gerry Creager -- gerry.creager at tamu.edu > > Network Engineering -- AATLT, Texas A&M University > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > Page: 979.228.0173 > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:19:16 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:19:16 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: Also they support jumbo (9k) frames which is a plus for us since we do nfs over them. joelja On Thu, 12 Feb 2004, Joel Jaeggli wrote: > on varius revs of their code I've had regular (once a week) managment > stack crash on our dell switches which doesn't make it easy to collect > statistics, but they continue to forward packets just fine... the switches > are actually made by accton and they are also sold by smc... depending > one who has better deals the dell 5212/5224 or smc 8612t/8624t may be > cheaper at any given time... the cisco cat-ios style cli and ssh support > are a plus. > > On Thu, 12 Feb 2004, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Realize that not all switches are created equal when working with small > > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > > network switches are less than stellar performers with small packets. > > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > > system running under the RFC-2544 testing suite... 
> > > > > > There are switches that perform well with small packets, but it's been > > > our experience that most switches, especially your lower cost switches > > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > > others I can't recall right now) didn't perform well with smaller > > > packets but did fine when the packet size was about 1500 bytes. > > > > > > Going with cheap switches is usually not a good way to improve performance. > > > > > > gerry > > > > > > Douglas Eadline, Cluster World Magazine wrote: > > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > > > > >>Hello, > > > >> > > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > > >>I thought I might be able to improve performance by connecting the machines > > > >>via a Gigabit switch (which are really cheap nowadays). > > > >> > > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > > >>with the 100 Mbit switch. > > > >> > > > >>I wasn't able to actually track down the problem, but it seems that there is > > > >>a problem with small messages. When I run the performance test provided with > > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > > >>byte message length, while for larger messages everything looks fine (linear > > > >>dependancy of transfer time on message length, everything below 300 us). I > > > >>have also tried mpich2 which shows exactly the same behavior. > > > >> > > > >>Does anyone have any idea? > > > > > > > > > > > > First, I assume you were running the 100BT through the same > > > > onboard NICs and got reasonable performance. So some possible > > > > things: > > > > > > > > - the switch is a dog or it is broken > > > > - your cables may be old or bad (but worked fine for 100BT) > > > > - negotiation problem > > > > > > > > Some things to try: > > > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > > You might try using a lower level benchmark (of the micro variety) > > > > like netperf and netpipe. > > > > > > > > The Beowulf Performance Suite: > > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > > show how to test a network connection using netpipe. At some point this > > > > content will be showing up on the web-page. > > > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > > > May help. 
> > > > > > > > > > > > Doug > > > > > > > > > > > >>Here are the details of my system: > > > >> - Suse Linux 9.0 (kernel 2.4.21) > > > >> - mpich-1.2.5.2 > > > >> - motherboard ASUS P4P800 > > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > > > + > > > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > > >> > > > >> > > > > > > > > > > > > > > -- > > > Gerry Creager -- gerry.creager at tamu.edu > > > Network Engineering -- AATLT, Texas A&M University > > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > > Page: 979.228.0173 > > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 13 03:40:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 13 Feb 2004 09:40:09 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Thu, 12 Feb 2004, Robert G. Brown wrote: > not really vanishing, but I'll bet that no more than 10% of all > programmers have a clue about what registers are and how to manipulate > them with assembler commands...maybe more like 1-2%. And mostly Old > Guys at that. And the serious, I mean really serious, programmers and > hackers. > Sigh. I was first taught assembler in the physics department (being as you in the States would say a physics major). The lab had Motorola 68000 trainer boards. I still have a copy of "68000 Assembly Language" by Kane, Hawkins, Leventhal kicking around. Such a nice architecture. But then again I may be the only person to own "Fortran 77: A Structured Approach". Such perversity originating from being taught Pascal by computer scientists then learning Fortran. I also remember being taught about self-modifying code by the then professor of computing science. Do they still teach that? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Fri Feb 13 09:24:11 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Fri, 13 Feb 2004 09:24:11 -0500 Subject: [Beowulf] problmes with MPICH References: Message-ID: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Hi, i have problems with mpich, i have installed OSCAR with mpich 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, i want to use rsh and i dont want reinstall OSCAR. then i change the line in mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes but mpich not use rsh. 
Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help in this point. I have mpich-1.2.5.2 and fortran pgi and rsh. Thanks R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 09:53:38 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 14:53:38 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Fri, 13 Feb 2004, John Hearns wrote: > But then again I may be the only person to own "Fortran 77: > A Structured Approach". Wow! Bleeding edge stuff. On the subject of pure perversity, my Fortran notes stop with a roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? Punched card paradise? I believe some guy in France has got one back together and working, but I don't remember where.) [Weeps sadly into Wincarnis as the memories flood back.] -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joshh at cs.earlham.edu Fri Feb 13 10:25:31 2004 From: joshh at cs.earlham.edu (joshh at cs.earlham.edu) Date: Fri, 13 Feb 2004 10:25:31 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment Message-ID: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Here is an irregular question. I am profiling a software package that runs over LAM-MPI on 16 node clusters [Details Below]. I would like to measure the effect of increased latency on the run time of the program. It would be nice if I could quantify the added latency in the process to create some statistics. If possible, I do not want to alter the code line of the program, or buy new hardware. I am looking for a software solution/idea. Bazaar Cluster: 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM 1 100Mbps NIC card in each machine 2 100Mbps Full-Duplex switches Cairo Cluster: 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM 2 1Gbps NIC cards in each machine (only one in use) 2 1Gbps Full-Duplex switches For more details on these clusters follow the link below: http://cluster.earlham.edu/html/ Thank you, Josh Hursey Earlham College Cluster Computing Group _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:30:25 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:30:25 -0800 Subject: [Beowulf] problmes with MPICH Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5563@orsmsx402.jf.intel.com> I'm forwarding this to the OSCAR-users list, a more appropriate venue for this question. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
> -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Richard Miguel > Sent: Friday, February 13, 2004 6:24 AM > Cc: beowulf at beowulf.org > Subject: [Beowulf] problmes with MPICH > > Hi, i have problems with mpich, i have installed OSCAR with mpich > 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, > i > want to use rsh and i dont want reinstall OSCAR. then i change the line in > mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes > but > mpich not use rsh. > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need > help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > > Thanks > > R. Miguel > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:57:00 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:57:00 -0800 Subject: [Beowulf] Math Coprocessor Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Martin WHEELER > Sent: Friday, February 13, 2004 6:54 AM > To: John Hearns > Cc: Robert G. Brown; sunil kumar; beowulf at beowulf.org > Subject: Re: [Beowulf] Math Coprocessor > > On Fri, 13 Feb 2004, John Hearns wrote: > > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". > > Wow! Bleeding edge stuff. > On the subject of pure perversity, my Fortran notes stop with a > roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating > from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? > Punched card paradise? I believe some guy in France has got one back > together and working, but I don't remember where.) > > [Weeps sadly into Wincarnis as the memories flood back.] Ah, another 1130 veteran! Group hug! There's an active 1130 group, and you too can run R2V12 on your very own 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. IIRC, APL may even be available. http://ibm1130.org One of my hobby tasks is to port the simulator GUI to Tcl/Tk or Perl/Tk... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 13:15:28 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 12:15:28 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. 
> > It would be nice if I could quantify the added latency in the process to > create some statistics. If possible, I do not want to alter the code line > of the program, or buy new hardware. I am looking for a software > solution/idea. > > Bazaar Cluster: > 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM > 1 100Mbps NIC card in each machine > 2 100Mbps Full-Duplex switches > > Cairo Cluster: > 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM > 2 1Gbps NIC cards in each machine (only one in use) > 2 1Gbps Full-Duplex switches > > For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ > > Thank you, > > Josh Hursey > Earlham College Cluster Computing Group > Not an irregular question at all. I tried something like this a couple of years ago to investigate the bandwidth and latency sensitivity of an application which was using MPICH over Myrinet. One of D.K.Panda's students from Ohio State University had a modified version of the "mcp" for Myrinet which added quality of service features, tunable per connection. The "mcp" is the code which runs on the LANai microprocessor on the Myrinet interface card. The modifications on top of the OSU modifications to gm used a hardware timer on the interface card to add a fixed delay per packet for bandwidth tuning, and a fixed delay per message (i.e., a delay added to only the first packet of a new connection) for latency tuning. Via netpipe, I verified that I could independently tune the bandwidth and latency. Lots of fun to play with - for example, by plotting the difference in message times for two different latency setting, the eager-rendezvous threshold was easily identified. All in all a very useful experiment which told us a lot about our application. Clearly, you want to delay the sending of a message, or the processing of a received communication, without otherwise interfering with what the system is doing. Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send call is going to perturb your results because the processor won't be doing useful work during that time. That's obviously not the same as running on a network with a switch that adds the same 50 microseconds latency; in that case, the processor could be doing useful work during the delay, happily overlapping computations with communications. Nevertheless, adding busy loops might still give you useful results. You might want to look into using a LD_PRELOAD library to intercept MPI calls of interest, assuming you're using a shared library for MPI. In your version, do the busy loop, then fall into the normal call. A quick google search on "LD_PRELOAD" or "library interposers" will return a lot of examples, such as: http://uberhip.com/godber/interception/index.html http://developers.sun.com/solaris/articles/lib_interposers.html The advantage of this approach is that no modifications to your source code or compiled binaries are necessary. You'll have to think carefully about whether the added latency is slowing your application simply because the processor is not doing work during the busy loop. If I were you, I'd modify your source code and time your syncronizations (eg, MPI_Wait). If your code is cpu-bound, these will return right away, and adding latency via a busy loop is going to give you the wrong answer. If your code is communications bound, these will have a variable delay depending upon the latency and bandwidth of the network. You are likely interested in delays of 10's of microseconds. 
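A bare-bones version of the LD_PRELOAD idea might look like the sketch below. It assumes the MPI-1 era prototype for MPI_Send (non-const buffer, as in the MPICH and LAM of this period) and uses a crude gettimeofday spin; the fixed 50 microsecond figure is a placeholder, not a recommendation:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/time.h>
    #include <mpi.h>

    typedef int (*send_fn)(void *, int, MPI_Datatype, int, int, MPI_Comm);

    /* Busy-wait for roughly "us" microseconds (gettimeofday resolution). */
    static void spin_usec(long us)
    {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        do {
            gettimeofday(&t1, NULL);
        } while ((t1.tv_sec - t0.tv_sec) * 1000000L
                 + (t1.tv_usec - t0.tv_usec) < us);
    }

    /* Interposed MPI_Send: inject a delay, then call the real routine. */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        static send_fn real_send;
        if (!real_send)
            real_send = (send_fn) dlsym(RTLD_NEXT, "MPI_Send");
        spin_usec(50);               /* placeholder: 50 us of extra "latency" */
        return real_send(buf, count, datatype, dest, tag, comm);
    }

Built as a shared object (for example with mpicc -shared -fPIC -o libdelay.so delay.c -ldl, if your mpicc passes those flags through) and activated with LD_PRELOAD, it adds the delay without relinking the application, with the caveat already made above: the processor burns cycles in the spin, which a genuinely slower network would not cost you.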
The most accurate busy loops for this sort of thing use the processor
hardware timers, which tick every clock on x86.  On a G5 PPC running OS-X,
the hardware timer ticks every 60 cpu cycles.  I'm not sure what a PPC does
under Linux.

On x86, you can read the cycle timer via:

    #include <asm/msr.h>

    unsigned long long timerVal;
    rdtscll(timerVal);

A crude delay loop example:

    rdtscll(timeStart);
    do {
        rdtscll(timeEnd);
    } while ((timeEnd - timeStart) < latency * usecPerTick);

where latency is in microseconds, and usecPerTick is your calibration.
There have been other recent postings to this mailing list about using
inline assembler macros to read the time stamp counter.

Injecting small latencies w/out busy loops and without disturbing your
source code is going to be very difficult (though I'd love to be
contradicted on that statement!).  A couple of far-fetched ideas in kernel
land:

 - some ethernet interfaces have very sophisticated processors aboard.
   IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu.
   Perhaps the firmware can be modified similarly to the modified mcp for
   gm discussed above.  Obviously this has the huge disadvantage of being
   specific to particular network chips.

 - the local APIC on x86 processors has a programmable interval timer with
   better than microsecond granularity which can be used to generate an
   interrupt.  Perhaps in the communications stack, or in the network
   device driver, a wait_queue could be used to postpone processing until
   after an interrupt from this timer.  I would worry about considerable
   jitter, though.  For a sample driver using this feature, see
      http://www.oberle.org/apic_timer-timers.html
   The various realtime Linux folks talk about this as well:
      http://www.linuxdevices.com/articles/AT6105045931.html
   Unfortunately, IIRC this timer is now used (since 2.4 kernel) for
   interprocessor interrupts on SMP systems.  On uniprocessor systems it
   may still be available.

I hope there's something useful for you in this response.  I'm hoping even
more that there are other responses to your question - I would love a
facility which would allow me to "turn the dial" on latency and/or
bandwidth.  There's a substantial cost difference between a gigE cluster
and a Myrinet/Infiniband/Quadrics/SCI cluster, and it would be great to
simulate performance of different network architectures on specific
applications.

Don Holmgren
Fermilab

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lusk at mcs.anl.gov  Fri Feb 13 13:31:08 2004
From: lusk at mcs.anl.gov (Rusty Lusk)
Date: Fri, 13 Feb 2004 12:31:08 -0600 (CST)
Subject: [Beowulf] Adding Latency to a Cluster Environment
In-Reply-To: 
References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu>
Message-ID: <20040213.123108.12267444.lusk@localhost>

> Suggestions:
> - modify the routines that make MPI calls to call instead some wrapper
> routines that do some thumb twiddling before making the MPI call; this
> requires modification of the program source
> - modify the MPI routines (well, if you use an open-source MPI
> implementation) to insert some delay, then relink your binary if static

With any standard-conforming MPI implementation, open-source or not, you
can use the MPI "profiling" interface to provide any kind of wrapper at
all.  Basically, you write your own MPI_Send, etc., which does whatever
you want and also calls PMPI_Send (required to be there) to do the real
work.
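For instance, a sketch of what such a wrapper might look like (an
illustration, not from the original mail; the 50 microsecond delay and the
1500-ticks-per-microsecond calibration, i.e. an assumed 1.5 GHz x86 CPU,
must be adjusted for the machine at hand):

    #include <mpi.h>

    /* Read the x86 time stamp counter with an inline asm helper. */
    static inline unsigned long long rdtsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)hi << 32) | lo;
    }

    static const unsigned long long delay_us     = 50;    /* added latency */
    static const unsigned long long ticks_per_us = 1500;  /* calibration   */

    /* Profiling-interface wrapper: spin for the extra latency, then
     * let the implementation's PMPI_Send do the real work.          */
    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        unsigned long long t0 = rdtsc();
        while (rdtsc() - t0 < delay_us * ticks_per_us)
            ;                                   /* busy wait */
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }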
Then you link your routines in front of the MPI library, and voila! Cheers, Rusty Lusk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From modus at pr.es.to Thu Feb 12 23:53:04 2004 From: modus at pr.es.to (Patrick Michael Kane) Date: Thu, 12 Feb 2004 20:53:04 -0800 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <200402131435.54453.csamuel@vpac.org>; from csamuel@vpac.org on Fri, Feb 13, 2004 at 02:35:51PM +1100 References: <402AFD58.9060402@tamu.edu> <200402131435.54453.csamuel@vpac.org> Message-ID: <20040212205304.A16115@pr.es.to> * Chris Samuel (csamuel at vpac.org) [040212 20:42]: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged > Cisco switches. Even had to do the usual "PortFast" routine in IOS to get > PXE booting to work. They used to be, I believe. Now they appear to be something else (for their latest 24 port layer-2 model). I've had good luck with them with the latest firmware, before that they were fairly flakey. Check the dell forums for all the yammering and howling on the PowerEdge 5224. Best, -- Patrick Michael Kane _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 12:44:33 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 18:44:33 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. It appears that in your setup MPI uses TCP/IP as underlying protocol. Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a fuzzy result. So there :-) Suggestions: - modify the routines that make MPI calls to call instead some wrapper routines that do some thumb twiddling before making the MPI call; this requires modification of the program source - modify the MPI routines (well, if you use an open-source MPI implementation) to insert some delay, then relink your binary if static - modify the kernel source to insert some delays in the TCP path - pretty hard as TCP is very complex - modify the network driver to insert some delays in the Tx or Rx packet path; not very difficult, but might be leveled by the delays of TCP. The kernel modifications have the disadvantage that they also require some way to change the delay value, so adding a /proc entry, an ioctl, etc. unless you want to recompile the kernel and reboot after each delay change. 
> For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ Please tell to whoever coded that page that Opera doesn't display it properly. And I use Opera all the time ;-) The page also doesn't specify an important detail: the network cards/chips used in the clusters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Fri Feb 13 14:01:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Fri, 13 Feb 2004 13:01:25 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <6.0.0.22.2.20040213125745.0266bbc0@localhost> At 11:44 AM 2/13/2004, Bogdan Costescu wrote: >On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > > > Here is an irregular question. I am profiling a software package that runs > > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > > the effect of increased latency on the run time of the program. > >It appears that in your setup MPI uses TCP/IP as underlying protocol. >Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a >fuzzy result. So there :-) > >Suggestions: >- modify the routines that make MPI calls to call instead some wrapper >routines that do some thumb twiddling before making the MPI call; this >requires modification of the program source Actually, this is not necessary, as long as you have the object files, not just the executable. The MPI profiling interface could be used to add latency to every send and receive operation; adding latency to collectives will require some care, as the exact set of communication operations that an MPI implementation uses is up to the implementation. Simply write your own MPI routine and call the PMPI version (e.g., for MPI_Send, call PMPI_Send) after adding some latency. Note also that MPI may use any communication mechanism. Even on small clusters, it may use something besides TCP (e.g., when the network is Infiniband). MPI on SMPs often uses a collection of communication approaches. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 15:39:14 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 20:39:14 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > There's an active 1130 group, and you too can run R2V12 on your very own > 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. > IIRC, APL may even be available. http://ibm1130.org Thanks for the link -- didn't know about that. As arts faculty post-grads (applied linguistics) we were only allowed to play with Fortran (and even then were regarded with deep suspicion by the physics wallahs). Now -- where did I put that stack of cards...? Off to the attic to dig out more stuff. 
-- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 13 16:54:28 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 13 Feb 2004 16:54:28 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> Message-ID: I wondered about your low cost switch statement. I had done this test before, but I thought I would redo it anyway. I have an SMC 8 port GigE EasySwitch 8508T (PriceGrabber $140 to my door). I should say that the switch is not loaded, so it may fall down if the load were higher. This is just two nodes running netpipe through the switch. Latency: 0.000034 Now starting main loop 0: 1 bytes 7287 times --> 0.22 Mbps in 0.000034 sec 1: 2 bytes 7338 times --> 0.46 Mbps in 0.000033 sec 2: 3 bytes 7469 times --> 0.68 Mbps in 0.000034 sec 3: 4 bytes 4923 times --> 0.90 Mbps in 0.000034 sec 4: 6 bytes 5545 times --> 1.36 Mbps in 0.000034 sec 5: 8 bytes 3711 times --> 1.81 Mbps in 0.000034 sec 6: 12 bytes 4637 times --> 2.67 Mbps in 0.000034 sec My opinion: If you get a switch that can not "switch" then it is broken by design. The original poster noted that his results seem to go from OK to "really bad" for basic MPI tests. If a switch does this it is "really broken". Of course it may not be the switch. BTW, the results were for a $30 NIC (netgear GA302T) running in a 66MHz slot. Top throughput was 800 Mbits/sec. Doug On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. 
> >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. > > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 17:46:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 23:46:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Fri, 13 Feb 2004, Don Holmgren wrote: > I tried something like this a couple of years ago to investigate the > bandwidth and latency sensitivity of an application which was using > MPICH over Myrinet. ... which is pretty different from the setup of the original poster :-) But I'd like to see it discussed in general, so let's go on. > a modified version of the "mcp" for Myrinet which added ... Is this publicly available ? I'd like to give it a try. > The modifications on top of the OSU modifications to gm Well, that's a very important point: using GM, which doesn't try to make too many things like TCP does. I haven't used GM directly nor looked at its code, but I think that it doesn't introduce delays, like TCP does in some cases. Moreover, based on the description in the GM docs, GM is not needed to be optimized by the compiler as it's not in the fast path. Obviously, in such conditions, the results can be relied upon. 
> Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > call is going to perturb your results because the processor won't be > doing useful work during that time. In the case of TCP, the processor doesn't appear to be doing anything useful for "long" times, as it spends time in kernel space. So, a 50 microseconds busy loop might not make a difference. And given the somehow non-deterministic behaviour of TCP in this respect, it might be that adding the delay before the PMPI_* or after PMPI_* calls might make a difference. The delays don't have to be busy-loops. Busy-loops are probably precise, but might have some side-effects; for example, reading some hardware counter (even more as it is on a PCI device, which is "far" from the CPU and might be even "farther" if it has any PCI bridge(s) in between) repeatedly will generate lots of "in*" operations during which the CPU is stalled waiting for data. Especially with today's CPU speeds, I/O operations are expensive in terms of CPU cycles... > You are likely interested in delays of 10's of microseconds. Well, it depends :-) The latencies for today's HW+SW seem to be in a range of about 2 orders of magnitude, so giving absolute figures doesn't make much sense IMHO. Apart from this I would rather suggest an exponential increase in the delay value. > - some ethernet interfaces have very sophisticated processors aboard. > IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Well, if the company releases enough documentation about the chip, then yes ;-) 3Com has the 990 line which is still FastE but has a programmable processor, so it's not only GigE. > Obviously this has the huge disadvantage of being specific to > particular network chips. But there aren't so many programmable network chips these days. Those Ethernet chips might even be in wider use than Myrinet[1] and more people might benefit from such development. If I'd have to choose for the next cluster purchase the GigE network cards and I'd know that one offers such capabilities while not having significant flaws compared to the others, I'd certainly buy it. Another hardware approach: the modern 3Com cards driven by 3c59x, Cyclone and Tornado, have the means to delay a packet in their (hardware) Tx queue. There is however a catch: there is not guarantee that the packet will be sent at the exact time specified, it can be delayed; the only guarantee is that the packet is not sent before that time. However, I somehow think that this is true for most other approaches, so it's not so bad as it sounds :-) The operation is pretty simple, as the packet is "stamped" with the time when it should be transmitted, expressed as some internal clock ticks. Only one "in" operation to read the current clock is needed per packet, so this is certainly much less intrusive as the busy-loop. [ I'm too busy (but not busy-looping :-)) to try this at the moment. If somebody feels the urge, I can provide some guidance :-) ] However, anything that still uses TCP (as both your Broadcom approach and my 3Com one do) will likely generate unreliable results... > it would be great to simulate performance of different network > architectures on specific applications. Certainly ! 
Especially as this would provide means to justify spending money on fast interconnect ;-) [1] I don't want this to look like I'm saying "compared with Myrinet as it's the most widely used high-performance interconnect" and neglect Infiniband, SCI, etc; I have no idea about "market share" of the different interconnects. I compare with Myrinet because the original message talked about it and because I'm ignorant WRT programmable processors on other interconnect NICs. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 19:49:05 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 18:49:05 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: Message-ID: On Fri, 13 Feb 2004, Bogdan Costescu wrote: > On Fri, 13 Feb 2004, Don Holmgren wrote: > > > I tried something like this a couple of years ago to investigate the > > bandwidth and latency sensitivity of an application which was using > > MPICH over Myrinet. > > ... which is pretty different from the setup of the original poster :-) > But I'd like to see it discussed in general, so let's go on. > > > a modified version of the "mcp" for Myrinet which added ... > > Is this publicly available ? I'd like to give it a try. I'm afraid not, sorry, since the modified code base from OSU isn't publically available. IIRC it was part of a project for a masters degree; if it's OK with them, it's OK with me (we can take this offline). The modified MCP had a bug I never fixed which required me to reset the card and reload the driver when some counter overflowed, at something like a gigabyte of messages. Long enough to get very good statistics, though. > > > The modifications on top of the OSU modifications to gm > > Well, that's a very important point: using GM, which doesn't try to make > too many things like TCP does. I haven't used GM directly nor looked at > its code, but I think that it doesn't introduce delays, like TCP does in > some cases. Moreover, based on the description in the GM docs, GM is not > needed to be optimized by the compiler as it's not in the fast path. > Obviously, in such conditions, the results can be relied upon. I miswrote a bit; to be precise, this was a modification to the MCP, which is the NIC firmware, rather than to GM, which is the user space code that interacts with the NIC hardware. The modification caused the NIC itself to introduce interpacket delays of a configurable value. To the application (well, to MPICH and to GM) it simply looked like the external Myrinet network had a different bandwidth and/or latency. There were tiny code changes to MPICH and to GM to allow modification of the interpacket delay values in the MCP; otherwise I would have had to recompile or patch the firmware image and reload that image for each new value. You are absolutely correct that GM, like all good OS-bypass software, doesn't introduce the delays that you'd encounter with communications protocols like TCP that have to pass through the kernel/user space boundary. Much more deterministic. 
> > > Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > > call is going to perturb your results because the processor won't be > > doing useful work during that time. > > In the case of TCP, the processor doesn't appear to be doing anything > useful for "long" times, as it spends time in kernel space. So, a 50 > microseconds busy loop might not make a difference. And given the somehow > non-deterministic behaviour of TCP in this respect, it might be that > adding the delay before the PMPI_* or after PMPI_* calls might make a > difference. TCP processing is likely a significant component of the natural latency, and, as you point out, during that time the CPU is busy in kernel space and isn't doing useful work. But the goal here is to add additional artificial latency in a manner that mimics a slower physical network, i.e., so that during this artificial delay the application can still be crunching numbers. In user space I don't see how to accomplish this goal (adding latency, yes; adding latency during which the cpu can do calculations, no). If delay code is added correctly in kernel space, say in the TCP/IP stack (sounds like a nasty bit of careful work!), then during that 50 usec period the CPU could certainly be doing useful work in user space. Small delays, relative to the timer tick, are very difficult to do accurately in non-realtime kernels unless you have a handy source of interrupts, like the local APIC. Assuming that LAM MPI isn't multithreaded (I have no idea), then adding a delay in the user space code in the MPI call, whether it's a sleep or a busy loop, guarantees that no useful application work can done during the delay. I'm confess to be totally ignorant of the PMPI_* calls (time for homework!) and defer humbly to the MPI masters from ANL. I'm definitely curious as to how these added latencies are implemented. > > The delays don't have to be busy-loops. Busy-loops are probably precise, > but might have some side-effects; for example, reading some hardware > counter (even more as it is on a PCI device, which is "far" from the CPU > and might be even "farther" if it has any PCI bridge(s) in between) > repeatedly will generate lots of "in*" operations during which the CPU is > stalled waiting for data. Especially with today's CPU speeds, I/O > operations are expensive in terms of CPU cycles... Agreed, though I'd hope on x86 that reading the time stamp counter is very quick and with minimal impact - it's got to be more like a register-to-register move than an I/O access. Hopefully on a modern superscalar processor this doesn't interfere with the other execution units. [As I write this, I just ran a program that reads the time stamp counter back to back to different registers, multiple times. The difference in values was a consistent 84 counts or 56 nsec on this 1.5 GHz Xeon - so, definitely minimal impact.] Without busy loops, achieving accurate delays of the order of 10's to 100's of microseconds with little jitter is a real trick in user space, (and kernel space as well!). nanosleep() won't work, delivering order 10 or 20 msec (i.e., the next timer tick) instead of the 50 usec request. > > > You are likely interested in delays of 10's of microseconds. > > Well, it depends :-) The latencies for today's HW+SW seem to be in a range > of about 2 orders of magnitude, so giving absolute figures doesn't make > much sense IMHO. Apart from this I would rather suggest an exponential > increase in the delay value. True. 
I was really thinking of my specific problem, not his!  The relevant
latency range for deciding between Infiniband and switched ethernet is
~ 6 usec to ~ 100+ usec, and the bandwidth range is ~ 100 MB/sec (gigE) to
~ 700 MB/sec (I.B.).  It would be really useful to be able to inject
latencies in that latency range with a precision of 5 usec or so, and to
dial the bandwidth with a precision of ~ 50 MB/sec.  Of course, if latency
really matters, one would drop TCP/IP and use an OS-bypass, like GAMMA or
MVIA.

> ...
>
> > it would be great to simulate performance of different network
> > architectures on specific applications.
>
> Certainly !  Especially as this would provide means to justify spending
> money on fast interconnect ;-)

What we need is some kind corporate soul to put up a large public cluster
with the lowest latency, highest bandwidth network fabric available.
Then, we can add our adjustable firmware and degrade that fabric to mimic
less expensive networks, and figure out what we should really buy.  Works
for me!

Don Holmgren
Fermilab

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com  Sat Feb 14 04:47:30 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sat, 14 Feb 2004 10:47:30 +0100 (CET)
Subject: [Beowulf] Math Coprocessor
In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com>
Message-ID: 

On Fri, 13 Feb 2004, Lombard, David N wrote:
>
> Ah, another 1130 veteran! Group hug!
>
Talking about 'mature' computer systems, I was at the ATLAS centre at RAL
yesterday, where they display the console of the IBM 360 in the front
hall.  Plenty of blinkenlights and switches to toggle.  The notice beside
it said it was a 15 MIPS machine.  Seems impressive for a machine of this
vintage.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com  Sat Feb 14 04:43:37 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Sat, 14 Feb 2004 10:43:37 +0100 (CET)
Subject: [Beowulf] problmes with MPICH
In-Reply-To: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe>
Message-ID: 

On Fri, 13 Feb 2004, Richard Miguel wrote:

> Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help
> in this point.
>
> I have mpich-1.2.5.2 and fortran pgi and rsh.
>

./configure -rsh=RSHCOMMAND

From the configure.in:

"The environment variable 'RSHCOMMAND' allows you to select an alternative
remote shell command (by default, configure will use 'rsh' or 'remsh' from
your 'PATH').  If your remote shell command does not support the '-l'
option (some AFS versions of 'rsh' have this bug), also give the option
'-rshnol'.  These options are useful only when building a network version
of MPICH (e.g., '--with-device=ch_p4').  The configure option '-rsh' is
supported for backward compatibility."

So rsh is the default behaviour.  You can compile with the rsh command set
to the rsh under $SGE_HOME/mpi also.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 14 11:31:51 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 14 Feb 2004 11:31:51 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: given the difficulty of accurately adding a small amount of latency to a message passing interface, how about this: hack the driver to artificially pre/append a constant number of bytes to each message. they will appear to take longer to process, giving high-resolution added delays. course, this will also saturate earlier, but that's only the upper knee of the curve: you can still learn what you want... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From konstantin_kudin at yahoo.com Sat Feb 14 14:28:22 2004 From: konstantin_kudin at yahoo.com (Konstantin Kudin) Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) Subject: [Beowulf] S.M.A.R.T usage in big clusters Message-ID: <20040214192822.35170.qmail@web21203.mail.yahoo.com> I am curious if anyone is using SMART monitoring of ide drives in a big cluster. Basically, the question is in what percentage of the situations when a drive fails SMART is able to give some kind of a reasonable warning beforehand, let's say more than 24 hours. And how often it does not predict failure at all? The reason I am asking is that recently I had a drive that started getting bunch of I/O errors on certain sectors, yet SMART seemed to indicate that things were fine. Thanks! Konstantin __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Sat Feb 14 18:12:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sun, 15 Feb 2004 00:12:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Sat, 14 Feb 2004, Mark Hahn wrote: > hack the driver to artificially pre/append a constant number of > bytes to each message. I thought of this as well, but I dsmissed it because: - if the higher level protocol uses fragmentation and checksums, I think that it's pretty hard for the driver to mess with the messages. - a side effect might be faster filling up of some FIFO buffers on the receiver side, which might influence in unexpected ways the latency that we want to measure. Another side effect might be on the switch (assuming a network that uses switches) where data might be kept longer in buffers or peak bandwidth might be reached for short times, but enough to make a difference... - for networks that offer a very low latency, simulating a large latency might require adding a big lot of junk data, many times larger than the original message. 
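To put a rough number on that last point (back-of-the-envelope figures,
assuming the padding moves at full wire speed): on gigabit Ethernet at
roughly 125 MB/s, each additional 50 us of apparent latency needs about
125e6 * 50e-6 = ~6 KB of padding per message; making a ~250 MB/s Myrinet
link look like a 100 us switched-Ethernet path would take on the order of
25 KB of junk per message - far larger than the small messages whose
latency one usually cares about.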
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 16 09:08:54 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 16 Feb 2004 08:08:54 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <20040214192822.35170.qmail@web21203.mail.yahoo.com> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: We are using the SMART monitoring on our cluster. It depends on the drive model how much predictive power you will get. On the drives where we have had the most failures we've kept track of how well SMART predicted it pretty well.. it finds an error in advance about half the time. Steve Timm ------------------------------------------------------------------ Steven C. Timm, Ph.D (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Sat, 14 Feb 2004, Konstantin Kudin wrote: > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. > > Basically, the question is in what percentage of the > situations when a drive fails SMART is able to give > some kind of a reasonable warning beforehand, let's > say more than 24 hours. And how often it does not > predict failure at all? > > The reason I am asking is that recently I had a drive > that started getting bunch of I/O errors on certain > sectors, yet SMART seemed to indicate that things were > fine. > > Thanks! > > Konstantin > > > > __________________________________ > Do you Yahoo!? > Yahoo! Finance: Get your refund fast by filing online. > http://taxes.yahoo.com/filing.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Mon Feb 16 10:47:01 2004 From: camm at enhanced.com (Camm Maguire) Date: 16 Feb 2004 10:47:01 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <54brnzrpqi.fsf@intech19.enhanced.com> Greetings! The subject line says it all -- where can one get the most bang per watt among systems currently available? Take care, -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Mon Feb 16 11:19:53 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Mon, 16 Feb 2004 11:19:53 -0500 (EST) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <54brnzrpqi.fsf@intech19.enhanced.com> References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: On Mon, 16 Feb 2004 at 10:47am, Camm Maguire wrote > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? I have no numbers or benchmarks, but my search for a quiet but powerful set of nodes led me to buy Dell Optiplex SX270s. They've got the Intel 865G chipset (800MHz FSB, 400MHz dual channel memory), P4 HT up to 3.2GHz, onboard e1000, laptop-style HDD, a 150W power supply, and little else. They're sweet little systems. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Mon Feb 16 12:10:43 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Mon, 16 Feb 2004 17:10:43 +0000 (GMT) Subject: Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> References: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> Message-ID: > Message: 1 > Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) > From: Konstantin Kudin > To: beowulf at beowulf.org > Subject: [Beowulf] S.M.A.R.T usage in big clusters > > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. Yes. We use smartmon tools http://smartmontools.sourceforge.net/ Hard drive failures are by far the most common hardware failure we see on our systems. We've hooked smartmontools into the batch queueing system we use, so that if drives are flagged as failing, the host gets closed to new jobs. (You could extend this to do checkpoint/migration if your code supports it, ours doesn't.) Our cluster typically runs fairly short jobs (less than 1 hour or so) so jobs usually finish before the drive finally fails. I haven't collected any hard statistics on how many failures we catch before it impacts on a user's work, but my gut feeling is that it catches over 80% of the cases, and certainly enough for it to be worthwhile implementing. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 16:00:34 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 13:00:34 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> This is an exceedingly sophisticated question.. Do you count: Wall plug watts to flops? or CPU watts to flops? does the interconnect count? (just the power in the line drivers and terminations is a big power consumer for spaceflight hardware... 
why LVDS is overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. I'll bet that gigabit backplane in the switch burns a fair amount of power... does the memory count? This would drive more vs less cache decisions, which affect algorithm partitioning and data locality of reference. Is there a constraint on a "total minimum speed" or "maximum number of nodes"? The interesting tradeoff in speed of nodes vs number of nodes manifests itself in many ways: more interconnects, bigger switches, etc. More nodes means Larger physical size means longer cables means more cable capacitance to charge and discharge on each bit means more power in the line drivers. What's your message latency requirement? Can you do store and forward through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but adding some power in the CPU to shuffle messages around) Can free space optical interconnects be used? (power hungry Tx and Rx, but no cable length issues) Anyway.. this is an issue that is very near and dear to my heart (since I'm designing power constrained systems). One problem you'll find is that reliable and comparable (across processors/architectures) numbers are very hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than a 200 MIPS PowerPC 750 running at 133 MHz. Jim Lux Spacecraft Telecommunications Section Jet Propulsion Lab ----- Original Message ----- From: "Camm Maguire" To: Sent: Monday, February 16, 2004 7:47 AM Subject: [Beowulf] Max flops to watts hardware for a cluster > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? > > Take care, > -- > Camm Maguire camm at enhanced.com > ========================================================================== > "The earth is but one country, and mankind its citizens." -- Baha'u'llah > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Mon Feb 16 18:11:50 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Mon, 16 Feb 2004 23:11:50 +0000 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> Message-ID: <20040216231150.GA3060@galactic.demon.co.uk> On Mon, Feb 16, 2004 at 01:00:34PM -0800, Jim Lux wrote: > This is an exceedingly sophisticated question.. > > Do you count: > Wall plug watts to flops? or CPU watts to flops? Via Eden / Nehemiah chips at 1GHz for 7W or Acorn ARM e.g. Simtec evaluation boards ? > does the interconnect count? (just the power in the line drivers and > terminations is a big power consumer for spaceflight hardware... why LVDS is > overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into > 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. > Cheap slow ASICs and serial port type speeds? Low power Bluetooth devices? 
> I'll bet that gigabit backplane in the switch burns a fair amount of > power... > > does the memory count? This would drive more vs less cache decisions, which > affect algorithm partitioning and data locality of reference. > The early Seymour Cray model - minimum numbers of standard parts that are ultra fast? > Is there a constraint on a "total minimum speed" or "maximum number of > nodes"? The interesting tradeoff in speed of nodes vs number of nodes > manifests itself in many ways: more interconnects, bigger switches, etc. > Buckyball of PDA's anyone ? :) > More nodes means Larger physical size means longer cables means more cable > capacitance to charge and discharge on each bit means more power in the line > drivers. > Xilinx FPGA type architecture? Inmos transputer-style? Node on chip? AVR Atmel-type chips? > What's your message latency requirement? Can you do store and forward > through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but > adding some power in the CPU to shuffle messages around) > > Can free space optical interconnects be used? (power hungry Tx and Rx, but > no cable length issues) > ThinkGeek do an _ultra cool_ looking green pumped laser pointer which will reach low cloudbases :) > > Anyway.. this is an issue that is very near and dear to my heart (since I'm > designing power constrained systems). One problem you'll find is that > reliable and comparable (across processors/architectures) numbers are very > hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs > in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than > a 200 MIPS PowerPC 750 running at 133 MHz. > If 5W of power goes to/from Mars - then the JPL are the ones to beat on this [makes QRP radio hams look positively profligate] :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 16 20:45:49 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 17 Feb 2004 12:45:49 +1100 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> Message-ID: <200402171245.51746.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 12:22 pm, Jim Lux wrote: > For those interested, all the deep space comm stuff is documented in CCSDS > specs at http://www.ccsds.org/ Cool. http://www1.ietf.org/mail-archive/ietf-announce/Current/msg27294.html This document describes how to encapsulate Internet Protocol version 4 and version 6 packets can be encapsulated in Consultative Committee for Space Data Systems (CCSDS) Space Data Link Protocols. That's going to be one hell of a round trip time for pings.. What about distributed processing between spacecraft ? OK, maybe interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMXJNO2KABBYQAh8RAkTUAKCDfbAaswt3oWYDrEzXecdrqPfIPACff5cS UUAVTMwPAR3XA3lHjjf9lYc= =+LJH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 20:22:51 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 17:22:51 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> <20040216231150.GA3060@galactic.demon.co.uk> Message-ID: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> > > > If 5W of power goes to/from Mars - then the JPL are the ones to beat on > this [makes QRP radio hams look positively profligate] :) that 15W from Mars, on the omni antenna, only gets you 7-8 bits/second, working into a 70 meter diameter dish and a cryogenically cooled receiver front end. A bit beyond the typical ham's rig or budget. Going the other way, it's hundreds of kW into the dish. Beyond QRO. More realistically, they get a hundred kbps or so on the UHF link to the orbiter from a basically omni antenna on the rover. I can't recall what the max rate on the "direct to earth" X-band high gain antenna (which is about 20 cm in diameter) is, but it's probably in the same ballpark. That's the actual signalling rate, also... there's some coding going on as well, so the "data rate" is lower, after you take out framing, error correction etc. For those interested, all the deep space comm stuff is documented in CCSDS specs at http://www.ccsds.org/ --- Actually, the low power per function (or more accurately, low energy per function) champs are probably the cellphone folks.. Battery life is a real selling point. The little GPS receivers for cellphones are actually spec'd in milliJoules/fix, for instance. That said, I don't see anyone building a big crunching cluster out of cellphones... It's all those other issues you have to deal with.. interconnects, cluster management, memory, etc. They all require energy. Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 00:34:30 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 21:34:30 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> <200402171245.51746.csamuel@vpac.org> Message-ID: <001a01c3f517$b7bbffb0$36a8a8c0@LAPTOP152422> > > This document describes how to encapsulate Internet Protocol > version 4 and version 6 packets can be encapsulated in > Consultative Committee for Space Data Systems (CCSDS) Space > Data Link Protocols. > > That's going to be one hell of a round trip time for pings.. > > > > What about distributed processing between spacecraft ? 
OK, maybe > interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? > > > Such ideas are being contemplated, and not only by me. There are distributed computing/ cooperative robotics sorts of things, and also "formation flying" sorts of things, not to mention "sensor webs". Probably the biggest problem is not a technology one but a philosophical one. Spacecraft and mission design is exceedingly conservative, and you'd have to show that it would enable something that's needed, that can't be done by conventional approaches. It's sufficiently unusual that it doesn't fit well with the usual analysis models for spacecraft; which tend to push towards "one big X" supplied by power from "one big Y" using "one big Z" to talk to home, etc. The costing spreadsheets used in speculative mission planning don't have cells for "number of processors in cluster" and "power per node" You need a fairly straightforward model that says, in effect, you can process "x" amount of data with "y" mass and "z" watts/joules. That model must be backed up by credible analysis and experience ("heritage" in space speak). In general, the perception is that "more parts = more potential failure points = higher risk" so it's gotta be a "this is the ONLY way to make the measurement" or it's not going to fly. You're going to spend years and years getting ready to go, and you can't go fix it if it breaks. Spaceflight is a very, very, very different conceptual and planning model. (we won't even get into what you have to do if it's connected to human space flight in any way...). The time from "great idea" to "mission launch" is probably in the area of 5-7 years. The CPU flying on the Mars Rovers is a Rad6000, which is based on an old MIPS processor. Current missions in planning and development use things like PowerPC750's (derated) and Sparc7s and 8's (aka ERC32 and/or LEON) and ADSP21020 clones. Nobody is thinking about flying ARMs or Transmetas or even Pentiums. The popular scheme these days is various and sundry microcores (6502, 8051, PPC604s) in Xilinx megagate FPGAs. Actually, though, the fact that only these relatively low powered (computationally) processors are what are flying is what makes clusters attractive. If you need hundreds of megaflops to do your measurement, you're only going to get it with multiple processors. Jim Lux JPL _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Tue Feb 17 06:56:34 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 17 Feb 2004 19:56:34 +0800 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <1077018992.18450.21.camel@mikhail> Good day everyone, I have just a 5 node cluster networked together with a 100 Mbps Ethernet hub (well, not the best setup). The master acts as a NAT host for the internal hosts, and only the master node has 2 nics, one facing the internet and another facing the internal net. The master node is accessible from the internet, and I login to it to run jobs in the background (using screen). I've been reading a lot about OpenPBS and the Maui scheduler, but as mentioned in the list and also evident in the website, the OpenPBS system is not readily downloadable/distributable. Are there any alternatives to OpenPBS which does most of the same thing (batch scheduling of jobs for clusters)? 
Interfaceability using a GUI frontend (without having to make one of my own) is definitely a plus. TIA -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:47:41 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:47:41 +0100 (CET) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: On 17 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Are there any > alternatives to OpenPBS which does most of the same thing (batch > scheduling of jobs for clusters)? Interfaceability using a GUI frontend > (without having to make one of my own) is definitely a plus. Gridengine is probably a good bet for you. http://gridengine.sunsource.net The GUI is called qmon (I don't use it much) There are binaries available, and clear instructions on how to install it. If you have problems, join the Gridengine list where we'll help. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:40:46 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:40:46 +0100 (CET) Subject: [Beowulf] Linux-HA conference and tutorial, UK Message-ID: If anyone is interested in Linux-HA, the UKUUG are having a tutorial and conference in Bournemouth. The people leading the tutorial are Alan Robertson and Lars Markowsky-Bree, who head up the Linux-HA project. http://www.ukuug.org/events/winter2004/ (ps. I won't be there) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Tue Feb 17 11:41:19 2004 From: camm at enhanced.com (Camm Maguire) Date: 17 Feb 2004 11:41:19 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: References: Message-ID: <547jylk6a8.fsf@intech19.enhanced.com> Greetings, and thanks for the fascinating discussion! I'm mostly interested in dram flops, and also not the absolute maximum, mars-rover level technology, but say within 10% of the best available options on a more or less commodity basis. Take care, Mark Hahn writes: > > Greetings! The subject line says it all -- where can one get the most > > bang per watt among systems currently available? > > depends on which kind of flops: cache-friendly or dram-oriented? > > > > -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Tue Feb 17 14:59:52 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Tue, 17 Feb 2004 14:59:52 -0500 Subject: [Beowulf] ECC RAM or not? Message-ID: <20040217195952.GA50999@piskorski.com> For a low-cost cluster, would you insist on ECC RAM or not, and why? My inclination would be to always use ECC for anything, but it looks as if there is no such thing as an inexpensive motherboard which also supports ECC RAM. Either you can have a cheap motherboard (well under $100) with no ECC, or a pricey (well over $100) motherboard with ECC. Am I mistaken about this, or are there really no exceptions to this seeming "ECC motherboards are always expensive" rule? Also, at least some large production clusters out there (KASY0, for example) do not use ECC RAM - I wonder why: http://aggregate.org/KASY0/cost.html -- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 17 18:20:12 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 17 Feb 2004 18:20:12 -0500 (EST) Subject: [Beowulf] ECC RAM or not? In-Reply-To: <20040217195952.GA50999@piskorski.com> Message-ID: > For a low-cost cluster, would you insist on ECC RAM or not, and why? how low-cost, and what kind of code? technically, the chances of seeing dram corruption depend on how much ram you have, and how much you use it (as well as environmental factors, such as altitude, of course!) for a sufficiently low-cost cluster, you'd expect to have relatively little ram, and little CPU power to churn it, and therefore a low rate of bit-flips. otoh, you can bet that the recent ECC upgrade of the VT cluster had a significant real cost (probably eaten by vendors for PR reasons...) some kinds of codes are "rad hard", in the sense that if a failure gives you a possibly-wrong answer, you can just check the answer. that definition pretty much excludes traditional supercomputing, and certainly all physics-based simulations. searching/optimization stuff might work well in that mode, though rechecking only catches false positives, doesn't recover from false negatives. I suspect that doing ECC is cheaper than messing around with this kind of uncertainty, even for these specialized codes. > My inclination would be to always use ECC for anything, but it looks > as if there is no such thing as an inexpensive motherboard which also > supports ECC RAM. Either you can have a cheap motherboard (well under > $100) with no ECC, or a pricey (well over $100) motherboard with ECC. well, you're really pointing out the difference between desktop and workstation/server markets. for instance, there's not much physical difference between the i875 and i865 chipsets, but the former shows up in $200 boards that need a video card, and the latter in $100 ones that have integrated video. > Am I mistaken about this, or are there really no exceptions to this > seeming "ECC motherboards are always expensive" rule? it's a marketing/market-driven phenomenon. it's useful to work out the risks when you make this kind of decision. 
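Editorial aside: the "work out the risks" arithmetic is short enough to write down. A minimal Python sketch follows; it assumes independent parts with a constant failure rate of 1/MTBF, and the MTBF figures are simply the round numbers used in the examples that follow, not measurements.

    # expected failures per month for n parts with a given datasheet MTBF;
    # note that field failure rates are usually worse than datasheet MTBFs
    def failures_per_month(n_parts, mtbf_hours, hours_per_month=730.0):
        return n_parts * hours_per_month / mtbf_hours

    print(failures_per_month(32, 20e3))    # 32 nodes, 20K-hour PSUs   -> ~1.2 per month
    print(failures_per_month(1100, 1e6))   # 1100 nodes, 1M-hour disks -> ~0.8 per month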
if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll need to think about doing a replacement per month. if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked to get a couple failures a week. if 1100 nodes with 4G but no ECC see two undetected corruptions a day, then 32 nodes with 1G will go a couple months between events... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Tue Feb 17 18:18:20 2004 From: dieter at engr.uky.edu (William Dieter) Date: Tue, 17 Feb 2004 18:18:20 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <200402171701.i1HH13h07766@NewBlue.scyld.com> Message-ID: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Try the cluster design tool at . You can enter your basic memory, memory bandwidth, etc requirements, then set the metric weighting to choose designs with the least power consumption first. For example, for the default requirements (minimal memory, disk, and network requirements, at least 50 GFLOPS, and a $10,000 budget), and weighting power consumption first then memory bandwidth, followed by GFLOPS I get the following as the best design:

  23  Generic Fast Ethernet NIC               $8.00     $184.00
  23  Cat5 Cable for Fast Ethernet            $2.00      $46.00
   1  Generic 24 Port Fast Ethernet Switch    $76.00     $76.00
  23  Pentium 4 2.4GHz                        $166.00    $3818.00
  23  Generic Socket 478                      $56.00     $1288.00
  69  Generic PC3200 256MB DDR                $44.00     $3036.00
  23  Generic Mid-Tower Case                  $50.00     $1150.00
   3  Generic 2x2 Shelving Unit with Wheels   $50.00     $150.00
      Total                                              $9748.00

The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 Amps (you get to convert Amps to Watts.) Everything else in the design is pretty minimal, but you can adjust the requirements on the form to get what you need (or if you can't, let me know why not :-) The CGI tries all designs with the parts in its database to find the ones that meet your requirements and metric weighting. The model includes current consumption for switches and compute nodes based on the power supply. The parts database is a bit out of date right now... let me know what you think. Bill Dieter. dieter at engr.uky.edu On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > Greetings, and thanks for the fascinating discussion! > > I'm mostly interested in dram flops, and also not the absolute > maximum, mars-rover level technology, but say within 10% of the best > available options on a more or less commodity basis. > > Take care, > > Mark Hahn writes: > >>> Greetings! The subject line says it all -- where can one get the >>> most >>> bang per watt among systems currently available? >> >> depends on which kind of flops: cache-friendly or dram-oriented? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 17 21:38:39 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 18 Feb 2004 13:38:39 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> References: <1077018992.18450.21.camel@mikhail> Message-ID: <200402181338.50678.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 10:56 pm, Dean Michael C. 
Berris wrote: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. There is a forked version of OpenPBS called 'Torque' (it was called ScalablePBS, but Altair requested it changed its name) which includes a whole host of bug fixes and enhancements (including massive scalability) and is freely downloadable under an earlier, more free, OpenPBS license. It's under active development and has an active user community, though the mailing list is moderated for some bizzare reason, which means posts can take a little while to get through. The website is at: http://www.supercluster.org/projects/torque/ Good luck! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMtAvO2KABBYQAh8RAp8cAJsHNJuoCmIxYMNUWguwpoueopKUxACdHJiq p0nGW3X3ATurlzaV+Iw5jtg= =xwcU -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:20:37 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:20:37 +0800 (CST) Subject: [Beowulf] SLURM - newest (and greatest?) batch system Message-ID: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> One of the new features of SGE 6.0 is the parallelized job container (qmaster). Another batch system called SLURM (Simple Linux Utility for Resource Management) will be releasing soon. http://www.llnl.gov/linux/slurm/slurm.html - Like SGE 6.0, it also uses threads to parallelize the job container. - licensed under the GPL!! - developed by the US gov - uses Maui - designed to be simple :) - supports lots of interconnect switches. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:02:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:02:54 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> You can choose between SGE and SPBS. SGE has more features, better fault tolerance, better documentation, and better user support. http://gridengine.sunsource.net SPBS is closer to what you have now, so you and your users (BTW, are you the only one?) don't need to learn something new. http://www.supercluster.org/ Andrew. --- "Dean Michael C. Berris" ????> Good day everyone, > > I have just a 5 node cluster networked together with > a 100 Mbps Ethernet > hub (well, not the best setup). The master acts as a > NAT host for the > internal hosts, and only the master node has 2 nics, > one facing the > internet and another facing the internal net. The > master node is > accessible from the internet, and I login to it to > run jobs in the > background (using screen). 
> > I've been reading a lot about OpenPBS and the Maui > scheduler, but as > mentioned in the list and also evident in the > website, the OpenPBS > system is not readily downloadable/distributable. > Are there any > alternatives to OpenPBS which does most of the same > thing (batch > scheduling of jobs for clusters)? Interfaceability > using a GUI frontend > (without having to make one of my own) is definitely > a plus. > > TIA > > -- > Dean Michael C. Berris > http://mikhailberis.blogspot.com > mikhailberis at free.net.ph > +63 919 8720686 > GPG 08AE6EAC > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:37:00 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:37:00 -0800 Subject: [Beowulf] ECC RAM or not? References: Message-ID: <003101c3f5d8$d9b03250$36a8a8c0@LAPTOP152422> > some kinds of codes are "rad hard", in the sense that if a failure gives > you a possibly-wrong answer, you can just check the answer. My practical experience with DRAM designs has been that bit errors are more likely due to noise/design issues than radiation-induced single event upsets. Back in the 80's I worked on a Multibus system where we used to get double bit errors in 11/8 ecc several times a week. Everyone just said "well, that's why we have ECC" until I did some quick statistics on what the ratio between single bit (corrected but counted) and double bit errors should have been. Such high rates defied belief, and it turned out to be a bus drive problem. > that definition > pretty much excludes traditional supercomputing, and certainly all > physics-based simulations. searching/optimization stuff might work well > in that mode, though rechecking only catches false positives, doesn't > recover from false negatives. I suspect that doing ECC is cheaper than > messing around with this kind of uncertainty, even for these specialized codes. There are a number of algorithms which have inherent self checking built in. In the accounting business, this is why there's double entry, and/or checksums. In the signal processing world, there are checks you can do on things like FFTs, where total power in should equal total power out. > > > if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll > need to think about doing a replacement per month. > > if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked > to get a couple failures a week. Shades of replacing tubes in ENIAC or the Q-7A. MIL-HDBK-217A is the "bible" on these sorts of computations. 
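Editorial aside: the FFT check Jim mentions is essentially Parseval's theorem - total signal power should match total spectral power. A minimal sketch, assuming NumPy is available (an assumption, not something from the original post):

    import numpy as np

    def fft_selfcheck(x, rel_tol=1e-9):
        # Parseval: sum |x|^2 == (1/N) * sum |X|^2 for the unnormalized DFT,
        # so gross corruption of the transform shows up as a power mismatch
        X = np.fft.fft(x)
        power_in = np.sum(np.abs(x) ** 2)
        power_out = np.sum(np.abs(X) ** 2) / len(x)
        return abs(power_in - power_out) <= rel_tol * power_in

    x = np.random.randn(4096)
    print(fft_selfcheck(x))   # expect True on healthy hardware

Like the rechecking discussed earlier in the thread, such a test only flags a bad answer; it does not recover the lost work.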
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:28:53 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:28:53 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> This kind of thing is way cool.. Have you published the algorithm behind the page in a concise form somewhere? It would be handy to be able to point mission/system planners to it. ----- Original Message ----- From: "William Dieter" To: Sent: Tuesday, February 17, 2004 3:18 PM Subject: Re: [Beowulf] Max flops to watts hardware for a cluster > Try the cluster design tool at > . You can enter your basic > memory, memory bandwidth, etc requirements, then set the metric > weighting to choose designs with the least power consumption first. > > For example, for the default requirements (minimal memory, disk, and > network requirements, at least 50 GFLOPS, and a $10,000 budget), and > weighting power consumption first then memory bandwidth, followed by > GFLOPS I get the following as the best design: > > 23 Generic Fast Ethernet NIC $8.00 $184.00 > 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 > 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 > 23 Pentium 4 2.4GHz $166.00 $3818.00 > 23 Generic Socket 478 $56.00 $1288.00 > 69 Generic PC3200 256MB DDR $44.00 $3036.00 > 23 Generic Mid-Tower Case $50.00 $1150.00 > 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 > Total $9748.00 > > The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 > Amps (you get to convert Amps to Watts.) Everything else in the design > is pretty minimal, but you can adjust the requirements on the form to > get what you need (or if you can't let me know why not :-) > > The CGI tries all designs with the parts in its database to find the > ones that meet your requirements and metric weighting. The model > includes current consumption for switches and compute nodes based on > the power supply. The parts database is a bit out of date right now... > > let me know what you think. > > Bill Dieter. > dieter at engr.uky.edu > > On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > > Greetings, and thanks for the fascinating discussion! > > > > I'm mostly interested in dram flops, and also not the absolute > > maximum, mars-rover level technology, but say within 10% of the best > > available options on a more or less commodity basis. > > > > Take care, > > > > Mark Hahn writes: > > > >>> Greetings! The subject line says it all -- where can one get the > >>> most > >>> bang per watt among systems currently available? > >> > >> depends on which kind of flops: cache-friendly or dram-oriented? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Feb 18 01:26:38 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 22:26:38 -0800 Subject: [Beowulf] ECC RAM or not? 
References: Message-ID: <000601c3f5e8$2a8430f0$36a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Mark Hahn" To: "Jim Lux" Sent: Tuesday, February 17, 2004 9:36 PM Subject: Re: [Beowulf] ECC RAM or not? > > > some kinds of codes are "rad hard", in the sense that if a failure gives > > > you a possibly-wront answer, you can just check the answer. > > > > My practical experience with DRAM designs has been that bit errors are more > > likely due to noise/design issues than radiation induced single event > > upsets. > > understood. then again, you're using deliberately selected rad-hard-ware, no? Nope... that was off the shelf DRAMs in a commercial environment (in 1980ish time frame, so they were none too dense DRAMs, either.. 256kB on a board I think, many, many, pieces.. probably 64kbit parts..) > I was mostly thinking about a talk I saw by the folks who care for ASCI-Q, > which is in Los Alamos. they say that the altitude alone is worth a 14x > increase in particle flux, and that this caused big problems for them with > a particular register on the ES40 data path that was not ecc'ed. Indeed.. ECC on memory is only part of the problem.. you really need ECC on address and data lines for full coverage (or, more properly EDAC).. The classic paper on altitude effects was done by folks at IBM, where they ran boards in NY and in Denver and, underground in Denver. Good experimental technique, etc. > > > Back in the 80's I worked on a Multibus system where we used to get > > double bit errors in 11/8 ecc several times a week. Everyone just said > > "well, that's why we have ECC" until I did some quick statistics on what the > > ratio between single bit (corrected but counted) and double bit errors > > should have been. Such high rates defied belief, and it turned out to be a > > bus drive problem. > > makes sense. to be honest, I don't see many single-bit errors even, > but today we've only < 200 GB ram online. inside a year, it'll probably > be more like 2TB, so maybe things will get more exciting ;) It's a very mixed bag, depending on what's causing the errors. If it's radiation, smaller feature sizes mean that there's a smaller target to hit, and the amount of energy transferred is less (of course, less energy is stored in the memory cell, too) > we're also pretty much at sealevel, with lots of building over us. > reactor next door, though ;) Type of particle, and it's energy, has a huge effect on the SEU effects. I would maintain, though, that run of the mill timing margin effects, particularly over temperature; and EMI/EMC effects are probably a more important source of bit hits in modern computers. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Wed Feb 18 05:25:22 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 18 Feb 2004 18:25:22 +0800 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <1077099918.4818.15.camel@mikhail> Thanks sir, and to everyone else that responded. I'm currently reading on SGE, and am going to be choosing as soon as I get the full picture. Currently my preference is still towards SPBS (Torque) mainly because it doesn't seem as complicated to set up. 
However, as a Debian user, I did an apt-cache search on batch system and a couple of packages were Queue and DQS (Distributed Queueing System). I went over to the DQS website, and I'm reading on it right now. What I'd like to know would be how different DQS (and/or Queue) is with regards to SPBS and SGE? It would seem like from what I've been reading, SGE and SPBS are really for clusters (and grids), and DQS is for a collection of computers that really don't work as a cluster (or as a parallel computer). How accurate is this assessment of mine? Are there any articles written by people in the group regarding comparisons between SGE and SPBS with regards to effectivity and reliability? Scalability is also a factor because the cluster may grow as more funding and problems get into the cluster project. I hope I never cease to get enlightened from posts in the group, and insights would be most appreciated. Thanks very much and have a nice day! :) On Wed, 2004-02-18 at 12:02, Andrew Wang wrote: > You can choose between SGE and SPBS. > > SGE has more features, better fault tolerance, better > documentation, and better user support. > > http://gridengine.sunsource.net > > SPBS is closer to what you have now, so you and your > users (BTW, are you the only one?) don't need to learn > something new. > > http://www.supercluster.org/ > > Andrew. > > > --- "Dean Michael C. Berris" > ????> Good day > everyone, > > > > I have just a 5 node cluster networked together with > > a 100 Mbps Ethernet > > hub (well, not the best setup). The master acts as a > > NAT host for the > > internal hosts, and only the master node has 2 nics, > > one facing the > > internet and another facing the internal net. The > > master node is > > accessible from the internet, and I login to it to > > run jobs in the > > background (using screen). > > > > I've been reading a lot about OpenPBS and the Maui > > scheduler, but as > > mentioned in the list and also evident in the > > website, the OpenPBS > > system is not readily downloadable/distributable. > > Are there any > > alternatives to OpenPBS which does most of the same > > thing (batch > > scheduling of jobs for clusters)? Interfaceability > > using a GUI frontend > > (without having to make one of my own) is definitely > > a plus. > > > > TIA > > > > -- > > Dean Michael C. Berris > > http://mikhailberis.blogspot.com > > mikhailberis at free.net.ph > > +63 919 8720686 > > GPG 08AE6EAC > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > ----------------------------------------------------------------- > ??? Yahoo!?? > ?????????????????????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Dean Michael C. 
Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Feb 18 05:47:27 2004 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Wed, 18 Feb 2004 10:47:27 +0000 Subject: [Beowulf] Howto setup jobs using MPI In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <1077018992.18450.21.camel@mikhail> Message-ID: <5.1.1.6.0.20040218104616.02a89e00@imap.hermes.cam.ac.uk> Hi How best to setup jobs using MPI? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mack.joseph at epa.gov Wed Feb 18 07:22:07 2004 From: mack.joseph at epa.gov (Joseph Mack) Date: Wed, 18 Feb 2004 07:22:07 -0500 Subject: [Beowulf] S.M.A.R.T usage in big clusters References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: <403358EF.7F0BDE75@epa.gov> Steven Timm wrote: > > On the drives where we have had the most failures we've kept track > of how well SMART predicted it pretty well.. it finds an error > in advance about half the time. How do you get your information out of smartd? I've found output in syslog - presumably I can grep for this. I can get e-mail if I want (from the docs). To look at the output of the long and short tests it appears that I have to interactively use smartctl. Is there anyway to have a flag that can be looked at periodically to say "this disk is about to fail"? Thanks Joe -- Joseph Mack PhD, High Performance Computing & Scientific Visualization SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Wed Feb 18 09:16:48 2004 From: timm at fnal.gov (Steven Timm) Date: Wed, 18 Feb 2004 08:16:48 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > Steven Timm wrote: > > > > > On the drives where we have had the most failures we've kept track > > of how well SMART predicted it pretty well.. it finds an error > > in advance about half the time. > > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. At the moment we are not using smartd. I was running an older version that didn't have it as part of the package. I wrote some cron scripts that do a short test every night and capture the output to a file. But we are going to transition and use smartd and use an agent we already have that is grepping /var/log/messages for other purposes. Steve Timm > > I can get e-mail if I want (from the docs). > > To look at the output of the long and short tests it appears that > I have to interactively use smartctl. > > Is there anyway to have a flag that can be looked at periodically to > say "this disk is about to fail"? 
> > Thanks Joe > -- > Joseph Mack PhD, High Performance Computing & Scientific Visualization > SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 > Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 09:35:55 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 09:35:55 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> Message-ID: On Tuesday, February 17, 2004, at 11:28 PM, Jim Lux wrote: > This kind of thing is way cool.. > Have you published the algorithm behind the page in a concise form > somewhere? It would be handy to be able to point mission/system > planners to > it. We just submitted the paper to IEEE Computer for review last week. If you want to look at the source code, it is available through . I haven't made an official tarball release yet, but you can get the latest code through CVS. If you want to make your own parts database on our website you can do that, too. It copies one of the existing databases into a new one, so if you just want to update a few prices, or add a few new parts, it doesn't take too much effort. Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hanzl at noel.feld.cvut.cz Wed Feb 18 11:28:25 2004 From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz) Date: Wed, 18 Feb 2004 17:28:25 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> <1077099918.4818.15.camel@mikhail> Message-ID: <20040218172825E.hanzl@unknown-domain> > However, as a Debian user, I did an apt-cache search on batch system and > a couple of packages were Queue and DQS (Distributed Queueing System). I > went over to the DQS website, and I'm reading on it right now. What I'd > like to know would be how different DQS (and/or Queue) is with regards > to SPBS and SGE? DQS is SGE's grandfather, the genealogy goes somehow like this: DQS(Florida State Univ.) -> CODINE(Genias) -> SGE(Sun) so you can expect DQS to be much simpler but also you can expect SGE to be much improoved. (My personal choice is SGE and I am quite happy with it.) Regards Vaclav Hanzl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Feb 18 09:35:23 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 18 Feb 2004 15:35:23 +0100 (CET) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: On Tue, 17 Feb 2004, William Dieter wrote: > 23 Generic Fast Ethernet NIC $8.00 $184.00 How much in terms of power have you assigned to this item ? If you really buy a cheap low-end FE NIC, you'll most probably end up with a RTL8139 based card. This chip by design puts quite a load on the main CPU especially if you use it in a cluster context (=lots of network activity). 
This might increase significantly the power consumption or reduce the available flops... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 10:24:31 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 10:24:31 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <8CD39EDD-6226-11D8-B4A2-000393BF25C6@engr.uky.edu> On Wednesday, February 18, 2004, at 09:35 AM, Bogdan Costescu wrote: > On Tue, 17 Feb 2004, William Dieter wrote: > >> 23 Generic Fast Ethernet NIC $8.00 $184.00 > > How much in terms of power have you assigned to this item ? The tool is not perfect. We have not broken down the power to that level of detail. There is a tradeoff between how much work you have to do for each component and how much detail the model has. > If you really buy a cheap low-end FE NIC, you'll most probably end up > with a RTL8139 based card. This chip by design puts quite a load on > the main CPU especially if you use it in a cluster context (=lots of > network activity). This might increase significantly the power > consumption or reduce the available flops... To get really accurate power consumption numbers we would have to measure for many different CPU/Motherboard/NIC combinations. OTOH, there are some really cheap cards based on the Davicom 9102 chipset, (newegg.com has at least two different brands for $4.00 to $6.00). The Davicom 9102 is enough of a tulip clone that the Ethernet HOWTO recommends trying the tulip driver before the manufacturer supplied driver... Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 18 13:27:55 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 18 Feb 2004 12:27:55 -0600 (CST) Subject: [Beowulf] Best or standard hpc kernel sysctl settings. Message-ID: As part of our standards documentation, I'd like to set a good starting point for tuning various kernel parameters for clusters on Rice's campus. We have a few sysctl settings that we do based on the requirements of certain codes, but I'd like to know how everyone else is tuning their linux systems in their clusters. Can I get from you guys the sysctl parameter, it's value, and the reason why you set it that way? Thanks, Brent Clements _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Wed Feb 18 15:54:58 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 18 Feb 2004 15:54:58 -0500 Subject: [Beowulf] 2nd call for speakers -- Bioclusters 2004 Workshop -- March 30 Boston, MA Message-ID: <4033D122.4080008@sonsorol.org> { Apologies for the cross-posting } Enclosed is a meeting announcement for a 1 day workshop we are organizing alongside the much larger 'BioITWorld Expo' in Boston, Ma. 
The goals are two-fold -- recreating the vibe from the OReilly Bioinformatics Technology conference series that was recently cancelled as well as providing a forum where folks involved at the intersection of life science research and high performance IT can come together to talk shop. Feel free to pass along the enclosed announcement as appropriate. We are actively seeking technical talks and presentations focusing on how challenging problems were solved or overcome. Regards, Chris {on behalf of the organizing committee} Email: bioclusters04 at open-bio.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bioclusters-workshop.txt URL: From nixon at nsc.liu.se Tue Feb 17 09:16:39 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 17 Feb 2004 15:16:39 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> (Dean Michael C. Berris's message of "17 Feb 2004 19:56:34 +0800") References: <1077018992.18450.21.camel@mikhail> Message-ID: "Dean Michael C. Berris" writes: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Torque (a.k.a. Storm, a.k.a. Scalable PBS) is a fork of the OpenPBS source tree, with active maintenance and a reasonable license. http://www.supercluster.org/projects/torque It plays nicely with Maui. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.giesen at kodak.com Tue Feb 17 15:58:57 2004 From: david.giesen at kodak.com (David J Giesen) Date: Tue, 17 Feb 2004 15:58:57 -0500 Subject: [Beowulf] Cluster questions for Quantum Chemistry Message-ID: <40328091.A0200730@kodak.com> Hello- (Apologies to those who have seen a similar question on the CCL mailing list) We may be in the market for a new Linux cluster these days. Unfortunately, I haven't kept up on all the latest issues, and I'd appreciate any answers you all have for any of these questions. We want to run mainly QM codes such as Gaussian 98/Gaussian 03, Jaguar and PQS on these machines with linux. We'd likely be running in parallel, typically across 3-4 dual-processor nodes. 1) Xeon vs P4: [a] At the same GHz and front-side bus speed is there a difference in performance between these chips? [b] Is there a difference in reliability? 2) AMD Opteron vs Athlon: [a] Does any QM code actually take advantage of Opteron's 64-bit technology? [b] Have people moved away from Athlon boxes because of heat problems? 3) AMD vs Intel: How to compare speeds between these two different types of processors for QM codes? Does an Athlon 2800 (2.08 GHz) run more like a 2.0 GHz P4 or a 2.8 GHz P4? 4) How important is front-side bus speed these days for quantum chemistry problems? 5) How important are 100 Mbit/s Ethernet versus 1 Gbit/s Ethernet connections between the nodes for quantum chemistry problems? Thanks in advance! Dave Any questions which highlight my extreme stupidity are a result of exactly that (my own stupidity) rather than a reflection on the positions of the Eastman Kodak Company. -- Dr. David J. 
Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 18 22:09:53 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 19 Feb 2004 11:09:53 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> Message-ID: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> --- "Dean Michael C. Berris" wrote: > I'm currently reading > on SGE, and am going to be choosing as soon as I get > the full picture. > Currently my preference is still towards SPBS > (Torque) mainly because it > doesn't seem as complicated to set up. To install SGE, you don't even need to compile the source, just download the pre-compiled binary package, or grab the rpm. And also, SGE doesn't require root access: you can untar the package in your home directory, run the install scripts, and start playing with it. > What I'd > like to know would be how different DQS (and/or > Queue) is with regards > to SPBS and SGE? Debian is planning to replace DQS with SGE, but the maintainer of DQS is gone (he left the university). DQS and SGE are very similar. And PBS and SPBS are very similar too. > It would seem like from what I've been reading, SGE > and SPBS are really > for clusters (and grids), and DQS is for a > collection of computers that > really don't work as a cluster (or as a parallel > computer). How accurate > is this assessment of mine? Are you talking about compute farms? SGE is used in compute farms as well, where people run EDA simulations, graphic rendering jobs, BLAST jobs, etc. SGE has quite a lot of resource management features. SPBS/PBS are used in HPC clusters, since before SGE was opensource, PBS was free/opensource, so more people used it in those environments. > Are there any articles written by people in the > group regarding > comparisons between SGE and SPBS with regards to > effectivity and > reliability? SGE vs PBS on the rocks cluster mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-September/002980.html SPBS has lots of patches integrated, but still if your SPBS master node crashes, your cluster is gone. In SGE, the admin can configure 1 or more shadow masters, so in theory as long as any one machine in the cluster is running, your cluster is not dead. > Scalability is also a factor because > the cluster may grow > as more funding and problems get into the cluster > project. Both SGE and SPBS can scale to thousands of nodes, the question is, do you have the funding? :-) (SGE 6.0 will scale even further) > I hope I never cease to get enlightened from posts > in the group, and > insights would be most appreciated. I think you should try to install both, it is better to feel it than to just listen to other people. Andrew. ----------------------------------------------------------------- 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 18 22:38:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 19 Feb 2004 14:38:18 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> References: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402191438.19333.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > SPBS has lots of patches integrated, but still if your > SPBS master node crashes, your cluster is gone. Well, depends on your definition of "gone" really. People can't queue new jobs, jobs waiting to run won't be started, but as long as your filestore is elsewhere then running jobs won't be interrupted. However, if your filestore server disappears then you're stuffed. :-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q jHWTp4HmlzO8CnmObbFarWA= =PrTq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Thu Feb 19 10:41:26 2004 From: raysonlogin at yahoo.com (Rayson Ho) Date: Thu, 19 Feb 2004 07:41:26 -0800 (PST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402191438.19333.csamuel@vpac.org> Message-ID: <20040219154126.56423.qmail@web11411.mail.yahoo.com> I think it is one of the biggest problems with *PBS, especially in the compute farm environment. The more advanced batch systems (SGE and LSF) have this feature for years, not sure why *PBS still don't have it. (AFAIK, PBSPro 5.4 will include it, but isn't it late??) Rayson --- Chris Samuel wrote: > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be started, > but as long > as your filestore is elsewhere then running jobs won't be > interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.brookes at quadrics.com Thu Feb 19 10:50:43 2004 From: john.brookes at quadrics.com (john.brookes at quadrics.com) Date: Thu, 19 Feb 2004 15:50:43 -0000 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <30062B7EA51A9045B9F605FAAC1B4F6234EB15@tardis0.quadrics.com> If you keep the db on a separate filestore then - if your pbs server goes down - you can just have a failover server that 'becomes' (takes over the ipaddr and hostname - the other nodes won't even notice the difference) the original server if the original gets screwed. 
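Editorial aside: the polling half of the failover scheme John describes can be quite small. A minimal Python sketch follows; the hostname, service address, interface, and the bare "ip addr add" takeover are illustrative assumptions, and 15001 is simply the stock pbs_server port. Real deployments also need the out-of-band checks mentioned below, so that a merely slow master does not end up with a twin.

    import socket, subprocess, time

    PBS_HOST, PBS_PORT = "pbs-master", 15001   # assumed hostname; 15001 is the usual pbs_server port
    SERVICE_ADDR = "192.168.1.10/24"           # assumed service IP the compute nodes talk to
    MISSES_BEFORE_TAKEOVER = 5

    def server_alive():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(5)
        try:
            s.connect((PBS_HOST, PBS_PORT))
            s.close()
            return True
        except socket.error:
            return False

    misses = 0
    while True:
        misses = 0 if server_alive() else misses + 1
        if misses >= MISSES_BEFORE_TAKEOVER:
            # claim the service address; taking over the hostname and starting
            # pbs_server against the shared filestore are deliberately left out
            subprocess.call(["ip", "addr", "add", SERVICE_ADDR, "dev", "eth0"])
            break
        time.sleep(10)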
We've got a couple of customers that do this, but YMMV as they use: a) a somewhat non-standard PBS; b) out-of-band management to ensure that the node isn't just temporarily unresponsive. Cheers, John Brookes Quadrics > -----Original Message----- > From: Chris Samuel [mailto:csamuel at vpac.org] > Sent: 19 February 2004 03:38 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Best Setup for Batch Systems > > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > > > SPBS has lots of patches integrated, but still if your > > SPBS master node crashes, your cluster is gone. > > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be > started, but as long > as your filestore is elsewhere then running jobs won't be interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q > jHWTp4HmlzO8CnmObbFarWA= > =PrTq > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From radams at csail.mit.edu Thu Feb 19 14:03:38 2004 From: radams at csail.mit.edu (Ryan Adams) Date: Thu, 19 Feb 2004 14:03:38 -0500 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC Message-ID: <1077217418.4982.35.camel@localhost> Please forgive the length of this email, as I'm going to try to be comprehensive: I have a problem that divides nicely (embarrassingly?) into parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to complete and requires no communication during that time. Essentially there is a piece of data, around 500KB that must be processed and a result returned. I'd like to process as many of these pieces of data as possible. I am considering building a small heterogeneous cluster to do this (at home, basically), and am trying to decide exactly how to architect the task distribution. The network will probably be Fast Ethernet. Initially there will be four machines processing the data, but I could imagine as many as ten in the near term. My current back-of-the-envelope math puts an aggregate load (assuming 2.0s per job, 500KB transferred each, with ten nodes) of 2.5MB/s on the network, so it would seem that 100BT can get the job done without introducing much delay compared to the 2.0s execution time. Perhaps I am doing this math wrong, but I was also thinking that since the download of the data is such an I/O-intensive task that it would be reasonable to place that in a separate thread from the floating point calculations. This way, I could hope to work on data while my socket read is blocking. My question is basically this: is 2-5 seconds too small of a job to justify a batching system like *PBS or Gridengine? 
It would seem that the overhead for a job that requires a few hours would be very insignificant, but what about a few seconds? Certainly, one option would be to bundle sets of these chunks together for a larger effective job. Am I wasting my time thinking about this? I've been considering rolling my own scheduling system using some kind of RPC, but I've been around software development long enough to know that it is better to use something off-the-shelf if at all possible. Thanks in advance... Ryan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 19 14:20:04 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 19 Feb 2004 14:20:04 -0500 (EST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? It would seem that > the overhead for a job that requires a few hours would be very > insignificant, but what about a few seconds? Certainly, one option > would be to bundle sets of these chunks together for a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. > > Thanks in advance... I personally think that it is too small a task to use a batching system, especially since you're likely not going to architect it as a true batching system. I think you have three primary options for ways to develop your code. Well, four if you count NFS. The SIMPLEST way is to put your data blocks in files on an NFS crossmounted filesystem, and start jobs inside e.g. a simple perl script loop that grabs "the next data file" and runs on it and writes out its results back to the NFS file system for dealing with or accruing later. You're basically using NFS as your transport mechanism. Now, NFS isn't horribly efficient relative to raw peak network speed, but neither is it completely horrible -- at 100 BT (say 9-10 MB/sec peak BW) you ought to be able to get at least half of that on an NFS read of a big file. At 5 MB/sec, your 1/2 MB file should take a 0.1 seconds to be read (plus a latency hit) which is "small" (as you note) compared to a run time of 2-5 seconds so you should be able to get nice parallel speedup on four or five hosts. You can test your combined latency and bandwidth with a simple perl script or binary that opens a dozen (different!) files inside a loop. Beware caching, which will give you insane numbers if you aren't careful (as in don't run the test twice on the files without modifying them on the server). The other three ways do it "properly" and permit you both finer control (with the NFS method you'll have to work out file locking and work distribution to make sure two nodes don't try to work on the same file at the same time) and higher BW, close to the full bandwidth of the network. They'll ALSO require more programming. a) PVM b) MPI c) raw networking. PVM is a message passing library. 
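Editorial aside, before rgb's notes on the message-passing options: the NFS-as-transport scheme he sketches above, including the locking and work distribution he says you must work out, can look something like the following Python fragment. The directory names, the claim-by-rename trick, and the process() stand-in are illustrative assumptions rather than code from the thread.

    import os, socket, time

    TODO = "/mnt/shared/todo"   # assumed NFS-mounted work directory
    DONE = "/mnt/shared/done"   # assumed NFS-mounted results directory

    def process(path):
        # stand-in for the real 2-5 second computation on one ~500KB data file
        return "result for %s\n" % os.path.basename(path)

    def claim_next():
        # crude lock: rename the file to a per-node name before working on it,
        # so two nodes should not end up processing the same chunk
        for name in sorted(os.listdir(TODO)):
            if ".claimed-" in name:
                continue                      # already taken by some node
            src = os.path.join(TODO, name)
            claimed = src + ".claimed-" + socket.gethostname()
            try:
                os.rename(src, claimed)
                return claimed
            except OSError:
                continue                      # lost the race; try the next file
        return None

    while True:
        job = claim_next()
        if job is None:
            time.sleep(5)                     # queue empty; poll again
            continue
        out = os.path.join(DONE, os.path.basename(job) + ".result")
        open(out, "w").write(process(job))

This has exactly the weaknesses rgb warns about (NFS caching, no recovery if a node dies mid-chunk), which is part of why the message-passing routes below are the "proper" ways to do it.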
There is a PVM program template on my personal GPL software website: http://www.phy.duke.edu/~rgb/General/general.php that might suffice to get you started -- it should just compile and run a simple master/slave program, and you should be able to modify it fairly simply to have the master distribute the next block of work to the first worker/slave to finish. If your CPUs are well balanced the I/O transactions will antibunch and communications will be very efficient. MPI is another message passing library. I don't have an MPI template, but there are example programs in the MPI distributions and on many websites, and there are books (on both PVM and MPI) from e.g. MIT press that are quite excellent. There is also a regular MPI column in Cluster World Magazine that has been working through intro level MPI applications, and old columns by Forrest Hoffman in Linux Magazine ditto. At least -- google is your friend. Both PVM and MPI are likely to be similar in ease of programming, hassle of setting up a parallel environment, and speed, and both of them should give you a very healthy fraction of wirespeed while shielding you from having to directly manipulate the network. Finally there are raw sockets (which it sounds like you are inclined towards). Now, I have nothing against raw socket programming (he says having spent the day on xmlsysd/wulflogger/libwulf, a raw socket-based monitoring program:-). However, it is NOT trivial -- you have to invent all sorts of wheels that are already invented for you and wrapped up in simple library calls with PVM or MPI. Its advantages are maximal speed -- you can't get faster than a point to point network connection -- the ability to thread the connection/I/O component and MAYBE take advantage of letting the NIC do some of the work via DMA while the CPU is doing other work, and complete control. The disadvantages are that you'll be responsible for determining e.g. message length, dealing with a dropped connection without crashing everything, debugging your server daemon and worker clients (or worker daemons and master client) in parallel when they are running on different machines, and so forth. I >>might<< be able to provide you with some applications that aren't exactly templates but that illustrate how to get started on this approach (and refer you to some key books) but if you really are a networking novice you'll need to want to do this as an excuse to stop being a novice by writing your own application or it isn't worth it. You'll need to be a much better and more skilled programmer altogether in order to debug everything and check for the myriad of error conditions that can occur and deal with them robustly. There are really a few other approaches -- perl now supports threads so you CAN use a perl script and ssh as a master/work distribution system -- but raw sockets aren't much easier to manage in perl than they are in C and using ssh as a transport layer adds overhead at least equal to or in excess to NFS, so you'd probably want to use NFS as transport and the perl script to just manage task distribution (for which it is ideally suited in this simple a context). I have a nice example threaded perl task distribution script (which I wrote for MY Cluster Magazine column some months ago:-) which I can put somewhere if this interests you. HTH, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dcs at et.byu.edu Thu Feb 19 15:19:56 2004 From: dcs at et.byu.edu (Dave Stirling) Date: Thu, 19 Feb 2004 13:19:56 -0700 (MST) Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Message-ID: Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 19 17:22:38 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 20 Feb 2004 09:22:38 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219154126.56423.qmail@web11411.mail.yahoo.com> References: <20040219154126.56423.qmail@web11411.mail.yahoo.com> Message-ID: <200402200922.39632.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 20 Feb 2004 02:41 am, Rayson Ho wrote: [No failover support in the pbs_server] > I think it is one of the biggest problems with *PBS, especially in the > compute farm environment. Torque (formerly SPBS) is very stable, especially since we helped the SuperCluster folks clobber the various memory leaks in the server. Our pbs_server has been running for almost a month now since I last restarted it (because I was doing a bit of system maintenance, not because of PBS problems, I think it'd been running for about 2 months before that) and it's only VSZ 3148 and RSS 2136. :-) NB: I'm still running an SPBS release from early November as that's when we fixed the last memory leak and it's worked like a dream since then. > The more advanced batch systems (SGE and LSF) have this feature for > years, not sure why *PBS still don't have it. I believe it's on the SuperCluster folks list of things to do, but they've been busy working on the stability front (as well as MAUI and Silver). CC'd to the SuperCluster folks so they can respond. > (AFAIK, PBSPro 5.4 will include it, but isn't it late??) No idea, don't use it. 
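Editorial aside: the VSZ/RSS numbers Chris quotes are easy to track automatically. A minimal sketch that reads them from /proc for a given pid; finding the pbs_server pid, and deciding when growth counts as a leak, are left as assumptions.

    def vm_usage(pid):
        # returns (VmSize, VmRSS) in kB as reported by /proc/<pid>/status
        size = rss = None
        for line in open("/proc/%d/status" % pid):
            if line.startswith("VmSize:"):
                size = int(line.split()[1])
            elif line.startswith("VmRSS:"):
                rss = int(line.split()[1])
        return size, rss

    # e.g. run from cron and log the result; a steadily climbing RSS for
    # pbs_server is the sort of leak the SuperCluster patches addressed
    print(vm_usage(1))   # pid 1 used here only as a harmless example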
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANTcuO2KABBYQAh8RAk8AAJ0ZGx3+qLPHWMjFkG7PGD8pPzwBWwCeKnUQ u1aXnixvHrknKTqtNVDRVhM= =28y0 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Thu Feb 19 18:13:20 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Thu, 19 Feb 2004 23:13:20 +0000 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: References: Message-ID: <200402192313.20932.daniel.kidger@quadrics.com> Dave, > While performance (latency, bandwidth) usually comes to the fore in > discussions about high performance interconnects for MPI clusters, I'm > curious as to what your experiences are from the standpoint of > manageability -- NIC's and spines and switches all fail at one time or > another, but I'd like input as to how individual products (Myrinet, > Quadrics, Infiniband, etc) handle this. In your clusters does the > hardware replacement involve simple steps (swap out the NIC, rerun some > config utilities) or something more complex (such as bringing down the > entire high speed network to reconfigure it so all the nodes can talk to > the new hardware); i.e., How painful is it to replace a single failed NIC? > > I'd imagine that most cluster admins are reluctant to interrupt running > jobs in order to re-initialize the equipment after hardware replacement. > Any information about how your clusters running high-speed interconnects > handle interconnect hardware failure/replacement would be very helpful. AFAIK all interconnects would allow the swap of a NIC without bringing down the whole network - but in all cases any parallel job running on that node would need to be aborted since in general high-speed interconect PCI cards are not hot-swappable - that node woudl need to be power-cycled. As for the cables and switches, I can't speak for other vendors - but for example a line card in a Quadrics Switch can be hot-swapped even while there are running MPI jobs that are sending data through that line card at the time - the jobs simply pause until the cables are reconnected. I would expect that other interconnects are the same in this respect? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 19 18:07:43 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 00:07:43 +0100 (CET) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > Please forgive the length of this email, as I'm going to try to be > comprehensive: > There was a discussion on the Gridengine user list recently, regarding submitting lots and lots of short jobs in a bank in London. It developed into quite an interesting discussion, and I learned lots. Sorry - I tried to find the thread, but can't quite get the correct keywords. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:04:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:04:32 +0800 (CST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: <20040220010432.88699.qmail@web16802.mail.tpe.yahoo.com> --- Ryan Adams ???? > My question is basically this: is 2-5 seconds too > small of a job to > justify a batching system like *PBS or Gridengine? Yes, 10 minutes or greater sound more reasonable. May be you can chunk 100 or more of those tasks into a job and submit it into a batch system. Also, from the "Tuning guide" HOWTO on the GridEngine website, SGE has a feature called "scheduling-on-demand" -- seems like it will help a lot since the scheduler is activated whenever a job arrives or a machine becomes available. Andrew. > Certainly, one option > would be to bundle sets of these chunks together for > a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling > system using some kind > of RPC, but I've been around software development > long enough to know > that it is better to use something off-the-shelf if > at all possible. > > Thanks in advance... > > Ryan > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:13:18 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:13:18 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402200922.39632.csamuel@vpac.org> Message-ID: <20040220011318.63921.qmail@web16804.mail.tpe.yahoo.com> --- Chris Samuel ????> ----- > Torque (formerly SPBS) is very stable, especially > since we helped the > SuperCluster folks clobber the various memory leaks > in the server. It's not whether PBS itself is stable or not. There are human errors, machine problems, network problems, etc... And besides, the master machine also needed to be taken offline for OS upgrade. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tobeveryhonest at hotmail.com Fri Feb 20 03:55:22 2004 From: tobeveryhonest at hotmail.com (Salman Guy) Date: Fri, 20 Feb 2004 08:55:22 +0000 Subject: [Beowulf] want to implement a Beowulf cluster Message-ID: hi all, I want to learn Beowulf cluster implementation practically and for this purpose i need some help from u ppl.....I need reading material and ebooks so if anyone of u has done some practical work on Beowulf clusters then plz guide me or send me information regarding this, help will be appreciated ...thanx in advance _________________________________________________________________ MSN 8 helps eliminate e-mail viruses. Get 2 months FREE*. http://join.msn.com/?page=features/virus&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 20 06:13:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 12:13:02 +0100 (CET) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, Salman Guy wrote: > hi all, > I want to learn Beowulf cluster implementation practically and for this > purpose i need some help from u ppl.....I need reading material and ebooks > so if anyone of u has done some practical work on Beowulf clusters then plz > guide me or send me information regarding this, > I think we need a FAQ here :-) Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. SO I always say: Look at Robert Browns webpages at Duke The books 'Linux Clustering' by Charles Bookman and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Feb 20 05:10:38 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 20 Feb 2004 11:10:38 +0100 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: <200402192313.20932.daniel.kidger@quadrics.com> References: <200402192313.20932.daniel.kidger@quadrics.com> Message-ID: <200402201110.38850.joachim@ccrl-nece.de> Dan Kidger: > AFAIK all interconnects would allow the swap of a NIC without bringing down > the whole network - but in all cases any parallel job running on that node > would need to be aborted since in general high-speed interconect PCI cards > are not hot-swappable - that node woudl need to be power-cycled. AFAIK, this is the same for SCI, but I would need to check this to be sure. Anyway, the application using the adapter to be swapped would have to be restarted anyway as its resources are gone. Avoiding this would be very hard, if at all possible. > As for the cables and switches, I can't speak for other vendors - but for > example a line card in a Quadrics Switch can be hot-swapped even while > there are running MPI jobs that are sending data through that line card at > the time - the jobs simply pause until the cables are reconnected. I would > expect that other interconnects are the same in this respect? SCI typically uses no external switches, and concerning the exchange of adapters or cables, there are two strategies: the application(s) has/have to wait until transfers are again successful, or the driver recognizes the problem and changes the routing. Of course, this can be combined into a two-phase strategy. I guess this is the way Scali is doing it. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Fri Feb 20 08:41:11 2004 From: lathama at yahoo.com (Andrew Latham) Date: Fri, 20 Feb 2004 05:41:11 -0800 (PST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: <20040220134111.27571.qmail@web60305.mail.yahoo.com> or download the mailing list archive for the last year! thats an ebook all to its self --- John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. > > SO I always say: > Look at Robert Browns webpages at Duke > > The books 'Linux Clustering' by Charles Bookman > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== *----------------------------------------------------------* Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM LATHAMA at LATHAMA.COM - LATHAMA at YAHOO.COM If yahoo.com is down we have bigger problems than my email! *----------------------------------------------------------* _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Fri Feb 20 13:39:49 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Fri, 20 Feb 2004 18:39:49 +0000 (GMT) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <200402201109.i1KB99h12383@NewBlue.scyld.com> References: <200402201109.i1KB99h12383@NewBlue.scyld.com> Message-ID: > > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? That workload is do-able with the right queuing system. LSF (don't know about gridengine off hand) has a concept of "job chunking" for dealing with short running jobs. The queuing system batches up a number of jobs (eg 10 or 20) and then submits them all on one go to the work host where they run sequentially. This cuts down on the scheduling overhead. We've just had a user push 250,000 short running jobs though our cluster this-afternoon using this approach. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 20 15:12:09 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 20 Feb 2004 15:12:09 -0500 (EST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) There are the old FAQ and HOWTO's (still some relevant background information): http://www.canonical.org/~kragen/beowulf-faq.txt http://yara.ecn.purdue.edu/~pplinux/PPHOWTO/pphowto.html#toc1 http://www.tldp.org/HOWTO/Beowulf-HOWTO.html There are other links at ClusterWorld.com (on the right side, scroll down) that may be useful. Now is a good time to announce my effort to update the FAQ (and possibly the HOWTO). Starting next week, I plan on updating the FAQ by using the ClusterWorld.com site as a place to collect questions and answers. Stay tuned. Of course ClusterWorld magazine is designed to provide this type of information as well. > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. 
> > SO I always say: > Look at Robert Browns webpages at Duke > and book: http://www.phy.duke.edu/brahma/Resources/beowulf_book.php > The books 'Linux Clustering' by Charles Bookman IMO, this is not a good book for HPC clusters. > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. New edition: http://www.amazon.com/exec/obidos/tg/detail/-/0262692929/102-0957058-4520116?v=glance Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Sat Feb 21 19:18:55 2004 From: mg_india at sancharnet.in (Sawan Gupta) Date: Sun, 22 Feb 2004 05:48:55 +0530 Subject: [Beowulf] Movie Editing Requirements Message-ID: <000001c3f8d9$79436f00$8bd2003d@myserver> Hello, My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT system with 512 DDRAM and a 128 MB Graphic Card. But when he perform some rendering operations, it takes nearly 10-15 minutes to complete. He wishes to upgrade his system to dual XEON with more RAM to minimize this time delay. I want to know whether this will suit his requirments or a cluster is just what he needs. Please tell me which cluster can suit his requirements i.e. Windows/Linux. I mean which cluster can best suit these requirements. Also are the softwares used by him also available for Linux or not. (If the solution suggested is in Linux) Regards, Sawan Gupta || Mg_India at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 21 20:50:54 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 21 Feb 2004 20:50:54 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. I would guess that none of this work is done by the graphics card, so that his performance is strictly dependent on the P4 and the fairly modest amount of ram he has. I would guess that most of these applications are fairly memory-intensive, and not particularly cache-friendly. I doubt HT would matter in this case, except that IIRC all HT CPUs are the 'c' model, and thus run with 6.4 GB/s of dram bandwidth. I'm sure you already know that 512M probably too low. > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. if this was linux, I'd advise you to use tools like oprofile, vmstat, etc to find out where it's spending its time. since it's only windows, you'll probably have to resort to watching the disk light, and running that nasty little windows accessory that tells you about cpu/memory usage. > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. sure. though he'd almost certainly run faster with a dual-opteron, since such systems deliver noticably more memory bandwidth and lower latency. a dual-xeon can actually be slower than a uni P4c system. it would probably make sense to talk to him about how his machine and apps are configured first. 
for instance, is he actually using HT, and does he notice any performance difference if he turns it off? is his ram dual-bank-PC3200? any sense of how much time is spent on disk IO? > I want to know whether this will suit his requirments or a cluster is > just what he needs. clusters are clearly more scalable, and are widely used in the render/effects industry. comparing a pair of P4c's to a single dual-opteron, though, I have no idea. I think it would depend on his applications, mainly. there's no clear answer to price/performance when it comes to clusters of duals vs unis. unis tend to be too large, and in most cases wind up replicating too many components, especially moving parts, to compete. I believe most clusters, in any industry, are not unis. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. windows is the right choice in exactly one situation: when the exact configuration you need is available off-the-shelf, and you already know how to use it. linux (unix in general) is far more robust, easy-to-manage, flexible, scalable, cheap, etc. all those TCO studies sponsored by msft consist of the following astonishing conclusion: if you have windows-only users and a supply of cheap msce's and are comfortable with the crappy level of support that the ms world provides, then indeed windows is cheaper. > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) only he can decide that. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sat Feb 21 23:30:13 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sat, 21 Feb 2004 22:30:13 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Actually a beowulf cluster can also run windows. There is a port of maya to clusters...There are also many other movie editing software distributions that work very well on clusters..It also doesn't matter what os a beowulf cluster runs. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Sat, 21 Feb 2004, Joel Jaeggli wrote: > Given that it sounds like you're on windows, a beowulf cluster is not > appropriate from your application... > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > Hello, > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > But when he perform some rendering operations, it takes nearly 10-15 > > minutes to complete. > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > this time delay. > > > > I want to know whether this will suit his requirments or a cluster is > > just what he needs. > > Please tell me which cluster can suit his requirements i.e. > > Windows/Linux. > > I mean which cluster can best suit these requirements. > > > > Also are the softwares used by him also available for Linux or not. 
(If > > the solution suggested is in Linux) > > > > > > Regards, > > > > Sawan Gupta > > || Mg_India at sancharnet.in || > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 21 23:18:09 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 21 Feb 2004 20:18:09 -0800 (PST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: Given that it sounds like you're on windows, a beowulf cluster is not appropriate from your application... On Sun, 22 Feb 2004, Sawan Gupta wrote: > > Hello, > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. > > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. > > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. > > I want to know whether this will suit his requirments or a cluster is > just what he needs. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. > I mean which cluster can best suit these requirements. > > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) > > > Regards, > > Sawan Gupta > || Mg_India at sancharnet.in || > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From c00jsh00 at nchc.org.tw Sun Feb 22 04:32:41 2004 From: c00jsh00 at nchc.org.tw (Jyh-Shyong Ho) Date: Sun, 22 Feb 2004 17:32:41 +0800 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <40387739.3891D96C@nchc.org.tw> Hi, We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI Workstation 5.1.3 compiler and 64-bit GOTO library. We ran all the test cases included in Gaussian 03 source code and compared the results against the reference results ran on SGI. All tests cases are successfully completed except test602 and test605 with error at the last stage when l9999 tries to close files. 
There are several files in directory bsd need some modification: machine.c (add one section to return "x86_64" as machine identification) mdutil.c (add one section for x86_64) mdutil.f (add one section for x86_64) bldg03 (modify the file so it can pick up x86_64.make as g03.make) and create a make file x86_64.make (use i386.make as a template) The compiler used is pgf90, but l906 and l609 has to be compiled with pgf77, in order to pass all the test cases. We are running more tests and comparing the performance of 64-bit version abd 32-bit version. Regards Jyh-Shyong Ho, Ph.D. Research Scientist National Center for High-Performance Computing Hsinchu, Taiwan, ROC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mirk at vsnl.com Sat Feb 21 10:31:35 2004 From: mirk at vsnl.com (Mohd Irfan R Khan) Date: Sat, 21 Feb 2004 21:01:35 +0530 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: Message-ID: hi I am one using SCI (Dolphin) cards and I think in dolphin u don't have to stop the whole cluster in case of failure. In this there is a matrix where it always has redundancy if one machine fails and the software provided by it (SCALI) will route the data to other machine and will reroute it back once it finds the line working properly. Regards. -----Original Message-----. From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Dave Stirling Sent: Friday, February 20, 2004 1:50 AM To: beowulf at beowulf.org Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Sun Feb 22 09:57:05 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sun, 22 Feb 2004 14:57:05 +0000 (UTC) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > It also doesn't matter what > os a beowulf cluster runs. ..as long as that OS conforms to the definition of free software, that is.. 
Or am I just an old fuddy-duddy, with out-of-date concepts? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Sun Feb 22 11:04:17 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Sun, 22 Feb 2004 17:04:17 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. I've done this for a while to get temperature information from a server in our small group server room (together with MRTG we have a nice history of temperature to show to the facilities people when the temperature was too high again...). The problem with greping for smartd information in the syslog file is that there is no current information after a log rotation. That's why I changed our cron jobs. Now I use a small setuid-root program which starts "smartctl -a /dev/sdX" and then greps for the temperature. - Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Feb 22 10:20:20 2004 From: agrajag at dragaera.net (Jag) Date: 22 Feb 2004 10:20:20 -0500 Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: <1077463220.2561.4.camel@loiosh> On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. By definition, a beowulf cluster uses a free/open OS. So, a beowulf cluster can't run windows. However, an HPC (High Performance Computing) cluster doesn't have that requirement. I know its kinda nitpicking to try to distinguish between Beowulf cluster and HPC cluster, but in some ways it is important. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sun Feb 22 11:30:03 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sun, 22 Feb 2004 10:30:03 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <1077463220.2561.4.camel@loiosh> References: <1077463220.2561.4.camel@loiosh> Message-ID: Please don't start a flame war guys, I just had my terms mixed up...it was 1 am in the morning when I replied. -Brent On Sun, 22 Feb 2004, Jag wrote: > On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > > Actually a beowulf cluster can also run windows. 
There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > By definition, a beowulf cluster uses a free/open OS. So, a beowulf > cluster can't run windows. However, an HPC (High Performance Computing) > cluster doesn't have that requirement. > > I know its kinda nitpicking to try to distinguish between Beowulf > cluster and HPC cluster, but in some ways it is important. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Feb 23 06:37:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 23 Feb 2004 12:37:25 +0100 (CET) Subject: [Beowulf] Flashmobcomputing Message-ID: I hesitate a bit to send things seen on Slashdot to the list, but this is probably relevant: http://www.flashmobcomputing.org/ It might be worth a bit of a debate though. Given that this cluster will be composed of differing CPUs, and conneced together by 100Mbps links will it really have chance of getting into the Top 500? The bootable CF they are using is a Knoppix variant. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Mon Feb 23 07:27:20 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Mon, 23 Feb 2004 07:27:20 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. >From time to time, I think it is a important to recall the original definition of Beowulf. In the book "How to Build Beowulf", Sterling, Salmon, Becker, Savarese define Beowulf as: "A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology running one of several open-source Unix like operating systems." There is often confusion as to "what is a Beowulf?" because the definition is more of a framework for building clusters and less of a recipe. I suppose, one could come up with definition of an HPC cluster which would read something like" "An HPC cluster is collection of commodity processors interconnected by widely available networking technology running a widely available OS." Rather broad. I think the keyword in all this is "commodity", which to me means choice and implies low cost. Doug > > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > Given that it sounds like you're on windows, a beowulf cluster is not > > appropriate from your application... 
> > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > Hello, > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > minutes to complete. > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > this time delay. > > > > > > I want to know whether this will suit his requirments or a cluster is > > > just what he needs. > > > Please tell me which cluster can suit his requirements i.e. > > > Windows/Linux. > > > I mean which cluster can best suit these requirements. > > > > > > Also are the softwares used by him also available for Linux or not. (If > > > the solution suggested is in Linux) > > > > > > > > > Regards, > > > > > > Sawan Gupta > > > || Mg_India at sancharnet.in || > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > > -------------------------------------------------------------------------- > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 23 09:11:00 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 23 Feb 2004 09:11:00 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sun, 22 Feb 2004, Martin WHEELER wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > It also doesn't matter what > > os a beowulf cluster runs. > > ..as long as that OS conforms to the definition of free software, that > is.. > > Or am I just an old fuddy-duddy, with out-of-date concepts? No, you're absolutely right. It's right in there in the original beowulf documents and description, IIRC. There are some excellent reasons for this, BTW, as you'll discover the first time something doesn't just work for you "out of the box". rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 23 08:36:05 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Mon, 23 Feb 2004 07:36:05 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Again, I go back to my last email concerning this..I didn't want to start people flaming me(which has now happened), I wrote my original response at 1am in the morning and was sloppy with my terms. For that I apologize. This tangent of explanations from now over 50 people can be gotten off of and people can go about there business..Nothing to see here, move along. -Brent On Mon, 23 Feb 2004, Douglas Eadline, Cluster World Magazine wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > Actually a beowulf cluster can also run windows. There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > >From time to time, I think it is a important to recall the original > definition of Beowulf. In the book "How to Build Beowulf", Sterling, > Salmon, Becker, Savarese define Beowulf as: > > "A Beowulf is a collection of personal computers (PCs) interconnected by > widely available networking technology running one of several open-source > Unix like operating systems." > > There is often confusion as to "what is a Beowulf?" because the definition > is more of a framework for building clusters and less of a recipe. > > I suppose, one could come up with definition of an HPC cluster which > would read something like" > > "An HPC cluster is collection of commodity processors interconnected by > widely available networking technology running a widely available OS." > > Rather broad. I think the keyword in all this is "commodity", which to me > means choice and implies low cost. > > Doug > > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > > > Given that it sounds like you're on windows, a beowulf cluster is not > > > appropriate from your application... > > > > > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > > > > Hello, > > > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > > minutes to complete. > > > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > > this time delay. > > > > > > > > I want to know whether this will suit his requirments or a cluster is > > > > just what he needs. > > > > Please tell me which cluster can suit his requirements i.e. > > > > Windows/Linux. > > > > I mean which cluster can best suit these requirements. > > > > > > > > Also are the softwares used by him also available for Linux or not. 
(If > > > > the solution suggested is in Linux) > > > > > > > > > > > > Regards, > > > > > > > > Sawan Gupta > > > > || Mg_India at sancharnet.in || > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf at beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > -- > > > -------------------------------------------------------------------------- > > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Cell: 610.390.7765 Redefining High Performance Computing > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Mon Feb 23 10:18:34 2004 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Mon, 23 Feb 2004 16:18:34 +0100 Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: References: Message-ID: <1077549514.31096.0.camel@qeldroma.cttc.org> El dom, 22-02-2004 a las 17:04, Felix Rauch escribi?: > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). > > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). 
> > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > On the other hand, is possible to deviate smartd log to a specific file and check it regularly when it's updated, adding this parameter to smartd: -l facility Of course some syslog.conf modifying will be needed to instruct syslogd to log on a specific file from the "facility" specified. facility.* /var/log/smartd.log Also, the '-M' coupled with the 'exec' directive should work, a script could be run to update some flags for example: -M exec /usr/bin/smartd_alert.sh -- Daniel Fernandez Centre tecnol?gic de transfer?ncia de calor - CTTC www.cttc.upc.edu c/ Colom n?11 UPC Campus Industrials Terrassa , Edifici TR4 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Mon Feb 23 11:24:23 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Mon, 23 Feb 2004 11:24:23 -0500 Subject: [Beowulf] Anyone use MOSIX? Message-ID: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Has anyone on this list used MOSIX before? I'm particularly interested in how it compares to other clustering software such as PVM and MPI. Any information regarding what you're using MOSIX for, recommendations about setting it up, comparisons to other software, etc, would be welcomed. Thanks! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kpodesta at redbrick.dcu.ie Mon Feb 23 13:45:44 2004 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 23 Feb 2004 18:45:44 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040223184544.GB30983@carbon.redbrick.dcu.ie> On Mon, Feb 23, 2004 at 12:37:25PM +0100, John Hearns wrote: > http://www.flashmobcomputing.org/ > > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > > The bootable CF they are using is a Knoppix variant. It seems a bit loose or unfair to suggest a project like this 'registers' for the top500 list? It's a once-off, temporary system, dedicated (seemingly) to nothing but qualification to the list. They say in the FAQ that if the system proves itself, it could potentially be used for bigger problems, which is a noble idea - but they obviously don't read the beowulf list often ("it all depends on the application", etc.) :-) Additionally, a flashmob system would have a limited shelf-life, before the owners want to take their computers home. Distributed projects like SETI at home and Folding at home etc. have been running for years... I'm not familiar with the entry rules to the top500, but to be fair to existing, dedicated installations - they would have a certain 'reliability' in terms of their existence. 
If you needed to perform a serious calculation to a scale of 36 TFLOPS etc., then you know that there is a system that can do it. They might want to be critical of how sustainable the result from the Flashmob is, if they wanted to 'call' on it's power at any particular time in the future. (whoops, it was raining today, that's 10 TFLOPS down the drain...). Pardon the pun. Kp -- Karl Podesta Dublin City University, Ireland _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 23 14:11:48 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 23 Feb 2004 14:11:48 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Mon, 23 Feb 2004, John Hearns wrote: > I hesitate a bit to send things seen on Slashdot to the list, > but this is probably relevant: > > http://www.flashmobcomputing.org/ >> A Flash Mob computer, unlike an ordinary cluster, is temporary and >> organized on-the-fly for the purpose of working on a single >> problem. Flash Mob I is the first of it's kind. A bit of hype here. Flash Mob is a fun demo, but not a new system architecture. All of the software is on a live CD, which Yggdrasil pioneered back in 1993, and it's far from being the first on-the-fly cluster. One of first public demo of Scyld Beowulf was temporarily converting the email-reading machines at the ALS conference into a cluster. We did that in a few minutes, taking only a few second beyond the amount of time it took to boot the machines from floppy. Today there is the opportunity to use PXE boot, which makes configuration even easier. A key was the innovative approach of making most of the systems specialized compute slaves, with only the environment needed to support the fully-cached running application. (Note that NFS root sounds like a likely alternative, but doesn't scale and has a run-time performance impact.) > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > The bootable CF they are using is a Knoppix variant. The differing CPUs and full workstation-oriented distribution will likely pose more a problem than the switched Fast Ethernet. Unless they make significant modifications, they will run into the scalability problem that every full-installation system encounters: at every timestep a few of the machines will be paging, running cron, or doing something else that slows the machine. That would be barely noticed in a workstation environment, but is a major problem with most cluster jobs. Still, it sounds like a fun, demystifying demo that introduces people to scalable computing. 
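To make Donald Becker's point above concrete: in a tightly coupled run that synchronizes every timestep, each step goes at the pace of the slowest node on that step, so a cron run or a paging episode on any single box stalls the whole job. The toy below fakes the hiccup with usleep() -- it is not taken from the Flash Mob software, it just shows the shape of the problem.

----------------------------------------------------------------
/*
 * Toy timestep loop: ~1 ms of "work" per step, a barrier every step,
 * and about one step in fifty where a rank loses an extra 20 ms
 * (standing in for cron, paging, etc.).  With N ranks the chance that
 * *somebody* hiccups on a given step grows quickly, and the barrier
 * makes everyone wait for them.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, step, nsteps = 200;
    double t0, t1, local, worst;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    srand(rank + 1);

    t0 = MPI_Wtime();
    for (step = 0; step < nsteps; step++) {
        usleep(1000);                     /* nominal per-step work: 1 ms */
        if (rand() % 50 == 0)
            usleep(20000);                /* occasional 20 ms hiccup */
        MPI_Barrier(MPI_COMM_WORLD);      /* the per-timestep sync point */
    }
    t1 = MPI_Wtime();

    local = t1 - t0;
    MPI_Reduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d steps on %d ranks took %.2f s (about %.2f s of it is work)\n",
               nsteps, size, worst, nsteps * 0.001);
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

Run it on one rank and then on eight or sixteen and compare the wall times; the gap is the jitter tax a full workstation-style install pays before any network effects enter at all.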
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster systems Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bernd-schubert at web.de Mon Feb 23 17:05:36 2004 From: bernd-schubert at web.de (Bernd Schubert) Date: Mon, 23 Feb 2004 23:05:36 +0100 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE In-Reply-To: <40387739.3891D96C@nchc.org.tw> References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> <40387739.3891D96C@nchc.org.tw> Message-ID: <200402232305.36711.bernd-schubert@web.de> On Sunday 22 February 2004 10:32, Jyh-Shyong Ho wrote: > Hi, > > We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 > on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI > Workstation > 5.1.3 compiler and 64-bit GOTO library. > Hello, thanks for this great information! I've forwarded it to the CCL list, since I guess on this list many people are interested in this topic. Cheers, Bernd _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 17:47:27 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 17:47:27 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: > Still, it sounds like a fun, demystifying demo that introduces people to > scalable computing. demystification is always good. IMO, the best part of this is that it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. partly the reason is hetrogeneity and other "practical" downers. but mainly, a super-computer needs a super-network. of course, in the grid nirvana, all computers would have multiple ports of infiniband, and the word would be 5 us across ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 15:52:58 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 15:52:58 -0500 (EST) Subject: [Beowulf] Anyone use MOSIX? In-Reply-To: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Message-ID: > Has anyone on this list used MOSIX before? I expect many have given it a try. > I'm particularly interested in how it compares to other clustering software > such as PVM and MPI. apples and oranges, I believe. mosix more or less tries to virtualize a cluster by making multiple machines share things like a single pid space, with forwarding of signals, etc. the idea is that the OS takes care of migrating jobs across nodes, including using proxies for resources that can't be directly moved (pages can, for instance). from the PVM/MPI perspective, the most important resource would be sockets. as far as I know, MPI-on-Mosix would use proxied sockets, and would therefore have performance problems for anything closely-coupled or high-bandwidth. in principle, Mosix could provide some sort of clusterized group-comm mechanism that wouldn't require proxies, but that would be a large effort. 
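One way to put a number on the proxied-socket overhead Mark describes (or on the small-message cost of any interconnect) is the usual ping-pong test. The bare-bones sketch below times round trips of an 8-byte message between two ranks and reports the average one-way latency; there is no warmup pass and only one message size, so treat the result as a rough figure rather than a benchmark.

----------------------------------------------------------------
/*
 * Bare-bones MPI ping-pong between ranks 0 and 1.  Half the average
 * round-trip time of a small message is the usual "latency" figure.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i, reps = 10000;
    char buf[8] = "ping";
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run this on exactly two ranks\n");
        MPI_Finalize();
        return 1;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else {
            MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f usec\n",
               (t1 - t0) / (2.0 * reps) * 1e6);
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

If the same test run over a migrated/proxied path comes back much worse than over a direct socket, that is the penalty being discussed for closely-coupled codes.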
in a way, it's a shame that MPI is such a fat interface, since there's a lot of really good work that could be done in this direction, but is simply too large for a typical thesis project :( regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Feb 23 20:53:18 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 23 Feb 2004 17:53:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > of course, in the grid nirvana, > all computers would have multiple ports of infiniband, > and the word would be 5 us across ;) In grid nirvana, the speed of light would rise with Moore's Law. 5 usec is a long time now, and much longer a year from now. That's not nirvana. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 24 03:22:16 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 24 Feb 2004 09:22:16 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. > An odd fact I always remember is that light travels at a foot per nanosecond. (Useful to know if you are plugging coax delay lines into trigger circuits) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 04:35:48 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 10:35:48 +0100 Subject: [Beowulf] C vs C++ challenge In-Reply-To: References: <1075512676.4915.207.camel@protein.scalableinformatics.com> Message-ID: <20040224093548.GA29776@unthought.net> On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. 
It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Tue Feb 24 04:35:54 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Tue, 24 Feb 2004 09:35:54 +0000 Subject: [Beowulf] Math Coprocessor In-Reply-To: References: Message-ID: <200402240935.54623.daniel.kidger@quadrics.com> > On Fri, 13 Feb 2004, John Hearns wrote: > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". I don't have that but I do have on my bookshelf "A Fortran Primer" by Elliot Organick, Addison-Wiley (1963) - so go on: does anyone own any even older Fortran texts ? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 05:56:45 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 11:56:45 +0100 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> References: <1077217418.4982.35.camel@localhost> Message-ID: <20040224105645.GC29776@unthought.net> On Thu, Feb 19, 2004 at 02:03:38PM -0500, Ryan Adams wrote: ... > I have a problem that divides nicely (embarrassingly?) into > parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to > complete and requires no communication during that time. Essentially > there is a piece of data, around 500KB that must be processed and a > result returned. I'd like to process as many of these pieces of data as > possible. I am considering building a small heterogeneous cluster to do > this (at home, basically), and am trying to decide exactly how to > architect the task distribution. I had the following problem; lots and lots of compile jobs. They take from a few seconds to a few minutes each. No batch scheduling system that I tried, was up to the task (simply waaay too long latency in the scheduling). ... > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. Maybe you would want to take a quick look at ANTS http://unthought.net/antsd/ ANTS was the solution I developed for the problem I had, and from the sound of it, I think your problem may be a good fit for ANTS as well. I've been updating it as of lately, but haven't put new releases on the web site. If you're interested, I can provide you with the new releases (featuring krellm2 applet! ;) - but the basic functionality is unchanged from the old release on the web site. ANTS specifically schedules jobs very quickly - but it lacks the advanced features of "real" batch systems (like accounting, gang scheduling, job restart, etc. etc.). / jakob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 08:34:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 24 Feb 2004 08:34:16 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. I'll have to think about that one. Exponential growth of the speed of light. Hmmm. Some sort of inflationary model? Space flattening towards non-relativistic classical? The physics of Nirvana would be veeeery interesting... :-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajt at rri.sari.ac.uk Tue Feb 24 08:56:31 2004 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Tue, 24 Feb 2004 13:56:31 +0000 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B580F.1020009@rri.sari.ac.uk> Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctibirna at giref.ulaval.ca Tue Feb 24 09:15:20 2004 From: ctibirna at giref.ulaval.ca (Cristian Tibirna) Date: Tue, 24 Feb 2004 09:15:20 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <200402240915.20284.ctibirna@giref.ulaval.ca> On Tuesday, 24 February 2004 08:13, Matt Valerio wrote: > > Like anything on the internet, it should be taken with a grain of salt. > Can anyone vouch for its validity, or is it a hoax to get us to all hate > C++ and stick with C? Of course it's a hoax ;o) http://www.research.att.com/~bs/bs_faq.html#IEEE And in fact all the FAQ deserve a reading, no matter which language one preaches as being the Holy Grail. -- Cristian Tibirna (418) 656-2131 / 4340 Laval University - Qu?bec, CAN ... http://www.giref.ulaval.ca/~ctibirna Research professional - GIREF ... ctibirna at giref.ulaval.ca Chemical Engineering PhD Student ... tibirna at gch.ulaval.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 08:13:12 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 08:13:12 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <20040224093548.GA29776@unthought.net> Message-ID: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Hello, I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 languages. 
That being said, I think it would be interesting to see what the creator of both C and C++ has said about the two. I ran across this interview with Bjorn Stroustrup at http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. Like anything on the internet, it should be taken with a grain of salt. Can anyone vouch for its validity, or is it a hoax to get us to all hate C++ and stick with C? -Matt -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of Jakob Oestergaard Sent: Tuesday, February 24, 2004 4:36 AM To: Beowulf Subject: Re: [Beowulf] C vs C++ challenge On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. 
Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 09:12:49 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 09:12:49 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <403B580F.1020009@rri.sari.ac.uk> Message-ID: <200402241414.i1OEErBf096719@postoffice.onu.edu> Wow, I guess I didn't do my homework! Apologizes to everyone for the misinformation! As Tony pointed out, the real interview may be found at http://www.research.att.com/~bs/ieee_interview.html. -Matt -----Original Message----- From: Tony Travis [mailto:ajt at rri.sari.ac.uk] Sent: Tuesday, February 24, 2004 8:57 AM To: Matt Valerio Cc: beowulf at beowulf.org Subject: Re: [Beowulf] C vs C++ challenge Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 09:33:37 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 24 Feb 2004 09:33:37 -0500 (EST) Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: On Tue, 24 Feb 2004, Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? I posted that up there when I found it because it is hilarious. I assume that it is a satire (not exactly the same thing as a "hoax":-). However, as is the case with much satire, it contains a lot of little nuggets that (should) make you think... about "good practice" ways of coding in C++ if nothing else. r-still-a-C-guy-at-heart-gb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Feb 24 10:37:12 2004 From: jcownie at etnus.com (James Cownie) Date: Tue, 24 Feb 2004 15:37:12 +0000 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message from joshh@cs.earlham.edu of "Fri, 13 Feb 2004 10:25:31 EST." <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <1AvecS-6wh-00@etnus.com> > I am profiling a software package that runs over LAM-MPI on 16 node > cluster s [Details Below]. I would like to measure the effect of > increased latency on the run time of the program. > Look for "dimemas" on Google. It's a simulator from Cepba for parallel architectures which is intended to allow you to adjust exactly this kind of parameter. At one point they had it coupled up with Pallas' Vampir so that it could read Vampir trace files and then simulate the same execution with modified communication properties, or modified CPU properties. -- -- Jim -- James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Tue Feb 24 10:40:29 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Tue, 24 Feb 2004 07:40:29 -0800 Subject: [Beowulf] C vs C++ challenge Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F55AB@orsmsx402.jf.intel.com> From: Matt Valerio; Tuesday, February 24, 2004 6:13 AM > > Wow, I guess I didn't do my homework! Apologizes to everyone for the > misinformation! > > As Tony pointed out, the real interview may be found at > http://www.research.att.com/~bs/ieee_interview.html. For a Stroustrup statement that C proponents (as am I) will also agree with, see http://www.research.att.com/~bs/bs_faq.html#really-say-that FYI, the top of the FAQ has a .wav file with the proper pronunciation of his name... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Feb 24 12:15:01 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 24 Feb 2004 18:15:01 +0100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <200402241815.01355.joachim@ccrl-nece.de> Donald Becker: > On Mon, 23 Feb 2004, John Hearns wrote: > > I hesitate a bit to send things seen on Slashdot to the list, > > but this is probably relevant: > > > > > > http://www.flashmobcomputing.org/ > > A bit of hype here. [...] Exactly. It's a nice idea (although the wrong approach, as Donald elaborated - maybe they will find out), but they shouldn't seriously clame to be the first with this "revolutionary idea" (sic!). In addition to Donald's references to earlier "on-the-fly clusters", here's another one from Germany (December 1998): http://www.heise.de/ix/artikel/E/1999/01/010/ I don't know if they actually submitted results to TOP500 - I could not find a matching entry for 1999. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:15:45 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:15:45 -0800 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <20040224093548.GA29776@unthought.net> Message-ID: <5.2.0.9.2.20040224091430.017cb1d8@mailhost4.jpl.nasa.gov> At 08:13 AM 2/24/2004 -0500, Matt Valerio wrote: >Hello, > >I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 >languages. > >That being said, I think it would be interesting to see what the creator of >both C and C++ has said about the two. I ran across this interview with >Bjorn Stroustrup at >http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > >Like anything on the internet, it should be taken with a grain of salt. Can >anyone vouch for its validity, or is it a hoax to get us to all hate C++ and >stick with C? > >-Matt That's a classic hoax interview (and I think identified as such by RGB), and remarkably funny. Almost as good as Dijkstra's apocryphal comment that more brains have been ruined by BASIC than .... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:21:18 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:21:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: <5.2.0.9.2.20040224091607.0350aa38@mailhost4.jpl.nasa.gov> At 08:34 AM 2/24/2004 -0500, Robert G. 
Brown wrote: >On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > >I'll have to think about that one. > >Exponential growth of the speed of light. Hmmm. Some sort of >inflationary model? Space flattening towards non-relativistic >classical? The physics of Nirvana would be veeeery interesting... 5 usec gives you a "grid diameter" of a mile or so... (if you don't worry about pesky things like wires or fibers to carry the signals). You could fit a LOT of processors in a sphere a mile in diameter. Does bring up some interesting questions about optimum interconnection strategies. Even if you put nodes on the surface of that sphere (so you can use free space optical interconnects across the middle of the sphere, you'd have about 7.2 million square meters to fool with. Say you can fit a 100 nodes in a square meter. That's almost a billion nodes. If you need bigger, one could always use fancy stuff like quantum entanglement, about which I don't know much, but which might provide a solution to communicating across large distances very quickly (at least in one frame of reference) James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From orion at cora.nwra.com Tue Feb 24 18:17:31 2004 From: orion at cora.nwra.com (Orion Poplawski) Date: Tue, 24 Feb 2004 16:17:31 -0700 Subject: [Beowulf] G5 cluster for testing Message-ID: <403BDB8B.7060904@cora.nwra.com> Anyone (vendors?) out there have a G5 cluster available for some testing? I've been charged with putting together a small cluster and have been asked to look into G5 systems as well (I guess 64 bit powerPC really....) Thanks -- Orion Poplawski System Administrator 303-415-9701 x222 Colorado Research Associates/NWRA FAX: 303-415-9702 3380 Mitchell Lane, Boulder CO 80301 http://www.co-ra.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Tue Feb 24 17:57:07 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Tue, 24 Feb 2004 22:57:07 +0000 (UTC) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Tue, 24 Feb 2004, Robert G. Brown wrote: > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. Then you'll have to think *very* (exponentially?) fast. Just to keep up with where you were when you started... Shades of the Red Queen. :) Maybe Lewis Carroll already described the physics? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. 
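The back-of-the-envelope numbers a few messages up are easy to reproduce; a throwaway check, assuming the 5 usec is read as one-way light time across the diameter and 100 nodes per square metre as above:

----------------------------------------------------------------
#include <stdio.h>

int main(void)
{
    const double PI = 3.141592653589793;
    double c = 2.998e8;           /* speed of light, m/s             */
    double t = 5e-6;              /* 5 usec, taken as one-way        */
    double d = c * t;             /* diameter of the sphere, metres  */
    double r = d / 2.0;
    double area  = 4.0 * PI * r * r;
    double nodes = area * 100.0;  /* 100 nodes per square metre      */

    printf("diameter: %.0f m (%.2f miles)\n", d, d / 1609.34);
    printf("surface : %.2e m^2\n", area);
    printf("nodes   : %.2e\n", nodes);
    return 0;
}
----------------------------------------------------------------

With those assumptions it prints roughly 1500 m (0.93 miles), 7.1e+06 m^2 and 7.1e+08 nodes -- the same ballpark as the figures quoted above.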
- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Feb 24 09:10:31 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 24 Feb 2004 08:10:31 -0600 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B5B57.8080403@tamu.edu> Since he's now faculty here, I guess I'll walk down the hall and ask him. gerry Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? > > -Matt > > > > > > > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of > Jakob Oestergaard > Sent: Tuesday, February 24, 2004 4:36 AM > To: Beowulf > Subject: Re: [Beowulf] C vs C++ challenge > > On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > >>>I could easily optimize it more (do the work on a larger buffer at a >>>once), but I think enough waste heat has been created here. This is a >>>simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. >> >>Enough time wasted on finding different solutions to a simple problem? > > Surely > >>not. Let me toss my hat into the ring: > > ... > > Hi guys! > > Guess who's back - yes, it's your friendly neighborhood language > evangelist :) > > I said I'd be gone one week - well, I put instant coffe in the > microwave, and *wooosh* went three weeks ahead in time. > > What a fantastic thread this turned into - awk, perl, more C, java and > God knows what. I'm almost surprised I didn't see a Fortran > implementation. > > See, I was trying to follow up on the challenge, then things got > complicated (mainly by me not being able to get the performance I wanted > out of my code) - so instead of flooding your inboxes, I wrote a little > "article" on my findings. > > It's up at: > http://unthought.net/c++/c_vs_c++.html > > Highlights: > *) Benchmarks - real numbers. > *) A C++'ification of the fast C implementation (that turns out to be > negligibly faster than the C implementation although the same > algorithm and the same system calls are used), which is generalized > and generally made usable as a template library routine (for > convenient re-use in other projects - yes, this requires all that > boring non-sexy stuff like freeing up memory etc.) > *) Two new C++ implementations - another 15 liner that's "only" twice > as slow as the C code, and another longer different-algorithm C++ > implementation that is significantly faster than the fastest C > implementation (so far). > > Now, I did not include all the extra implementations presented here. I > would like to update the document with those, but I will need a little > feedback from various people. > > First; how do I compile the java implementation? 
GCC-3.3.2 gives me > ---------------------------------------------------------------- > [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java > wordcount.java: In class `wordcount': > wordcount.java: In method `wordcount.main(java.lang.String[])': > wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' > in type `java.util.regex.Pattern'. > words = p.split(s); > ^ > 1 error > ---------------------------------------------------------------- > > Second; another much faster C implementation was posted - I'd like to > test against that one as well. I'm curious as to how it was done, and > I'd like to use it as an example in the document if it turns out that it > makes sense to write a generic C++ implementation of whatever algorithm > is used there. Well, if the code is not a government secret ;) > > So, well, clearly my document isn't completely updated with all the > great things from this thread - but at least I think it is a decent > reply to the mail where the 'programming pearl' C implementation was > presented. > > I guess this could turn into a nice little reference/FAQ/fact type of > document - the oppinions stated there are biased of course, but not > completely unreasonable in my own (biased) oppinion - besides, there's > real-world numbers for solving a real-world problem, that's a pretty > good start I would say :) > > I'd love to hear what people think - if you have the time to give it a > look. > > Let me know, flame away, give me Fortran code that is faster than my > 'ego-booster' implementation at the bottom of the document! ;) > > Cheers all :) > > / jakob > > BTW: Yes, I had a great vacation; > http://unthought.net/avoriaz/p1010050.jpg ;) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Tue Feb 24 09:10:09 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Tue, 24 Feb 2004 14:10:09 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <1077631809.646.83.camel@ashley> On Mon, 2004-02-23 at 22:47, Mark Hahn wrote: > > Still, it sounds like a fun, demystifying demo that introduces people to > > scalable computing. > > demystification is always good. IMO, the best part of this is that > it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. > partly the reason is hetrogeneity and other "practical" downers. It will be interesting to see, I don't expect they are going to get much time to benchmark but it would be nice to have a plot of achieved performance against CPU count in this kind of configuration. Anybody care to predict how many CPU's you will need before wall clock performance starts dropping? > but mainly, a super-computer needs a super-network. 
That I won't dispute but does a single linpack run require a super-computer? Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raiders at phreaker.net Tue Feb 24 10:39:19 2004 From: raiders at phreaker.net (raiders at phreaker.net) Date: Tue, 24 Feb 2004 23:39:19 +0800 Subject: [Beowulf] Subclusters... Message-ID: <200402242339.19310.raiders@phreaker.net> We are on a project as described below: - IA32 linux cluster for general parallel programming - five head nodes, each head node will have about 15 compute nodes and dedicated storage - groups of cluster-users will be restricted to their own clusters normally (some exclusions may apply) - SGE/PBS, GbE etc are standard choices But the people in power want one single software or admin console (cluster toolkit?) to manage the entire cluster from one adm station (which may or may not be one of the head nodes). I looked around and could not find any suitable solution (ROCKS, oscar, etc). ROCKS, oscar etc can manage only one cluster at a time and cannot handle subclusters. (I might be wrong) I believe that only custom programming can help. Appreciate any expert opinion Thanks, Shawn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Tue Feb 24 23:38:54 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Tue, 24 Feb 2004 20:38:54 -0800 (PST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> Message-ID: I'd suggest asking your friendly IBM sales guy about ppc970 blades... joelja On Tue, 24 Feb 2004, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) > > Thanks > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 02:10:39 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Tue, 24 Feb 2004 23:10:39 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> References: <403BDB8B.7060904@cora.nwra.com> Message-ID: <20040225071039.GA29125@cse.ucdavis.edu> On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some For the most part I'm finding that cluster performance is mostly predictable by single node performance, and the scaling of the interconnect. At least as an approximation, I'm going to use to find a good place to start for my next couple cluster designs. I'm current benchmarking: Dual G5 Opteron duals (1.4, 1.8, and 2.2) Opteron quad 1.4 Itanium dual 1.4 GHz Dual P4-3.0 GHz+HT Single P4-3.0 GHz+HT Alas, my single node performance testing on the G5 has been foiled by my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem working. 
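On the single-node half of that argument: a lot of the per-node spread between machines like the ones listed above already shows up in a memory-bandwidth toy loop. A rough STREAM-triad-style sketch (not the real STREAM benchmark; array sizes, compiler flags and timer resolution all matter):

----------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1000 * 1000)            /* three ~32 MB double arrays */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double t, best = 1e9;
    int i, k;

    if (!a || !b || !c) return 1;
    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (k = 0; k < 10; k++) {                /* best of 10 passes */
        t = now();
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];         /* ~24 bytes of traffic per element */
        t = now() - t;
        if (t < best) best = t;
    }

    /* print an element of a so the loop cannot be optimised away */
    printf("triad: %.0f MB/s (a[%d] = %g)\n",
           3.0 * N * sizeof(double) / best / 1e6, N / 2, a[N / 2]);
    return 0;
}
----------------------------------------------------------------

Built with the same compiler and flags on each box, it gives a quick first-order ranking before any MPI is involved.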
Anyone else have MPICH and shared memory working on OSX? Or maybe a dual g5 linux account for an evening of benchmarking? Normally using ch_p4 and localhost wouldn't be too big a deal, but ping localhost on OSX is something like 40 times worse than linux, and mpich with ch_p4 on OSX is around 20 times worse than linux with shared memory. > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) Assuming all the applications and tools work under all environments you're considering, I'd figure out what interconnect you want to get first. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 25 03:34:38 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 25 Feb 2004 09:34:38 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <1077542843.26492.12.camel@qeldroma.cttc.org> Message-ID: On Mon, 23 Feb 2004, Daniel Fernandez wrote: [...] > On the other hand, is possible to deviate smartd log to a specific file > and check it regularly when it's updated, adding this parameter to > smartd: > > -l facility > > Of course some syslog.conf modifying will be needed to instruct syslogd > to log on a specific file from the "facility" specified. Thanks for the hints, I was not yet aware of the -l and -M flags. Still, I think directly calling "smartctl" from a cron job is the better solution. With just smartd and the flags above, you still won't get any updates if smartd simply dies and you won't even notice, because grep simply finds the last entry in the log.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 07:01:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 13:01:02 +0100 (CET) Subject: [Beowulf] FOSDEM talk Message-ID: There is a current thread on SMART usage. There was also a thread about six months ago on lm_sensors, about the output format of sensors, and how one has to parse it. Sorry if this message is a bit of a ramble. At FOSDEM over the weekend I went to a talk by Robert Love on his work on Linux kernel and destop integration, on HAL and DBUS. One slide made me sit up and take notice, as he had an example of a kernel message saying 'overheating'. The message format was something like an SNMP OID, as I remember org.kernel.processor.overheating (or something like that). One could then think of a process listening on the netlink socket, generating (for example) an SNMP trap on receiving events of this category. A better way of doing things than running sensors periodically then parsing the output. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:55:22 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:55:22 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > We are on a project as described below: > > - IA32 linux cluster for general parallel programming > - five head nodes, each head node will have about 15 compute nodes and > dedicated storage > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or admin console (cluster > toolkit?) to manage the entire cluster from one adm station (which may or may > not be one of the head nodes). Thinking about this, the way I would architect things is to stop thinking of subclusters - yet of course give the users their allocation of resources. So, choose your cluster install method of choice. Have one admin/master node and install all 75 nodes. Have 5 public facing machines, and have logins go through a load-balancer or round robin. When a user logs in they get directed to the least loaded machine. Why? If one machine goes down (fault or upgrade) the users still have four machines. They don't "see" this as you have entries in the DNS for e.g. necromancy.hogwarts defence-darkarts.hogwarts potions.hogwarts spells.hogwarts magical-creatures.hogwarts all pointing the same way. It would be better to have 5 separate storage nodes, but the login machines in your scenario will have to do that job also. Just allocate storage per group. The 75 compute nodes are installed within the cluster. Now, at a first pass you want to 'saw things up' into 15 node lumps. This can be done easily - just put a queue or queues on each and allow only certain groups access. But I will contend this is a bad idea. Batch queueing systems have facilities to look after fair shares of resources between groups. Say you have the 5 separate groups scenario. 
Say today Professor Snape isn't doing any potions work. The 15 potions machines will lie idel, while there are plenty of jobs in necromancy just dying to run. Use the fairshare in SGE or LSF. Each group will get their allocated share of CPU. You'll also have redundancy - so that you can take machines out for maintenance/repairs without impacting any one group, ie. the load is shared across 75 machines not 5. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patricka at its.uct.ac.za Wed Feb 25 09:01:49 2004 From: patricka at its.uct.ac.za (Patrick) Date: Wed, 25 Feb 2004 16:01:49 +0200 Subject: [Beowulf] G5 cluster for testing References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <026001c3fba7$e9af5d00$a61b9e89@nawty> Has anyone here actually tried out Xgrid ? Apples grid stuff. It seems to be not so fussy in regards to the type of macs you attach and suchlike ? as well as them being configurable via Zeroconf. P ----- Original Message ----- From: "Bill Broadley" To: "Orion Poplawski" Cc: Sent: Wednesday, February 25, 2004 9:10 AM Subject: Re: [Beowulf] G5 cluster for testing > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:22:15 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:22:15 +0800 (CST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <20040225132215.21497.qmail@web16813.mail.tpe.yahoo.com> I've heard that LAM works better with OSX. Andrew. 
--- Bill Broadley ????> On > Alas, my single node performance testing on the G5 > has been foiled by > my inability to get MPICH, OSX, and ./configure > --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on > OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to > big a deal, but > ping localhost on OSX is something like 40 times > than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux > with shared memory. > > > testing? I've been charged with putting together > a small cluster and > > have been asked to look into G5 systems as well (I > guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under > all environments your > considering I'd figure out what interconnect you > want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:27:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:27:54 +0800 (CST) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: <20040225132754.14108.qmail@web16809.mail.tpe.yahoo.com> GridEngine has the concept of a CELL. It is not well documented, but it works like pointing to a different cell gives you a different configuration, ie. different subcluster. When you setup SGE, it will ask you for the name of the cell, so on the same head node, each time you run the sge install script, use a different cell name. This way you will get 5 different SGE clusters controlled by the same headnode. Better ask on the SGE mailing list since I've never played around with this too much. http://gridengine.sunsource.net/project/gridengine/maillist.html Andrew. --- raiders at phreaker.net ????> We are on a project as described below: > > - IA32 linux cluster for general parallel > programming > - five head nodes, each head node will have about 15 > compute nodes and > dedicated storage > - groups of cluster-users will be restricted to > their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or > admin console (cluster > toolkit?) to manage the entire cluster from one adm > station (which may or may > not be one of the head nodes). > > I looked around and could not find any suitable > solution (ROCKS, oscar, etc). > ROCKS, oscar etc can manage only one cluster at a > time and cannot handle > subclusters. (I might be wrong) > > I believe that only custom programming can help. 
> Appreciate any expert > opinion > > Thanks, > Shawn > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Wed Feb 25 08:03:02 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Wed, 25 Feb 2004 13:03:02 +0000 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <1077714182.656.235.camel@ashley> On Wed, 2004-02-25 at 07:10, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. There is a third issue here which you've missed which is that interconnect performance can depends on the PCI bridge that it's plugged into. It would be more correct to say that performance is predictable by dual-node performance and scaling of the interconnect. Of course this may not make a difference for Ethernet or even gig-e but it does matter at the high end. Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Feb 25 14:32:12 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 25 Feb 2004 11:32:12 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040225193212.GA14558@greglaptop.internal.keyresearch.com> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. Doesn't the G5 have exactly one chipset implementation available? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Thu Feb 26 05:38:09 2004 From: pesch at attglobal.net (pesch at attglobal.net) Date: Thu, 26 Feb 2004 02:38:09 -0800 Subject: [Beowulf] Flashmobcomputing References: Message-ID: <403DCC91.5A10A77B@attglobal.net> Nothing moves faster than the speed of light - with the exception of bad news (according to the late Douglas Adams); therefore, at the grid nirvana, bad news must get increasingly more bad. Which leads me to the hypothesis that nirvana is that locus at the irs which stores the access codes for the pentium microcode backdoors... "Robert G. 
Brown" wrote: > On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. > > Exponential growth of the speed of light. Hmmm. Some sort of > inflationary model? Space flattening towards non-relativistic > classical? The physics of Nirvana would be veeeery interesting... > > :-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 25 22:02:03 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 26 Feb 2004 14:02:03 +1100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> References: <403DCC91.5A10A77B@attglobal.net> Message-ID: <200402261402.05015.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 09:38 pm, pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); The only things known to go faster than ordinary light is monarchy, according to the philosopher Ly Tin Weedle. He reasoned like this: you can't have more than one king, and tradition demands that there is no gap between kings, so when a king dies the succession must therefore pass to the heir instantaneously. Presumably, he said, there must be some elementary particles - -- kingons, or possibly queons -- that do this job, but of course succession sometimes fails if, in mid-flight, they strike an anti-particle, or republicon. His ambitious plans to use his discovery to send messages, involving the careful torturing of a small king in order to modulate the signal, were never fully expanded because, at that point, the bar closed. 
- -- (Terry Pratchett, Mort) courtesy of: http://www.co.uk.lspace.org/books/pqf/mort.html - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAPWGrO2KABBYQAh8RAsm3AJ4zV3fEk8q/8Jm/zqY4xiBzGvKj4ACfeT+N 3NhDhvgiJyhukmnzBFHUaMQ= =NgG+ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bill at cse.ucdavis.edu Wed Feb 25 21:32:59 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 25 Feb 2004 18:32:59 -0800 Subject: [Beowulf] Cray buys Octigabay Message-ID: <20040226023258.GA9211@cse.ucdavis.edu> An interesting development: http://www.octigabay.com/ http://www.octigabay.com/newsEvents/cray_release.htm http://www.cray.com/ http://www.cray.com/media/2004/february/octigabay.html -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hpcatcnc at yahoo.com Thu Feb 26 01:56:09 2004 From: hpcatcnc at yahoo.com (prakash borade) Date: Wed, 25 Feb 2004 22:56:09 -0800 (PST) Subject: [Beowulf] predefined nodes for a job Message-ID: <20040226065609.19075.qmail@web21507.mail.yahoo.com> Can anybody tell me how I can allot some fixed, predefined machines from my cluster to a job? I have tried using the option -machinefile mcfile, where mcfile is a file in a local directory containing the required machine names. I also don't want to use the machine from which I will issue the mpirun command, even though MPICH is installed on that machine. Is there any solution for this? __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
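For the machinefile question above, a minimal sketch of the usual MPICH ch_p4 invocation; the hostnames, file name and program name are only placeholders, and it is worth checking your own mpirun's help output for the -nolocal switch before relying on it:

    $ cat mcfile
    node01
    node02
    node03
    node04
    $ mpirun -np 4 -machinefile mcfile -nolocal ./myprog

Here -machinefile restricts the run to the hosts listed and -nolocal keeps the host the job is launched from out of that set. If a particular mpirun lacks -nolocal, launching the job from one of the listed compute nodes instead of the front end has much the same effect.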
From sdutta at deas.harvard.edu Thu Feb 26 07:43:51 2004 From: sdutta at deas.harvard.edu (Suvendra Nath Dutta) Date: Thu, 26 Feb 2004 07:43:51 -0500 (EST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: I fought for a while to get an OS X cluster up, precisely to test the G5 performance. I had lots of problems with setting up NFS and setting up MPICH to use shared memory on the dual processors. I was able to take advantage of the firewire networking built into OS X. We were taking the harder route of staying away from all non-open source tools to do NFS (NFSManager) or MPI (Pooch). As was pointed out in another message, we are mostly keen on just testing the performance of three applications that we will run on our cluster rather than HPL numbers. Finally we gave up the struggle. We are now working with Apple to benchmark on an existing setup instead of us trying to set everything up ourselves. Unfortunately there isn't a howto on doing this yet. I'll post numbers when we get them. Suvendra. On Tue, 24 Feb 2004, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use it to find > a good place to start for my next couple of cluster designs. > > I'm currently benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be too big a deal, but > ping localhost on OSX is something like 40 times worse than linux, and mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments you're > considering, I'd figure out what interconnect you want to get first.
> > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From bill at cse.ucdavis.edu Thu Feb 26 06:55:22 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu, 26 Feb 2004 03:55:22 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040226115522.GA12286@cse.ucdavis.edu> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depend on the PCI bridge that it's plugged > into. It would be more correct to say that performance is predictable > by dual-node performance and scaling of the interconnect. Of course > this may not make a difference for Ethernet or even gig-e but it does > matter at the high end. Take this chart for instance: http://www.myri.com/myrinet/PCIX/bus_performance.html On any decent size cluster the node performance or the interconnect performance is likely to have a significantly larger effect on cluster performance than any of the differences on that chart. Or maybe you're talking about sticking $1200 Myrinet cards in a 133 MB/sec PCI slot? Don't forget peak bandwidth measurements assume huge (10000-64000 byte) packets, latency tolerance, and zero computation. Not exactly the use I'd expect in a typical production cluster. So my suggestion is: #1 Pick your application(s), this is why you're buying a cluster, right? #2 For compatible nodes pick the node with the best perf or price/perf. #3 For compatible interconnects pick the one with the best scaling or price/scaling for the number of nodes you can afford/fit. #4 If you get a choice of PCI-X bridges, sure, consult the URL above and pick the fastest one. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From rgb at phy.duke.edu Thu Feb 26 11:51:42 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 26 Feb 2004 11:51:42 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> Message-ID: On Thu, 26 Feb 2004 pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); therefore, at the grid > nirvana, bad news must get increasingly more bad. Which leads me to the > hypothesis that nirvana is that locus at the IRS which stores the access > codes for the pentium microcode backdoors... This is not exactly correct. Or rather, it might well be true (something mandala-like in the image of that locus:-) but isn't strictly logical or on topic for the list. The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers?
;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From choyhauyan at yahoo.com Thu Feb 26 00:51:24 2004 From: choyhauyan at yahoo.com (choy hau yan) Date: Wed, 25 Feb 2004 21:51:24 -0800 (PST) Subject: [Beowulf] shared distributed memory ? Message-ID: <20040226055124.33033.qmail@web41313.mail.yahoo.com> I am a user of a Scyld Beowulf cluster, and I use MPI for parallel computing. I have some questions: > I have 2 processors that share memory, connected > with TCP/IP to another 2 processors that share > memory. > > I use MPI send/recv for communication, but why can't I > call this shared distributed memory? > The speedup with this architecture is very low. Why? > > speedup: > 2 processors: 1.61 > 3 processors: 2.31 > 4 processors: 2.30 > With shared memory the speedup should be higher > than distributed because there is almost no communication > cost in shared memory, right? I hope that someone can answer my question. Thanks. > __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From moor007 at bellsouth.net Thu Feb 26 16:51:03 2004 From: moor007 at bellsouth.net (moor007 at bellsouth.net) Date: Thu, 26 Feb 2004 15:51:03 -0600 Subject: [Beowulf] Cluster HW Message-ID: <200402261551.03785.moor007@bellsouth.net> I apologize for having to ask in this forum... but I really do not know where to begin. I just upgraded my interconnects from the Dolphinics (SCI) cards and want to sell them (rarely used) because only one of the four applications I use would utilize them. Is there a forum/market, besides eBay, for this type of specialty HW? Tim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From csamuel at vpac.org Thu Feb 26 17:34:08 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 27 Feb 2004 09:34:08 +1100 Subject: [Beowulf] G5 cluster for testing In-Reply-To: References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <200402270934.10022.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 11:43 pm, Suvendra Nath Dutta wrote: > We were taking the harder route of staying away from all non-open source > tools to do NFS (NFSManager) or MPI (Pooch). There is also Black Lab Linux from Terrasoft, which builds clusters on YDL with BProc, MPICH, etc. for Macs. No idea whether it supports G5's or how FOSS it is though.. cheers!
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAPnRgO2KABBYQAh8RAq6sAJMEJwyT1vn3MV9RM/Fwpy6gs4CZAJ9QAGf2 oyEbIVcHgTfcs+Jk2xb7dg== =92C8 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at linux-mag.com Thu Feb 26 19:01:51 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 26 Feb 2004 19:01:51 -0500 (EST) Subject: [Beowulf] shared distributed memory ? In-Reply-To: <20040226055124.33033.qmail@web41313.mail.yahoo.com> Message-ID: A few questions: What are your processor speeds? What is your interconnect? What is your application? Having to communicate with another node vs the same node is not the same thing (ping localhost and then ping the other node). Obviously your application is sensitive to the interconnect (either bandwidth or latency). Really fast processors and a slow interconnect usually mean poor scalability for some applications. Doug On Wed, 25 Feb 2004, choy hau yan wrote: > I am a user of a Scyld Beowulf cluster, and I use MPI for > parallel computing. I have some questions: > > > I have 2 processors that share memory, connected > > with TCP/IP to another 2 processors that share > > memory. > > > > I use MPI send/recv for communication, but why can't I > > call this shared distributed memory? > > The speedup with this architecture is very low. Why? > > > > speedup: > > 2 processors: 1.61 > > 3 processors: 2.31 > > 4 processors: 2.30 > > With shared memory the speedup should be higher > > than distributed because there is almost no communication > > cost in shared memory, right? I hope that someone can > answer my question. Thanks. > > > __________________________________ > Do you Yahoo!? > Get better spam protection with Yahoo! Mail. > http://antispam.yahoo.com/tools > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From graham.mullier at syngenta.com Fri Feb 27 05:12:18 2004 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Fri, 27 Feb 2004 10:12:18 -0000 Subject: [Beowulf] Flashmobcomputing Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Hmm, presumably a 'bad' message will need to have the Evil Bit set (http://www.ietf.org/rfc/rfc3514.txt)? Graham -----Original Message----- From: Robert G. Brown [mailto:rgb at phy.duke.edu] [...] The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers?
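A rough Amdahl's-law check on the speedup figures quoted above (a back-of-the-envelope estimate, not a diagnosis of this particular code): writing the speedup on p processors as

    S(p) = 1 / (f + (1 - f)/p)

where f is the fraction of the runtime that does not parallelize (serial work plus communication), the measured S(2) = 1.61 gives f of roughly 0.24, and that same f predicts

    S(4) = 1 / (0.24 + 0.76/4), which is about 2.3,

essentially what was measured. So the plateau at 4 processors is consistent with about a quarter of the time going to serial work or communication, whatever the shared/distributed mix; the 3-processor point sits above this simple curve, but there two of the three ranks share a node, so the single-f model is at best approximate.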
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anantanagb at yahoo.com Fri Feb 27 04:49:38 2004 From: anantanagb at yahoo.com (anantanag bhat) Date: Fri, 27 Feb 2004 01:49:38 -0800 (PST) Subject: [Beowulf] P4_error: net_recv read : probable EOF on socket:1 Message-ID: <20040227094938.32769.qmail@web21322.mail.yahoo.com> Sir, I have installed MPICH on my 8 processor Cluster. Every thing was running fine for first few days. Now if I starts the run in the node4, it is getting stuck. after 2hour. the error in the .out file is as below "P4_error: net_recv read : probable EOF on socket:1" But it is not the same in first 3 nodes. In these runs are going fine. Can anybody please help me to solve this. Thanks in advance __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 27 08:11:43 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 27 Feb 2004 08:11:43 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 27 Feb 2004 graham.mullier at syngenta.com wrote: > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? You know, I just joined the ietf.org list a week or two ago to see if there was any possibility of leveraging their influence on e.g. AV vendors to get them to stop mailing bounce messages back to the "From" address on viruses, given that there hasn't been a virus that hasn't forged its From header to an innocent third party for several years now. Finding myself sucked into an endless discussion with people who want the ietf to issue an RFC to call for digitally signing all mail and using said signatures to drive all spam white/blacklisting (imagine the keyservice THAT would require and the gazillion dollar profits it would generate) I have gradually started to wonder if the ietf has degenerated into a kind of a cruel joke. This RFC, however, lifts my spirits and renews my confidence that the original luminaries that designed in the Internet have not fully stopped glowing in the chaotic darkness that surrounds them. Armed with the complete confidence that my design is based on both sound protocol and Dr. D. Adams' valuable empirical observation about bad news, I will start work on a PVM version that sets the Evil Bit right away. I fully expect to win a Nobel Prize from the proof that communications are transluminal in the resulting cluster. It must be that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- Bad News must somehow propagate backwards in time from the event. I most certainly will acknowledge all of the contributions of all you "little people" when I receive my invitation to Stockholm. I'm so happy. Sniff. rgb > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> > ;-) > [...] > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Fri Feb 27 12:38:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 27 Feb 2004 18:38:09 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Fri, 27 Feb 2004, Robert G. Brown wrote: > > Armed with the complete confidence that my design is based on both sound > protocol and Dr. D. Adams' valuable empirical observation about bad > news, I will start work on a PVM version that sets the Evil Bit right > away. I fully expect to win a Nobel Prize from the proof that > communications are transluminal in the resulting cluster. It must be > that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- > Bad News must somehow propagate backwards in time from the event. > Once this phase of the research has been completed, can we make an application to the NSF for an extension into using SEP fields for systems management? http://www.fact-index.com/s/so/somebody_else_s_problem_field.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From laytonjb at comcast.net Fri Feb 27 16:23:23 2004 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Fri, 27 Feb 2004 16:23:23 -0500 Subject: [Beowulf] Single Processor vs SMP In-Reply-To: <20040227092124.GA8410@blackTiger> References: <20040227092124.GA8410@blackTiger> Message-ID: <403FB54B.5030401@comcast.net> Paulo, I'm hoping someone will jump in and say that the answer depends upon the code(s) you're running. If possible, test your codes on a dual CPU box with one copy running and then two copies running (make sure one copy is on one CPU). Test this on the architectures you are interested in. If you can, also test on multiple nodes with some kind of interconnect to judge how the code(s) scale with the number of nodes and with the interconnect. For example, at work I use a code that we tested on single and dual CPU machines. It was an older PIII/500 box that used the old Intel 440BX chipset (if I remember correctly). We found that running two copies only resulted in a 30% penalty for running duals. We also tested on a cluster with Myrinet and GigE. Myrinet only gave this code about a 2% decrease in wall clock time (we measure speed in wall clock time since that is what is important to us). Then we got quotes for machines and did the price/performance calculation and determined which cluster was the best. I highly recommend doing the same thing for your code(s). Be sure to check out Opterons since they have an interesting memory subsystem that should allow your codes to have little penalty in running on dual machines ("should" is the operative word. You should test your codes to determine if this is true). Good Luck!
Jeff >Hello, > >I'm currently working in a physics department that is in the process of >building a high performance Beowulf cluster and I have some doubts in >terms of what type of hardware to acquire. > >The programming systems that will be used are MPI and HPF. Does anyone >knows any study comparing the performance of single cpu machines vs smp >machines or even between the several cpu's available (intel p4, amd athlon, >powerpc g5, ...)? > >Thanks for any advice > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From paulojjs at bragatel.pt Fri Feb 27 04:21:24 2004 From: paulojjs at bragatel.pt (Paulo Silva) Date: Fri, 27 Feb 2004 09:21:24 +0000 Subject: [Beowulf] Single Processor vs SMP Message-ID: <20040227092124.GA8410@blackTiger> Hello, I'm currently working in a physics department that is in the process of building a high performance Beowulf cluster and I have some doubts in terms of what type of hardware to acquire. The programming systems that will be used are MPI and HPF. Does anyone knows any study comparing the performance of single cpu machines vs smp machines or even between the several cpu's available (intel p4, amd athlon, powerpc g5, ...)? Thanks for any advice -- Paulo Jorge Jesus Silva perl -we 'print "paulojjs".reverse "\ntp.letagarb@"' The best you get is an even break. -- Franklin Adams -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From ddw at dreamscape.com Sat Feb 28 01:19:37 2004 From: ddw at dreamscape.com (Daniel Williams) Date: Sat, 28 Feb 2004 01:19:37 -0500 Subject: [Beowulf] Flashmobcomputing - the evil bit References: <200402271705.i1RH5vh16216@NewBlue.scyld.com> Message-ID: <404032F8.F5DDDA59@dreamscape.com> The problem with this idea is that Linux is too good at dealing with flawed or malicious data, so even if the evil bit is set, it still would not qualify as "bad news", and thus would not travel superluminally. Consequently, I would speculate that the only system that could communicate superluminally is one running some form of Winblows, since *any* data, of *any* kind, with or without the evil bit set is bad news for MS operating systems, and likely to cause a crash. The problem with superluminal cluster computing then becomes obvious - you can't get any actual useful calculation done faster than lightspeed, because the only operating systems that work at that speed can't do any useful work. DDW > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Sun Feb 1 06:44:18 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sun, 1 Feb 2004 12:44:18 +0100 (CET) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C253E.9040206@obs.unige.ch> Message-ID: On Sat, 31 Jan 2004, Pfenniger Daniel wrote: > > Note that in the responded message John was confusing N2 and NO2. Eeek! I am outed as a physicist... I've come out of the lab (closet). Guess I can now wear a slide rule with pride. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Sun Feb 1 12:50:44 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> Message-ID: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB all extremely mundane and FULLY supported. > Graph: GeForce FX 5200 128MB bzzt. take it out, try again. don't even *think* about loading the binary nvidia driver. > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. needless to say, 2.6 has been extensively tested on xeons, and it works fine. your problem is specific to your config. if you want help, you'll have to start by describing how it fails. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From shaeffer at neuralscape.com Sun Feb 1 13:17:15 2004 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Sun, 1 Feb 2004 10:17:15 -0800 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <401CF17C.8080706@telia.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> <401CF17C.8080706@telia.com> Message-ID: <20040201181715.GB8159@synapse.neuralscape.com> On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > I have experienced some problems to compile SMP support for the > 2.6.1-kernel on my Intel Xeon based workstation: > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > Chipset: Intel 7505 > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > Graph: GeForce FX 5200 128MB > > The SMP support works fine all the way up to kernel 2.4.22 but when > there is stop for the XEON. I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels actually run too.
I'm just starting performance testing, but results are very promising. Thanks, Karen > > The SMP support works fine for the Intel Tualatin workstation all the > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > build a 2.6.0. > > Please advice if some one have solved this problem. > > Best regards > Per Lindstrom > . > . > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > > > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > >> > >>2.6 looks very promising, wondering when distributions > >>will include it. > >> > >> > >> > >Debian unstable does today. The new installer for the next release > >of Debian (currently Debian testing) which is in beta test may well > >include a 2.6 kernel option. > > > > > > > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > >> > >> > >> > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > >release fully free / GPL'd such that I can use it anywhere? > > > >Thanks, > > > >Andy > >_______________________________________________ > >Beowulf mailing list, Beowulf at beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf ---end quoted text--- -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer at neuralscape.com http://www.neuralscape.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From poobah_99 at hotmail.com Sun Feb 1 14:24:03 2004 From: poobah_99 at hotmail.com (Ryan Kastrukoff) Date: Sun, 01 Feb 2004 11:24:03 -0800 Subject: [Beowulf] unsubscribe universe beowulf@beowulf.org Message-ID: _________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Sun Feb 1 14:33:03 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 01 Feb 2004 14:33:03 -0500 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <401D546F.7090109@scalableinformatics.com> Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon >>or amd64. Intel switching to amd64 is a good choice >>;-> >> >> >> >Newsflash: Severe weather means Hell freezes over, preventing flying >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 According to weather.com, this zip code is about 27 F right now. 
As 32 F is officially "freezing over", we can with all accuracy note that indeed, Hell (MI) has frozen over. Note 2: It was quite a bit colder last week and up to yesterday where southeast Michigan was hovering in the low negative/positive single digits in degrees F. We shouldn't complain as the folks in Minnesota have not seen the high side of 0 very much recently. As for the aerodynamic porcine units, you are on your own. Joe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Sun Feb 1 10:37:37 2004 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Sun, 01 Feb 2004 16:37:37 +0100 Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401C2C97.8020903@tamu.edu> References: <401BE891.708@obs.unige.ch> <401C0807.4000209@telia.com> <401C253E.9040206@obs.unige.ch> <401C2C97.8020903@tamu.edu> Message-ID: <401D1D41.8090709@moene.indiv.nluug.nl> Gerry Creager (N5JXS) wrote: > That's the end of gas exchange physiology I. There will be a short quiz > Monday. We'll continue with the next module. I encourage everyone to > have read the Pulmonary Medicine chapters in Harrison's for the next > lecture. Hmmm, I won't hold my breath on that one :-) -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 1 15:53:54 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) Subject: [Beowulf] HVAC and room cooling... In-Reply-To: <401D1D41.8090709@moene.indiv.nluug.nl> Message-ID: On Sun, 1 Feb 2004, Toon Moene wrote: > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) Careful or I'll beat you with John's slide rule (what kinda physicist uses a slide rule for anything other than a blunt instrument?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Feb 1 21:35:43 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201104034.GA9280@galactic.demon.co.uk> Message-ID: <20040202023543.11015.qmail@web16807.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" > IIRC: Since you seem well aware of SPBS / storm - is > the newest storm > release fully free / GPL'd such that I can use it > anywhere? 
They now call it "torque", not sure when they are going to get a new name again :( Not sure what you mean by "use it anywhere". You can use SPBS (yes, I like this name better) in commerical environments. If you make modifications to SPBS, you need to provide the source code for download. If you want to modify the source, and sell it as a product, you may want to use SGE. AFAIK, SGE uses a license similar to the BSD, while OpenPBS uses a license similar to GPL. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Mon Feb 2 05:19:30 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Mon, 02 Feb 2004 11:19:30 +0100 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075566655.2560.8.camel@loiosh> (agrajag@dragaera.net's message of "31 Jan 2004 11:30:56 -0500") References: <1075566655.2560.8.camel@loiosh> Message-ID: Jag writes: > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > >> NIS works fine for many purposes as well, but be warned -- in certain >> configurations and for certain tasks it becomes a very high overhead >> protocol. In particular, it adds an NIS hit to every file stat, for >> example, so that it can check groups and permissions. > > A good way around this is to run nscd (Name Services Caching Daemon). I'm really, really suspicious against nscd. I've more than once seen it hang on to stale information forever for no good reason at all. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 07:45:05 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 06:45:05 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: Nscd is a necessary evil sometimes though. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Leif Nixon wrote: > Jag writes: > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > >> NIS works fine for many purposes as well, but be warned -- in certain > >> configurations and for certain tasks it becomes a very high overhead > >> protocol. In particular, it adds an NIS hit to every file stat, for > >> example, so that it can check groups and permissions. > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > I'm really, really suspicious against nscd. I've more than once seen > it hang on to stale information forever for no good reason at all. 
> > -- > Leif Nixon Systems expert > ------------------------------------------------------------ > National Supercomputer Centre Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 2 10:32:01 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 2 Feb 2004 09:32:01 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. To clarify, the problem is when there is some cron job (or reboot) in which a couple of hundred nodes all go after the NIS server at once. It's magnified by the fact that there's an NIS lookup done even when it's a user in the local password file such as root. The problems can be mitigated by having a lot of nodes be slaves. At one point I had all of the nodes of my cluster be slaves. But the problem with that is that the transmission protocol is not perfect and every once in a while you wind up with a slave server that is down a map or two. We've now shifted to pushing out our password files via rsync. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. Our set of kerberos 5 kdc's have thus far been able to handle the load of some 1500 nodes with more still coming. Plus then we have no real passwords in the passwd file and thus the security issues of distributing it are much less critical. Steve Timm > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). 
> > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. > > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:07:30 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:07:30 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> Message-ID: <1075730850.3936.19.camel@protein.scalableinformatics.com> I have tried to avoid NIS on linux, as it appears not to be as stable as needed under heavy load. I have had customers bring it crashing down when it serves login information, just by running simple scripts across the cluster. I prefer pushing name service lookups through DNS, and I tend to use dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). Setting up a full blown named/bind system for a cluster seems like significant overkill in most cases. On the authentication side, I had high hopes for LDAP, but haven't been able to easily/repeatably make a working LDAP server with databases. I am starting to think more along the lines of a simple database with pam modules on the frontend. See http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or http://sourceforge.net/projects/pam-mysql/ for examples. On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > Nscd is a necessary evil sometimes though. > > -B > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > Jag writes: > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > >> configurations and for certain tasks it becomes a very high overhead > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > >> example, so that it can check groups and permissions. > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > I'm really, really suspicious against nscd. I've more than once seen > > it hang on to stale information forever for no good reason at all. 
> > > > -- > > Leif Nixon Systems expert > > ------------------------------------------------------------ > > National Supercomputer Centre Linkoping University > > ------------------------------------------------------------ > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 2 09:24:25 2004 From: bclem at rice.edu (Brent M. Clements) Date: Mon, 2 Feb 2004 08:24:25 -0600 (CST) Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: <1075730850.3936.19.camel@protein.scalableinformatics.com> References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: We use ldap extensively here on all of our clusters that IT maintains. We like it because it allows great flexibility if we need to write web based account management systems for groups on campus. LDAP is actually very very easy to implement, especially if you use redhat as your distribution. We use redhat mostly exclusive here so our setup and configuration for ldap is pretty cookie-cutter. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Mon, 2 Feb 2004, Joe Landman wrote: > I have tried to avoid NIS on linux, as it appears not to be as stable as > needed under heavy load. I have had customers bring it crashing down > when it serves login information, just by running simple scripts across > the cluster. > > I prefer pushing name service lookups through DNS, and I tend to use > dnsmasq for these (http://www.thekelleys.org.uk/dnsmasq/doc.html). > Setting up a full blown named/bind system for a cluster seems like > significant overkill in most cases. > > On the authentication side, I had high hopes for LDAP, but haven't been > able to easily/repeatably make a working LDAP server with databases. I > am starting to think more along the lines of a simple database with pam > modules on the frontend. See > http://freshmeat.net/projects/pam_pgsql/?topic_id=136 or > http://sourceforge.net/projects/pam-mysql/ for examples. > > > > On Mon, 2004-02-02 at 07:45, Brent M. Clements wrote: > > Nscd is a necessary evil sometimes though. > > > > -B > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Mon, 2 Feb 2004, Leif Nixon wrote: > > > > > Jag writes: > > > > > > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > > > > > > > >> NIS works fine for many purposes as well, but be warned -- in certain > > > >> configurations and for certain tasks it becomes a very high overhead > > > >> protocol. In particular, it adds an NIS hit to every file stat, for > > > >> example, so that it can check groups and permissions. > > > > > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > > > > > I'm really, really suspicious against nscd. I've more than once seen > > > it hang on to stale information forever for no good reason at all. 
> > > > > > -- > > > Leif Nixon Systems expert > > > ------------------------------------------------------------ > > > National Supercomputer Centre Linkoping University > > > ------------------------------------------------------------ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Feb 2 09:29:49 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 02 Feb 2004 09:29:49 -0500 Subject: [Beowulf] Authentication within beowulf clusters. In-Reply-To: References: <1075566655.2560.8.camel@loiosh> <1075730850.3936.19.camel@protein.scalableinformatics.com> Message-ID: <1075732189.3936.28.camel@protein.scalableinformatics.com> On Mon, 2004-02-02 at 09:24, Brent M. Clements wrote: > We use ldap extensively here on all of our clusters that IT maintains. We > like it because it allows great flexibility if we need to write web > based account management systems for groups on campus. LDAP is actually > very very easy to implement, especially if you use redhat as your > distribution. We use redhat mostly exclusive here so our setup and > configuration for ldap is pretty cookie-cutter. I know the clients are rather easy, it is setting up the server that I found somewhat difficult. I did go through the howto's, used the RH packages. Had some issues I could not find resolution to. This was about a year ago. I have a nice LDAP server set up with a completely read-only database now. I haven't been able to convince it to let clients write (e.g. password and other changes). Not sure what I am doing wrong, relatively sure it is pilot error. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jonbernard at uab.edu Mon Feb 2 11:46:21 2004 From: jonbernard at uab.edu (Jon B Bernard) Date: Mon, 2 Feb 2004 10:46:21 -0600 Subject: [Beowulf] HVAC and room cooling... Message-ID: <92E49C92F9CDBF4EA106E2E7154938830202B1F3@UABEXMB1.ad.uab.edu> The American Society of Heating, Refrigerating and Air-Conditioning Engineers (www.ashrae.org) has just released "Thermal Guidelines for Data Processing Environments". It looks like there's also a summary available in the January issue of their journal, or online for $8. Jon -----Original Message----- From: Brent M. Clements [mailto:bclem at rice.edu] Sent: Friday, January 30, 2004 11:18 PM To: rossini at u.washington.edu Cc: John Bushnell; beowulf at beowulf.org Subject: Re: [Beowulf] HVAC and room cooling... I have found that the best thing to do is outsource the colocation of your equipment. The cost of installing and maintaining the proper type of cooling and ventilation for mid-large size clusters costs more than to colocate. We are currently exploring placing our larger clusters in colocation facilities right now. 
The only downside that we have is that we can't find colocation facilities that will give us 24/7 physical access to our equipment. As you all know...researchers push beowulf hardware to the limits and the meantime to failure is higher. -B Brent Clements Linux Technology Specialist Information Technology Rice University On Fri, 30 Jan 2004, A.J. Rossini wrote: > John Bushnell writes: > > > (So many watts) times 'x' equals how many "tons" of AC. Multiply > > by at least two of course ;-) > > Or 3, sigh... > > >>Also, does anyone have any brilliant thoughts for cooling an internal > >>room that can't affordably get chilled water? (I've been suggesting > >>to people that it isn't possible, but someone brought up "portable > >>liquid nitrogen" -- for the room, NOT for overclocking -- I'm trying > >>to get stable systems, not instability :-). > > > > You can have an external heat exchanger. If you are lucky and are, > > say, on the first floor somewhere close to an external wall, it is > > pretty simple to run a small pipe between the internal AC and the > > heat exchanger outside. Don't know how far it is practical to run > > one though. We have one in our computer room, but it is only six > > feet or so from the exchanger outside. Our newer AC runs on chilled > > water which was quoted for a lot less than another inside/outside > > combo, but we already had a leftover chilled water supply in the > > computer room. > > I've looked at the chilled-water approach. They estimated between > $40k-$80k. oops (this room is REALLY in the middle of the building. > Great for other computing purposes, but not for cooling). > > I'm looking for the proverbial vent-free A/C. Sort of like > frictionless tables and similar devices I recall from undergraduate > physics... > > Thanks for the comments! > > best, > -tony > > -- > rossini at u.washington.edu http://www.analytics.washington.edu/ > Biomedical and Health Informatics University of Washington > Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center > UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable > FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email > > CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be > confidential and privileged. If you received this message in error, > please destroy it and notify the sender. Thank you. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Mon Feb 2 16:27:25 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Mon, 02 Feb 2004 16:27:25 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We have 3 - 16 hard drive file servers, 13 compute nodes and a master unit. We had to spread the load from 3 to 4 - 20 Amp circuits to keep from popping circuit breakers. We have AC coming into an interior room and experienced several problems. Problem 1: There was no adequate exhaust system. 
5 active vents in , 1 passive vent out and in the wrong location. Solution: We substituted in several grates in place of acoustic tiles. The heat is vented up into the plenum above. There are fans atop the rack venting the interior of the rack into one of the grates above. The other heat follows. Problem 2: What do you do when the AC stops? Maintenance and the occasional AC system oops can be devastating to a cluster in a small room. Solution 2a: We are tied directly into a security system. When a sensor in the room reaches a temperature level, "Security" responds dependent upon the level detected. Solution 2b: We installed a backup automated telephone dialer. Not that we don't trust "Security", but we wanted a backup to let us know what was going on. When the temperature reaches a certain level, the phone dials us with an automated message: " This is the Sensaphone 1108. The time is 1:36 AM and ... [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] Solution 2c: Install a thermal sensor into a serial or tcp/ip socket. Some vendors have software that read these sensors and will shut down the machines. We are still working on our system. Others' experiences and solutions are welcomed. We are using dual Tyan motherboards with dual AMD MP processors. Good luck!! Peter ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Feb 2 19:56:33 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 02 Feb 2004 16:56:33 -0800 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". 
Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Tue Feb 3 08:40:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Tue, 03 Feb 2004 07:40:25 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> References: <20040203125618.GA6026@mikee.ath.cx> Message-ID: <6.0.0.22.2.20040203073727.02614538@localhost> At 06:56 AM 2/3/2004, Mike Eggleston wrote: >This book from 2000 discusses building clusters from linux. I >bought it from a discount store not because I'm going to build >another cluster from linux, but rather because of the discussions >on cluster management. Has anyone read/implemented his approach? >What other cluster management techniques/solutions are out there? Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes chapters on cluster setup and cluster management (new in the 2nd edition). Disclaimer: I'm one of the editors of this book. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 09:05:07 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 08:05:07 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <6.0.0.22.2.20040203073727.02614538@localhost> References: <20040203125618.GA6026@mikee.ath.cx> <6.0.0.22.2.20040203073727.02614538@localhost> Message-ID: <20040203140507.GB6026@mikee.ath.cx> On Tue, 03 Feb 2004, William Gropp wrote: > At 06:56 AM 2/3/2004, Mike Eggleston wrote: > >This book from 2000 discusses building clusters from linux. I > >bought it from a discount store not because I'm going to build > >another cluster from linux, but rather because of the discussions > >on cluster management. Has anyone read/implemented his approach? > >What other cluster management techniques/solutions are out there? > > Beowulf Cluster Computing With Linux, 2nd edition (MIT Press) includes > chapters on cluster setup and cluster management (new in the 2nd > edition). Disclaimer: I'm one of the editors of this book. > > Bill > > I have the 1st edition and it does have a chapter discussing some of the management. How would this method scale to managing a (not really a cluster) group of AIX servers? 
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eckhoff.Peter at epamail.epa.gov Tue Feb 3 09:26:37 2004 From: Eckhoff.Peter at epamail.epa.gov (Eckhoff.Peter at epamail.epa.gov) Date: Tue, 03 Feb 2004 09:26:37 -0500 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: Hello Jim The main goal for us is to stay up and running as long as we can. (Please read the last paragraph before responding to this one:) Most of our temperature problems have been caused by AC maintenance induced temperature spikes. Having "security" open the doors slows the room heating process. The Sensaphone call to us helps us to know that there is a problem and we can phone in to be briefed. "Do we have to come in or has the room already begun to cool?" The last of the Solutions is for just the type of incident that you describe. These are very rare but like you say, they need to be planned for. Our ideal goal would be one that signals a problem to the cluster. The cluster takes the signal and gracefully shuts down the programs and then shuts down the nodes. We did not find such a solution on the commercial market for our "came with the room" UPS. Instead we found a sensor/software combination where the sensor ties into the serial port of one of the nodes. So far we **have** been able to gracefully shut down the programs that are running. We have **not** found a way to automatically turn off the various cluster nodes. That's where we need some help/suggestions. ******************************************* Peter Eckhoff Environmental Scientist U.S. Environmental Protection Agency 4930 Page Road, D243-01 Research Triangle Park, NC 27709 Tel: (919) 541-5385 Fax: (919) 541-0044 E-mail: eckhoff.peter at epa.gov Website: www.epa.gov/scram001 Jim Lux cc: Subject: Re: [Beowulf] Re: HVAC and room cooling... 02/02/04 07:56 PM At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote: >Problem 2: What do you do when the AC stops? Maintenance and the >occasional AC system oops can be devastating to a cluster in a small room. > >Solution 2a: We are tied directly into a security system. When a >sensor in the room reaches a temperature level, "Security" responds >dependent upon the >level detected. > >Solution 2b: We installed a backup automated telephone dialer. Not >that we don't trust "Security", but we wanted a backup to let us know what was >going on. > When the temperature reaches a certain level, the phone dials us with >an > automated message: > " This is the Sensaphone 1108. The time is 1:36 AM and ... > [ ed. your CPUs are about to fry... Have a nice night!!!" ;-) ] YOu need to seriously consider a "failsafe" totally automated shutdown (as in chop the power when temperature gets to, say, 40C, in the room)... Security might be busy (maybe there was a big problem with the chiller plant catching fire or the boiler exploding.. if they're directing fire engine traffic, the last thing they're going to be thinking about is going over to your machine room and shutting down your hardware. The autodialer is nice, but, what if you're out of town when the balloon goes up? A simple temperature sensor with a contact closure wired into the "shunt trip" on your power distribution will work quite nicely as a "kill it before it melts". Sure, the file system will be corrupted, and so forth, but, at least, you'll have functioning hardware to rebuild it on. 
Automated monitoring and tcp sockets are nice for management in the day to day situation, ideal for answering questions like: Should we get another fan? or Maybe Rack #3 needs to be moved closer to the vent. But, what if there's a DDoS attack on someone near you, and netops decides to shut down the router. What if all those Windows desktops run amok, sending mass emails to each other or trying to remotely manage each other's IIS, bringing the network to a grinding halt. The upshot is: Do not trust computers to save your computers in the ultimate extreme. Have a totally separate, bulletproof system. It's cheap, it's reliable, all that stuff. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From grid at iki.fi Tue Feb 3 09:26:53 2004 From: grid at iki.fi (Michael Kustaa Gindonis) Date: Tue, 3 Feb 2004 16:26:53 +0200 Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset In-Reply-To: <200402021546.i12Fk4h24131@NewBlue.scyld.com> References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> Message-ID: <200402031626.53453.grid@iki.fi> Hi, I noticed in the Linux kernel configuration that there is support for LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. Do any readers of this list have any experiences in this area? Knowledge about LSI's plans to support this chipset in the future? ... Mike On Monday 02 February 2004 17:46, beowulf-request at scyld.com wrote: > Send Beowulf mailing list submissions to > beowulf at beowulf.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf > or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > > You can reach the person managing the list at > beowulf-admin at beowulf.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Beowulf digest..." > > > Today's Topics: > > 1. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Mark Hahn) > 2. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Karen Shaeffer) > 3. unsubscribe universe beowulf at beowulf.org (Ryan Kastrukoff) > 4. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (Joe Landman) > 5. Re: HVAC and room cooling... (Toon Moene) > 6. Re: HVAC and room cooling... (Robert G. Brown) > 7. Re: Linux 2.4 vs 2.6 AND ia64 vs amd64 (=?big5?q?Andrew=20Wang?=) > 8. Re: Authentication within beowulf clusters. (Leif Nixon) > 9. Re: Authentication within beowulf clusters. (Brent M. Clements) > 10. Re: Authentication within beowulf clusters. (Joe Landman) > 11. Re: Authentication within beowulf clusters. (Brent M. Clements) > 12. Re: Authentication within beowulf clusters. (Joe Landman) > 13. Re: Authentication within beowulf clusters. (Steven Timm) > > --__--__-- > > Message: 1 > Date: Sun, 1 Feb 2004 12:50:44 -0500 (EST) > From: Mark Hahn > To: Per Lindstrom > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > > > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > > Chipset: Intel 7505 > > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > > all extremely mundane and FULLY supported. > > > Graph: GeForce FX 5200 128MB > > bzzt. take it out, try again. 
don't even *think* about loading the > binary nvidia driver. > > > The SMP support works fine all the way up to kernel 2.4.22 but when > > there is stop for the XEON. > > needless to say, 2.6 has been extensively tested on xeons, and it works > fine. your problem is specific to your config. > > if you want help, you'll have to start by describing how it fails. > > > --__--__-- > > Message: 2 > Date: Sun, 1 Feb 2004 10:17:15 -0800 > From: Karen Shaeffer > To: Per Lindstrom > Cc: "Andrew M.A. Cater" , > beowulf at beowulf.org Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs > amd64 > > On Sun, Feb 01, 2004 at 01:30:52PM +0100, Per Lindstrom wrote: > > I have experienced some problems to compile SMP support for the > > 2.6.1-kernel on my Intel Xeon based workstation: > > M.B: Intel SE7505VB2 ATX PCI, FSB 533MHz > > Chipset: Intel 7505 > > CPU: 2 pcs Intel 2,8GHz Xeon, 512kB L2 cache 533FSB > > RAM: 2GB DDR SDRAM PC2100 Reg ECC exp. till 8GB > > Graph: GeForce FX 5200 128MB > > > > The SMP support works fine all the way up to kernel 2.4.22 but when > > there is stop for the XEON. > > I am compiling linux-2.6.2-rc2 on dual XEONs with no problems. The kernels > actually run too. I'm just starting performance testing, but results are > very promising. > > Thanks, > Karen > > > The SMP support works fine for the Intel Tualatin workstation all the > > way up to kernel 2.4.24 and gives problem on 2.6.1 I have not tested to > > build a 2.6.0. > > > > Please advice if some one have solved this problem. > > > > Best regards > > Per Lindstrom > > . > > . > > > > Andrew M.A. Cater wrote: > > >On Sun, Feb 01, 2004 at 01:39:40PM +0800, Andrew Wang wrote: > > >>http://www.infoworld.com/infoworld/article/04/01/30/05FElinux_1.html > > >> > > >>2.6 looks very promising, wondering when distributions > > >>will include it. > > > > > >Debian unstable does today. The new installer for the next release > > >of Debian (currently Debian testing) which is in beta test may well > > >include a 2.6 kernel option. > > > > > >>Also ia64 performance looks bad when compared to Xeon > > >>or amd64. Intel switching to amd64 is a good choice > > >>;-> > > > > > >Newsflash: Severe weather means Hell freezes over, preventing flying > > >pigs from taking off :) > > > > > >IIRC: Since you seem well aware of SPBS / storm - is the newest storm > > >release fully free / GPL'd such that I can use it anywhere? > > > > > >Thanks, > > > > > >Andy > > >_______________________________________________ > > >Beowulf mailing list, Beowulf at beowulf.org > > >To change your subscription (digest mode or unsubscribe) visit > > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > ---end quoted text--- > > -- > Karen Shaeffer > Neuralscape, Palo Alto, Ca. 
94306 > shaeffer at neuralscape.com http://www.neuralscape.com > > --__--__-- > > Message: 3 > From: "Ryan Kastrukoff" > To: beowulf at beowulf.org > Date: Sun, 01 Feb 2004 11:24:03 -0800 > Subject: [Beowulf] unsubscribe universe beowulf at beowulf.org > > > > _________________________________________________________________ > The new MSN 8: smart spam protection and 2 months FREE* > http://join.msn.com/?page=features/junkmail > http://join.msn.com/?page=dept/bcomm&pgmarket=en-ca&RU=http%3a%2f%2fjoin.ms >n.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca > > > --__--__-- > > Message: 4 > Date: Sun, 01 Feb 2004 14:33:03 -0500 > From: Joe Landman > To: "Andrew M.A. Cater" > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > > Andrew M.A. Cater wrote: > >>Also ia64 performance looks bad when compared to Xeon > >>or amd64. Intel switching to amd64 is a good choice > >>;-> > > > >Newsflash: Severe weather means Hell freezes over, preventing flying > >pigs from taking off :) > > Note: http://www.hometownvalue.com/hell.htm which is zip code 48169 > According to weather.com, this zip code is about 27 F right now. As 32 > F is officially "freezing over", we can with all accuracy note that > indeed, Hell (MI) has frozen over. > > Note 2: It was quite a bit colder last week and up to yesterday where > southeast Michigan was hovering in the low negative/positive single > digits in degrees F. We shouldn't complain as the folks in Minnesota > have not seen the high side of 0 very much recently. > > As for the aerodynamic porcine units, you are on your own. > > Joe > > --__--__-- > > Message: 5 > Date: Sun, 01 Feb 2004 16:37:37 +0100 > From: Toon Moene > Organization: Moene Computational Physics, Maartensdijk, The Netherlands > To: gerry.creager at tamu.edu > CC: Pfenniger Daniel , > Per Lindstrom , > John Hearns , rossini at u.washington.edu, > beowulf at beowulf.org > Subject: Re: [Beowulf] HVAC and room cooling... > > Gerry Creager (N5JXS) wrote: > > That's the end of gas exchange physiology I. There will be a short quiz > > Monday. We'll continue with the next module. I encourage everyone to > > have read the Pulmonary Medicine chapters in Harrison's for the next > > lecture. > > Hmmm, I won't hold my breath on that one :-) > > -- > Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html > GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) > > > --__--__-- > > Message: 6 > Date: Sun, 1 Feb 2004 15:53:54 -0500 (EST) > From: "Robert G. Brown" > To: Toon Moene > Cc: gerry.creager at tamu.edu, Pfenniger Daniel > , Per Lindstrom , > John Hearns , , > > Subject: Re: [Beowulf] HVAC and room cooling... > > On Sun, 1 Feb 2004, Toon Moene wrote: > > Gerry Creager (N5JXS) wrote: > > > That's the end of gas exchange physiology I. There will be a short > > > quiz Monday. We'll continue with the next module. I encourage > > > everyone to have read the Pulmonary Medicine chapters in Harrison's for > > > the next lecture. > > > > Hmmm, I won't hold my breath on that one :-) > > Careful or I'll beat you with John's slide rule (what kinda physicist > uses a slide rule for anything other than a blunt instrument?;-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > --__--__-- > > Message: 7 > Date: Mon, 2 Feb 2004 10:35:43 +0800 (CST) > From: =?big5?q?Andrew=20Wang?= > Subject: Re: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 > To: beowulf at beowulf.org > > --- "Andrew M.A. Cater" > > > IIRC: Since you seem well aware of SPBS / storm - is > > the newest storm > > release fully free / GPL'd such that I can use it > > anywhere? > > They now call it "torque", not sure when they are > going to get a new name again :( > > Not sure what you mean by "use it anywhere". You can > use SPBS (yes, I like this name better) in commerical > environments. If you make modifications to SPBS, you > need to provide the source code for download. > > If you want to modify the source, and sell it as a > product, you may want to use SGE. > > AFAIK, SGE uses a license similar to the BSD, while > OpenPBS uses a license similar to GPL. > > Andrew. > > > ----------------------------------------------------------------- > ?C???? Yahoo!?_?? > ?????C???B?????????B?R?A???????A???b?H?????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > > --__--__-- > > Message: 8 > To: Beowulf Mailing List > Subject: Re: [Beowulf] Authentication within beowulf clusters. > From: Leif Nixon > Date: Mon, 02 Feb 2004 11:19:30 +0100 > > Jag writes: > > On Sat, 2004-01-31 at 10:25, Robert G. Brown wrote: > >> NIS works fine for many purposes as well, but be warned -- in certain > >> configurations and for certain tasks it becomes a very high overhead > >> protocol. In particular, it adds an NIS hit to every file stat, for > >> example, so that it can check groups and permissions. > > > > A good way around this is to run nscd (Name Services Caching Daemon). > > I'm really, really suspicious against nscd. I've more than once seen > it hang on to stale information forever for no good reason at all. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Tue Feb 3 04:21:24 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 03 Feb 2004 10:21:24 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> (Jim Lux's message of "Mon, 02 Feb 2004 16:56:33 -0800") References: <5.2.0.9.2.20040202164809.018dcd58@mailhost4.jpl.nasa.gov> Message-ID: Jim Lux writes: > YOu need to seriously consider a "failsafe" totally automated shutdown > (as in chop the power when temperature gets to, say, 40C, in the > room)... Security might be busy (maybe there was a big problem with > the chiller plant catching fire or the boiler exploding.. if they're > directing fire engine traffic, the last thing they're going to be > thinking about is going over to your machine room and shutting down > your hardware. Ah, that reminds me of the bad old days in industry. The A/C went belly up the night between Friday and Saturday. That triggered the alarm down at Security, who promptly called the on-duty ventilation technicians and notified us. Excellent. Except that the A/C alarm was never reset properly, so when the A/C failed again Saturday afternoon nobody noticed. When the temperature reached 35C, the thermal kill switch triggered automatically. Pity that the electrician had never got around to actually, like, *wire* it to anything. We arrived Monday morning to the smell of frying electronics. 
Expensive weekend, that. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From verycoldpenguin at hotmail.com Tue Feb 3 11:24:18 2004 From: verycoldpenguin at hotmail.com (Gareth Glaccum) Date: Tue, 03 Feb 2004 16:24:18 +0000 Subject: [Beowulf] Re: HVAC and room cooling... Message-ID: We sell solutions with automated power-off scripts upon node overheat using some of the APC products controlled from a linux master. Not that particular unit though. Gareth >From: Joshua Baker-LePain >To: Eckhoff.Peter at epamail.epa.gov >CC: beowulf at scyld.com >Subject: Re: [Beowulf] Re: HVAC and room cooling... >Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) > >On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > > > Instead we found a sensor/software combination where the sensor ties > > into the > > serial port of one of the nodes. So far we **have** been able to > > gracefully shut down the > > programs that are running. We have **not** found a way to automatically > > turn off the > > various cluster nodes. That's where we need some help/suggestions. > >Well, your high-temperature-triggered scripts should call a 'shutdown -h >now'. *If* your nodes are on motherboards that support it, and *if* the >BIOS is new enough to support it, and *if* the nodes were booted with >'apm=power-off' on the kernel command line, then they should actually >power off. > >Another option would be something like this: > >http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 > >With that (ungodly expensive) power strip, you can remotely cut the power >to selected outlets. It probably can be automated, but you'd have to >check that. > >As Jim said, though, all this is great, but there really does need to be >one final level of hardware level failsafe. It is entirely conceivable >that all your software monitoring could fail, and the temperature will >still be climbing. There needs to be a piece of hardware in the room that >literally cuts power to the whole damn room at a set temperature that is >(obviously) above the one that trips your software shutdown scripts. > >-- >Joshua Baker-LePain >Department of Biomedical Engineering >Duke University >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Tue Feb 3 12:06:50 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Tue, 3 Feb 2004 12:06:50 -0500 Subject: [Beowulf] about cluster's tunning References: <200402021546.i12Fk4h24131@NewBlue.scyld.com> <200402031626.53453.grid@iki.fi> Message-ID: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model for climate forecast. 
My question is how i can improvement the performance of my cluster.. there is techniques for tunning of clusters througth operative system or network hardware?. thanks for yours anwers.. and suggests.. R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 3 13:12:24 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 3 Feb 2004 13:12:24 -0500 (EST) Subject: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset In-Reply-To: <200402031626.53453.grid@iki.fi> Message-ID: > I noticed in the Linux kernel configuration that there is support for LSI's > Fusion-MPT chipset. Also, it is possible to run MPI over this. huh? afaikt, it's just another overly expensive, overly complicated hw raid controller. I guess there must be a market for this kind of wrongheaded crap, but I really don't understand it. I guess it's just the impulse to offload whatever possible from the host; that's an understandable idea, but you really need to look at whether it makes sense, or whether it's just a holdover from bygone days when your million-dollar mainframe was actually compute-bound ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 3 23:01:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 4 Feb 2004 15:01:17 +1100 Subject: [Beowulf] Linux 2.4 vs 2.6 AND ia64 vs amd64 In-Reply-To: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> References: <20040201053940.60342.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402041501.19592.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 1 Feb 2004 04:39 pm, Andrew Wang wrote: > 2.6 looks very promising, wondering when distributions > will include it. Mandrake 10 will include it (beta 2 just appeared with 2.6.2rc3 - they reckon the final 2.6.2 will make the release of Mdk10). - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAIG6NO2KABBYQAh8RAlCaAJ9Y5LKBLZQjGvCJCzO7ViuwZMGFiQCePiI+ Q2x2XGPUUWKYDT2nRv/5DHI= =S0ef -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 4 08:17:30 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 4 Feb 2004 08:17:30 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Jim Lux writes: > > > YOu need to seriously consider a "failsafe" totally automated shutdown > > (as in chop the power when temperature gets to, say, 40C, in the > > room)... Security might be busy (maybe there was a big problem with > > the chiller plant catching fire or the boiler exploding.. if they're > > directing fire engine traffic, the last thing they're going to be > > thinking about is going over to your machine room and shutting down > > your hardware. > > Ah, that reminds me of the bad old days in industry. 
> > The A/C went belly up the night between Friday and Saturday. That > triggered the alarm down at Security, who promptly called the on-duty > ventilation technicians and notified us. Excellent. > > Except that the A/C alarm was never reset properly, so when the A/C > failed again Saturday afternoon nobody noticed. > > When the temperature reached 35C, the thermal kill switch triggered > automatically. Pity that the electrician had never got around to > actually, like, *wire* it to anything. > > We arrived Monday morning to the smell of frying electronics. > Expensive weekend, that. Did you ever manage to track down the electrician and put bamboo slivers underneath his toenails or something? That one seems like it would be worth some sort of retaliation. A small nuclear device planted in his front lawn. An anonymous call to the IRS. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 4 17:34:21 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) Subject: [Beowulf] about cluster's tunning In-Reply-To: <015f01c3ea78$24daccc0$1101000a@cpn.senamhi.gob.pe> Message-ID: You may want to look at the online course mentioned here: http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Doug On Tue, 3 Feb 2004, Richard Miguel wrote: > Hi.. i have a cluster with 27 nodes PIV Intel .. I have installed a model > for climate forecast. My question is how i can improvement the performance > of my cluster.. there is techniques for tunning of clusters througth > operative system or network hardware?. > > thanks for yours anwers.. and suggests.. > > R. Miguel > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 4 15:08:04 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 04 Feb 2004 21:08:04 +0100 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: (Robert G. Brown's message of "Wed, 4 Feb 2004 08:17:30 -0500 (EST)") References: Message-ID: "Robert G. Brown" writes: > On Tue, 3 Feb 2004, Leif Nixon wrote: > >> When the temperature reached 35C, the thermal kill switch triggered >> automatically. Pity that the electrician had never got around to >> actually, like, *wire* it to anything. >> >> We arrived Monday morning to the smell of frying electronics. >> Expensive weekend, that. > > Did you ever manage to track down the electrician and put bamboo slivers > underneath his toenails or something? Sadly, no. And don't get me started on luser electricians. "Ooops, did that feed go to the computer room?" 
"Hmmm, what's on this circuit? Let's toggle it and see what reboots." (Yes, it really happened. I don't often shout at people, but that time...) Dropping a fine gauge wire across the main power rails was an interesting stunt, too. Too bad he didn't even get flash burns. I think the main point here is: If you get hold of a competent electrician, take *real* good care of him. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From waitt at saic.com Thu Feb 5 07:41:24 2004 From: waitt at saic.com (Tim Wait) Date: Thu, 05 Feb 2004 07:41:24 -0500 Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: <402239F4.8030304@saic.com> > Dropping a fine gauge wire across the main power rails was an > interesting stunt, too. Too bad he didn't even get flash burns. How about an electrician, who, while working on your building power conditioning, sends 180V through your 120V building, frying everything not on UPS? We were not amused. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Feb 5 11:23:13 2004 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 5 Feb 2004 11:23:13 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: <402239F4.8030304@saic.com> Message-ID: On Thu, 5 Feb 2004, Tim Wait wrote: > > > Dropping a fine gauge wire across the main power rails was an > > interesting stunt, too. Too bad he didn't even get flash burns. > > How about an electrician, who, while working on your building > power conditioning, sends 180V through your 120V building, > frying everything not on UPS? > > We were not amused. Oh, give the guy a break: Red, Black, White...it is all very confusing! My most serious problem has been with the computer room UPS begin shutdown accidentally, dropping a half-dozen raid servers. Many TBs of data were endangered. I might be able to forgive them if it only happened once, but I've needed to force myself to stop counting events because doing so interferes with my ability to properly suppress homocidal urges. Seriously, one would think that a Darwinian effect would kick in at some point and cull the electrical service hurd. My observations (and others here as well) seem to dispute that. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.gindonis at hip.fi Thu Feb 5 12:27:20 2004 From: michael.gindonis at hip.fi (Michael Gindonis) Date: Thu, 5 Feb 2004 19:27:20 +0200 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402041702.i14H2Jh03108@NewBlue.scyld.com> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> Message-ID: <200402051927.21214.michael.gindonis@hip.fi> On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > From: Mark Hahn > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset > > > I noticed in the Linux kernel configuration that there is support for > > LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. > > huh? ?afaikt, it's just another overly expensive, overly complicated hw > raid controller. ?I guess there must be a market for this kind of > wrongheaded crap, but I really don't understand it. Hi Mark, When purchasing a cluster or cluster hardware, one can spend as little as 20 Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for Myrinet or Scali. The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. 100 Euro per node is much eaier to justify than 1000 Euro per node when the Cluster when the cluster will not be primarly running tighly coupled parallel problems. If the performance of MPI of Fusion-MPT is much better than than Ethernet with good latency, it becomes a cheap way to add flexibilty to a cluster. Here is some info about it the Chipset... http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ integrated_circuits/fusion.pdf http://www.lsilogic.com/technologies/lsi_logic_innovations/ fusion___mpt_technology.html There is also information in the in the linux kernel documentation about running MPI over this kind of interconnect. ... Mike -- Michael Kustaa Gindonis Helsinki Institute of Physics, Technology Program michael.gindonis at hip.fi http://wikihip.cern.ch/twiki/bin/view/Main/MichaelGindonis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Feb 5 21:12:58 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 5 Feb 2004 18:12:58 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: hi ya On Thu, 5 Feb 2004, Michael T. Prinkey wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! dont forget blue and green too ... - fun to disconnect the wires at the main and move wires around ... while the bldg is "lit" i think its crazy that the "nuetral" side is tied together at the panel .. but the outlets in the building seems to work .. 
c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Feb 5 22:07:18 2004 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 5 Feb 2004 19:07:18 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: <20040206030718.46964.qmail@web60310.mail.yahoo.com> Trained Electrician Here Worked at a HVAC system fab. plant. I wired large Air Make Up Units. I was trained in by a very old school guy (CS degree from 1962). I watched as turnovers in workers happened and started to notice the lower paid guys that would work on 480V extension cords while they where hot with 60hp motors drawing on them! I strayed from that path for a while until a friend, who was handed the task of managing the renovation of an old building downtown. She had questions I had time. I ended up finding a retired electrician that knew his stuff. I asked him how he kept up to date. His reply was that he is on the writing committee for the National Electric Code. Needless to say I keep in contact with him on various topics. Note: CatV + Lighting = PCs + Fire Note2: Attic Access doors do not belong in the ceiling of a wiring closet. Something about fire wanting to go upwards, maybe some of you physics guys can explain it better. --- "Michael T. Prinkey" wrote: > On Thu, 5 Feb 2004, Tim Wait wrote: > > > > > > Dropping a fine gauge wire across the main power rails was an > > > interesting stunt, too. Too bad he didn't even get flash burns. > > > > How about an electrician, who, while working on your building > > power conditioning, sends 180V through your 120V building, > > frying everything not on UPS? > > > > We were not amused. > > Oh, give the guy a break: Red, Black, White...it is all very confusing! > > My most serious problem has been with the computer room UPS begin shutdown > accidentally, dropping a half-dozen raid servers. Many TBs of data were > endangered. I might be able to forgive them if it only happened once, but > I've needed to force myself to stop counting events because doing so > interferes with my ability to properly suppress homocidal urges. > > Seriously, one would think that a Darwinian effect would kick in at some > point and cull the electrical service hurd. My observations (and others > here as well) seem to dispute that. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== /------------------------------------------------------------\ Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM Penguin Loving, Moralist Agnostic. What Is an agnostic? - An agnostic thinks it impossible to know the truth in matters such as, a superbeing or the future with which religions are mainly concerned with. Or, if not impossible, at least impossible at the present time. lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 23:15:52 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 23:15:52 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... 
wires In-Reply-To: Message-ID: On Thu, 5 Feb 2004, Alvin Oga wrote: > i think its crazy that the "nuetral" side is tied together > at the panel .. but the outlets in the building seems to work .. That's not crazy, that's actually rather sane. What would be crazy would be grounding the neutrals and/or ground wire in different places. Can you say "ground loop"? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 5 23:26:37 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 5 Feb 2004 23:26:37 -0500 (EST) Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> Message-ID: > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects or less, actually. you seem to be thinking of gigabit, which is indeed a very attractive cluster interconnect. otoh, there are lots of even more loosely-coupled, non-IO-intensive apps that run just fine on 100bT. > to more than 1000 Euro per node for > Myrinet or Scali. or IB. > The Fusion-MPT chipset adds about 100 Euro to the cost of a motherboard. yes, obviously. I'd probably rather have another gigabit port or two; bear in mind that some very elegant things can be done when each node has multiple network connections... really, the chipset isn't the point; it's just a $5 coprocessor. what counts is coming up with a physical layer, including affordable switches, and somehow getting millions of people to make/buy them. > 100 > Euro per node is much eaier to justify than 1000 Euro per node when the > Cluster when the cluster will not be primarly running tighly coupled parallel > problems. hmm, we've already established that gigabit is much cheaper, and for loose-coupled systems, chances are good that even 100bT will suffice. > If the performance of MPI of Fusion-MPT is much better than than > Ethernet with good latency, but does it even exist? so far, all I can find is two lines on a marketing glossy... > it becomes a cheap way to add flexibilty to a > cluster. many things could happen; I'm not optimistic about this Fusion-MPT thing. it seems to fly in the face of "do one thing, well". > Here is some info about it the Chipset... > > http://www.lsilogic.com/files/docs/marketing_docs/storage_stand_prod/ > integrated_circuits/fusion.pdf that's the vapid marketing glossy. > http://www.lsilogic.com/technologies/lsi_logic_innovations/ > fusion___mpt_technology.html that is even worse. > There is also information in the in the linux kernel documentation about > running MPI over this kind of interconnect. I'm not sure what "kind" here means, do you mean over scsi? the traditional problem with *-over-scsi (and there have been more than a couple) has been that scsi interfaces aren't optimized for low-latency. the bandwidth isn't that hard, really - 320 MB/s is around Myrinet speed, and significantly slower than IB. OK, how about FC? it's obviously got an advantage over U320 in that FC switches exist (oops, expensive) but it's really just a 1-2 Gb network protocol with 2k packets. 
as for the "high performance ARM-based architecture" part, well, I must admit that I don't associate ARM with high performance of the gigabyte-per-second sort. personally, I'd love to see sort of the network equivalent of the old smart-frame-buffer idea. practically, though, it really boils down to the gritty details like availability of switches, choosing a physical-layer standard, etc. gigabit is the obvious winner there, but IB is trying hard to get over that bump... (Myri seems not to be very ambitious, and 10G eth seems to be straying into a morass of tcp-offload and the like...) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 5 11:36:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 5 Feb 2004 11:36:39 -0500 (EST) Subject: [Beowulf] wulflogger, wulfstat's dumber cousin... Message-ID: On request I've got a second xmlsysd client going called "wulflogger". Wulflogger is just wulfstat with the ncurses stuff stripped off so that it manages connections to the xmlsysd's on a cluster, reads them at some input frequency, and writes selected status data to stdout in a simple table. The advantage of this tool is that it makes it really easy to write web or script or report applications, and it also makes it very easy to maintain a dynamic logfile of selected statistics for the entire cluster. This is and will likely remain a very simple tool. The only fanciness I envision for the future is an output descriptor format of some sort that could be input at run time, so that a user could select output fields and formats instead of getting the collections I've prebuilt. That's pretty complex (especially since wulflogger/wulfstat throttle xmlsysd to return only the collective stats it needs) so it won't be anytime soon. Only -t 1 is probably "finished" as output format goes, although -t 0 will probably get mostly cosmetic changes at this point as well. Anyway, any wulfstat/xmlsysd users might want to grab it and give it a try. It makes it pretty simple to write a perl script to generate e.g. rrd images or other graphical representations of the cluster -- in a future release I'll provide sample perl scripts for parsing out fields and doing stuff with it. It is for the moment only available from my personal website: http://www.phy.duke.edu/~rgb/Beowulf/wulflogger.php rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 3 10:38:56 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 3 Feb 2004 16:38:56 +0100 (CET) Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ In-Reply-To: <20040203125618.GA6026@mikee.ath.cx> Message-ID: On Tue, 3 Feb 2004, Mike Eggleston wrote: > This book from 2000 discusses building clusters from linux. I > bought it from a discount store not because I'm going to build > another cluster from linux, but rather because of the discussions Mike, I bought this book almost when it came out. 
Its easy to do injustice to someone with a quick email, especially as David Spector put a lot of effort into the book, and I haven't. However, this OReilly is reckoned not to be one of the best. I always recommend 'Linux Clustering' by Charles Bookman, and 'Beowulf Cluster Computing with Linux' edited by Thomas Sterling. Online, there is the book by Bob Brown http://www.phy.duke.edu/brahma/Resources/beowulf_book.php For cluster management specifically, google for Rocks and Oscar, and there are lots of other pages. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Feb 3 07:56:18 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 3 Feb 2004 06:56:18 -0600 Subject: [Beowulf] O'Reilly's _Building Linux Clusters_ Message-ID: <20040203125618.GA6026@mikee.ath.cx> This book from 2000 discusses building clusters from linux. I bought it from a discount store not because I'm going to build another cluster from linux, but rather because of the discussions on cluster management. Has anyone read/implemented his approach? What other cluster management techniques/solutions are out there? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Feb 3 10:11:32 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 3 Feb 2004 10:11:32 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: References: Message-ID: On Tue, 3 Feb 2004 at 9:26am, Eckhoff.Peter at epamail.epa.gov wrote > Instead we found a sensor/software combination where the sensor ties > into the > serial port of one of the nodes. So far we **have** been able to > gracefully shut down the > programs that are running. We have **not** found a way to automatically > turn off the > various cluster nodes. That's where we need some help/suggestions. Well, your high-temperature-triggered scripts should call a 'shutdown -h now'. *If* your nodes are on motherboards that support it, and *if* the BIOS is new enough to support it, and *if* the nodes were booted with 'apm=power-off' on the kernel command line, then they should actually power off. Another option would be something like this: http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP7960 With that (ungodly expensive) power strip, you can remotely cut the power to selected outlets. It probably can be automated, but you'd have to check that. As Jim said, though, all this is great, but there really does need to be one final level of hardware level failsafe. It is entirely conceivable that all your software monitoring could fail, and the temperature will still be climbing. There needs to be a piece of hardware in the room that literally cuts power to the whole damn room at a set temperature that is (obviously) above the one that trips your software shutdown scripts. 
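A minimal sketch of such a high-temperature-triggered shutdown script, assuming root ssh keys on the nodes and a site-specific sensor-reading command (the "read_room_temp" below is a hypothetical placeholder, as are the node names and the 40C threshold):

    #!/bin/sh
    # halt the cluster gracefully once the room gets too hot;
    # run this from cron every minute or two on the head node
    NODES="node01 node02 node03"
    LIMIT=40

    TEMP=`read_room_temp`        # hypothetical command that reads the sensor
    if [ "$TEMP" -ge "$LIMIT" ]; then
        logger "machine room at ${TEMP}C, halting cluster"
        for n in $NODES; do
            # relies on apm=power-off (or working ACPI support) for the
            # node to actually power itself down after the halt
            ssh $n "shutdown -h now" &
        done
        wait
        shutdown -h now          # take the head node down last
    fi

Something like this only covers the software side; the hardware cutoff described above remains the last line of defense.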
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 4 08:26:55 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 4 Feb 2004 14:26:55 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... In-Reply-To: Message-ID: On Tue, 3 Feb 2004, Leif Nixon wrote: > Ah, that reminds me of the bad old days in industry. That in turn reminds me of recent construction work around here... Since the building with our offices and our small server room had to be renovated, the water-based cooling system for the server room had to be temporarily replaced with a mobile unit that pumps the heat into the hallway. The company responsible had no better idea than to replace the cooling system on friday afternoon -- of course without telling anybody. As the mobile unit was much too small, the server room had turned into sauna until monday when we discovered the problem. Ups. Luckily no hardware was damaged, even though the sensors in the hard-disk drives of our server measured a maximum of 47C. Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Fri Feb 6 04:17:43 2004 From: patrick at myri.com (Patrick Geoffray) Date: Fri, 06 Feb 2004 04:17:43 -0500 Subject: [Beowulf] Ambition In-Reply-To: References: Message-ID: <40235BB7.6010802@myri.com> Ah Mark, I could not resist. Actually I could, but the list has been a little boring lately, so... :-) Mark Hahn wrote: > (Myri seems not to be very ambitious, and 10G eth seems to be straying into > a morass of tcp-offload and the like...) Myri is very ambitious, but you can be carefully ambitious or marketingly ambitious. Nobody buys an interconnect looking only at the specs. People try, benchmark, run their code, rationalize and buy what they need at the right price. If you look at what people are doing, there is a lot of Ethernet (Fast and GigE) because thats good enough for many, many codes. Then there is a smaller market for more demanding needs, either in term of performance or scalability, where you want to find the sweet spot in the performance/price curve. Does it make sense to have 10Gb now ? I don't think so, and for several reasons: * PCI-Express is not here yet: It's coming, yes, but it's not available in volume. Today, PCI-X supports 1 GB bidirectional, which is 4 Gb link speed. It's clearly the bottleneck right now. HyperTransport looks attractive, but there is no connector defined yet and vendors should be able to see a potential for volume before to commit resources for a native HT interface. * 10 Gb optics are still expensive: price is going down, but there is not enough volume yet to drive the price down faster. Copper ? I still have nightmares about copper. 10 GigE will drive the technology price down as the 10 GigE market blossoms. * 10 GigE is not attractive enough yet because there is no clear improvement at the application level. 
Running a naive IP stack at 10 Gb requires a lot of resources on the host. RDMA is just a buzword, it's not The Solution. Storage may leverage RDMA, but not IP and certainly not MPI. That's why people are working to put processing on the data path, but it is far from obvious so it takes some time. Gigabit is the clear winner today and 10 GigE will be the clear winner tomorrow, because Ethernet is the de facto Standard. Everybody else are parasites, either breading on niches or marketing poop... Patrick _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sigut at id.ethz.ch Fri Feb 6 06:31:51 2004 From: sigut at id.ethz.ch (G.M.Sigut) Date: Fri, 6 Feb 2004 12:31:51 +0100 (MET) Subject: [Beowulf] about cluster's tunning Message-ID: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> > Date: Thu, 5 Feb 2004 12:04:04 -0500 > Subject: Beowulf digest, Vol 1 #1657 - 4 msgs ... > --__--__-- > > Message: 1 > Date: Wed, 4 Feb 2004 17:34:21 -0500 (EST) > From: "Douglas Eadline, Cluster World Magazine" > Subject: Re: [Beowulf] about cluster's tunning > > You may want to look at the online course mentioned here: > > http://www.clusterworld.com/article.pl?sid=03/11/12/1919210&mode=thread&tid=10 Oh yeah. Very nice. Especially after you register (for the course) and are told your browser is no good. There is a page which helps you to select an approved browser - and that says: Unable to detect your operating system. Please select your operating system: -> Windows operating system -> Mac operating system. What a pity that I am working on a Sun. (and Linux) ... George :-( (is there a smiley for "I'm going to puke"?) >>>>>>>>>>>>>>>>>>>>>>>>> George M. Sigut <<<<<<<<<<<<<<<<<<<<<<<<<<< ETH Zurich, Informatikdienste, Sektion Systemdienste, CH-8092 Zurich Swiss Federal Inst. of Technology, Computing Services, System Services e-mail: sigut at id.ethz.ch, Phone: +41 1 632 5763, Fax: +41 1 632 1022 >>>> if my regular address does not work, try "sigut at pop.agri.ch" <<<< _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Fri Feb 6 08:52:20 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Fri, 6 Feb 2004 07:52:20 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> > That's not crazy, that's actually rather sane. What would be crazy > would be grounding the neutrals and/or ground wire in > different places. > Can you say "ground loop"? > Grounding loops.. truly a bane. I remember one instance where someone wired a telecommunications switch to two different grounds. The -48v DC power had it's own ground, and someone had grounded the chassis to a different feed. I little lesser know fact was the lightning rod on the tower next to the building was linked to the same ground as the power. When lightning did strike, nothing but smoke as the charge rolled from one ground to the other on each bay. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 6 09:30:10 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Fri, 6 Feb 2004 09:30:10 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > That's not crazy, that's actually rather sane. What would be crazy > > would be grounding the neutrals and/or ground wire in > > different places. > > Can you say "ground loop"? > > > > > Grounding loops.. truly a bane. I remember one instance where someone > wired a telecommunications switch to two different grounds. The -48v DC > power had it's own ground, and someone had grounded the chassis to a > different feed. I little lesser know fact was the lightning rod on the > tower next to the building was linked to the same ground as the power. > When lightning did strike, nothing but smoke as the charge rolled from > one ground to the other on each bay. There is also a memorable instance of powered racks with incoming two phase power split into two circuits having a polarity reversal so its neutral wire on one circuit was 120V above chassic ground and the neutral on the other circuit. When somebody plugged a single unit with components on both lines -- I think it was more like "meltdown and fire". Not really a ground loop, of course... ...but plenty of people have been electrocuted or fires started because there was a lot more resistance on the neutral line to a remote "ground" than there was to a nice, local, piece of metal. Basically, AFAICT there is really nothing in the NEC or CEC that is "stupid". In fact, I think that most of the code has undergone a near-Darwinian selection process, as in electricians who fail to wire to code (and often their clients) not infrequently fail to reproduce. I don't think code is conservative ENOUGH, if anything, and like to overwire for any given situation. 12-2 is just as easy and cheap to work with as 14-2, for example. 10-2 unfortunately is not, but it gives me comfort to use it whereever I can. And I kinda wish that all circuit breakers were GFCI by code as well, not just ones servicing lines near water and pipes. However, these are still available as user choices -- code permits you to go over, just not under. Anybody curious about wiring should definitely google for the electrical wiring FAQ site. It explains wiring in relatively simple terms. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 6 09:23:48 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 6 Feb 2004 15:23:48 +0100 (CET) Subject: [Beowulf] about cluster's tunning In-Reply-To: <200402061131.i16BVpCQ002951@grisnir.ethz.ch> Message-ID: It just worked fine for me. 
Mozilla 1.4.1 running on Fedora _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Feb 6 03:52:50 2004 From: sp at scali.com (Steffen Persvold) Date: Fri, 06 Feb 2004 09:52:50 +0100 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: <200402051927.21214.michael.gindonis@hip.fi> References: <200402041702.i14H2Jh03108@NewBlue.scyld.com> <200402051927.21214.michael.gindonis@hip.fi> Message-ID: <402355E2.2040909@scali.com> Michael Gindonis wrote: > On Wednesday 04 February 2004 19:02, beowulf-request at scyld.com wrote: > >>From: Mark Hahn >>To: beowulf at beowulf.org >>Subject: Re: [Beowulf] Experiences with MPI over LSI's Fusion-MPT chipset >> >> >>>I noticed in the Linux kernel configuration that there is support for >>>LSI's Fusion-MPT chipset. Also, it is possible to run MPI over this. >> >>huh? afaikt, it's just another overly expensive, overly complicated hw >>raid controller. I guess there must be a market for this kind of >>wrongheaded crap, but I really don't understand it. > > > Hi Mark, > > When purchasing a cluster or cluster hardware, one can spend as little as 20 > Euro ( ~30 CAD) per node on interconnects to more than 1000 Euro per node for > Myrinet or Scali. > Michael, I'm not entirely sure what you mean by "Scali" here. Scali is a _software_ vendor and our MPI can use all of the interconnects that are popular within HPC today (GbE, Myrinet, InfiniBand and SCI). Best regards, -- Steffen Persvold Senior Software Engineer mob. +47 92 48 45 11 tel. +47 22 62 89 50 fax. +47 22 62 89 51 Scali - http://www.scali.com High Performance Clustering _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Fri Feb 6 11:11:43 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Fri, 6 Feb 2004 16:11:43 +0000 Subject: [Beowulf] Re: Beowulf digest, Vol 1 #1656 - 5 msgs In-Reply-To: References: Message-ID: <200402061611.43045.daniel.kidger@quadrics.com> On Friday 06 February 2004 4:26 am, Mark Hahn added: >> When purchasing a cluster or cluster hardware, one can spend as little as 20 >> Euro ( ~30 CAD) per node on interconnects >> to more than 1000 Euro per node for >> Myrinet or Scali. > > or IB. I guess you should add QsNet II to that list too (except that our cards are under e1000 - not counting switches) Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Feb 6 13:12:49 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 06 Feb 2004 10:12:49 -0800 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: <00ae01c3ecb8$80954670$6c45a8c0@ntbrt.bigrivertelephone.com> Message-ID: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> At 09:30 AM 2/6/2004 -0500, Robert G. 
Brown wrote: >On Fri, 6 Feb 2004, Nathan Littlepage wrote: > > > > That's not crazy, that's actually rather sane. What would be crazy > > > would be grounding the neutrals and/or ground wire in > > > different places. > > > Can you say "ground loop"? > > > > > > > > > Grounding loops.. truly a bane. I remember one instance where someone > > wired a telecommunications switch to two different grounds. The -48v DC > > power had it's own ground, and someone had grounded the chassis to a > > different feed. I little lesser know fact was the lightning rod on the > > tower next to the building was linked to the same ground as the power. > > When lightning did strike, nothing but smoke as the charge rolled from > > one ground to the other on each bay. > >There is also a memorable instance of powered racks with incoming two >phase power split into two circuits having a polarity reversal so its >neutral wire on one circuit was 120V above chassic ground and the >neutral on the other circuit. When somebody plugged a single unit with >components on both lines -- I think it was more like "meltdown and >fire". Not really a ground loop, of course... The classic error is wiring two sets of receptacles (e.g two racks full of gear) on the two sides of the 220, with neutrals properly connected, then having the neutral conductor fail, so the two 110V loads are in series across 110V. Works fine as long as the loads are balanced, but when you start to turn off the loads on one side, the voltages don't balance any more. >...but plenty of people have been electrocuted or fires started >because there was a lot more resistance on the neutral line to a remote >"ground" than there was to a nice, local, piece of metal. The notorious MGM Grand fire in Las Vegas, for instance, was caused by a ground/neutral/resistance thing. > Basically, >AFAICT there is really nothing in the NEC or CEC that is "stupid". In >fact, I think that most of the code has undergone a near-Darwinian >selection process, as in electricians who fail to wire to code (and >often their clients) not infrequently fail to reproduce. > >I don't think code is conservative ENOUGH, if anything, and like to >overwire for any given situation. 12-2 is just as easy and cheap to >work with as 14-2, for example. Not if you buy your wire in traincarload lots when wiring a subdivision. That extra copper adds up, not only in copper cost, but shipping, etc. Consider that the wiring harness in an automobile weighs on the order of 50-100kg, and you see why they're interested in going to multiplex buses and 42V systems. Ballparking for my house, which is, give or take 50 feet long, 20 feet wide, and 20 feet high, I'd say there are wiring runs comparable to, say, 3000 feet. That's 9000 total feet of conductors (Black,White, Ground). 12AWG is 19.8 lb/1000 ft, 14 is 12.4 lb/1000ft. Using AWG14 instead of AWG12 saves the contractor 70 pounds of copper. Copper, in huge quantities, is about $0.70/lb, so by the time it gets to the wire maker, it's probably a dollar a pound, so it saves the contractor $70 (not counting any shipping costs, etc. which could be another $0.10/lb or so) $70/house is a bunch o' bux to a builder putting up 500 homes in a tract. They make a profit by watching a thousand little details, each of which is some tiny fraction of the overall price ($70 on a 2000 ft house is 0.035/square foot, compared to $70-100/ft construction cost). 
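(A quick check on that 70 pound figure, using the numbers above: 9000 ft of conductor times the 19.8 - 12.4 = 7.4 lb per 1000 ft difference works out to roughly 67 lb.)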
It's much like automotive applications, or mass market consumer electronics, where they obssess about BOM (bill of materials) cost changes of pennies. (Do you really, really need that bypass capacitor? Does it have to be that big? How many product returns will we get if we leave it out?) This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at 8.96, even after you factor in the fact that you might need more aluminum (because it's lower conductivity), it's still better than 2:1 weight difference. (Aluminum and Copper are about the same price these days, but copper has bigger fluctuations... back in the 70's copper was expensive and aluminum cheap (about 2:1)) So, 2:1 mass, 2:1 price.. changes the cost of the wire alone from $200/house down to $50/house... Consider an office building with 20-30 floors, of 10,000 square feet each. AWG12 vs AWG14 can be a BIG deal. There was a lot of arguing about the heavier neutral wire needed in light industrial office 208Y/120 wiring with all the poor power factor loads (i.e. computers with lightly loaded switching power supplies). > 10-2 unfortunately is not, but it >gives me comfort to use it whereever I can. And I kinda wish that all >circuit breakers were GFCI by code as well, not just ones servicing >lines near water and pipes. However, these are still available as user >choices -- code permits you to go over, just not under. > >Anybody curious about wiring should definitely google for the electrical >wiring FAQ site. It explains wiring in relatively simple terms. > > rgb > >-- >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fant at pobox.com Fri Feb 6 13:59:24 2004 From: fant at pobox.com (Andrew Fant) Date: Fri, 6 Feb 2004 13:59:24 -0500 (EST) Subject: [Beowulf] Gentoo for Science and Engineering Message-ID: Hello, I am sending this out to let people know about a new mailing-list/IRC channel which is being organized for people interested in the use of Gentoo Linux in Computational Science and Engineering applications. At this point we are just getting started, but hopefully we will grow into an organization which presents a one-stop resource about applying Gentoo to CS&E applications from the desktop to HPC clusters and grids. In addition, we will be working closely with Gentoo developers and the Core Gentoo management to provide feedback and guidance in how it can most closely meet the needs of technical end-users. Anyone who has an interest in computational science and engineering and who is interested in learning more about Gentoo or making it a better CS&E platform is most cordially invited to join About Gentoo Linux: Gentoo Linux is a source-based distribution that makes the assumption that the end-user or administrator knows more about what the system is supposed to do than the distribution developers. 
At the core of this is a package system known as Portage, which is similar in form to the BSD ports system. It uses the rough equivalent of an RPM spec file (called an ebuild within Gentoo) to automatically download source, compile the package (and any prerequisites) with appropriate optimizations and options as defined by the user, and install it in such a way that it can be removed or upgraded at a later time. Sometimes referred to as a meta-distribution by the developers, Gentoo initially installs a minimal environment and doesn't force the end-user to install packages and services that are unwanted or unnecessary. Also, no network daemons are started on a system unless an administrator expressly starts them. Gentoo Linux is developed by a community of developers, much as Fedora and Debian are. At present, there are over 6000 different ebuilds for different system utilities and applications in Portage. Of these, more than 100 are classified as scientific applications, including bioperl, octave, spice, and gromacs. In addition, many common scientific libraries and HPC tools are present, including Atlas, FFTW, gmp, LAM/MPI and openpbs. The main website can be found at http://www.gentoo.org. Contact information: The mailing-list is only starting now, and is rather quiet, though I hope to change that over the next couple of weeks. To subscribe, send a blank email to gentoo-science-subscribe at gentoo.org. You will get a confirmation message back. For those who want to just ask questions or find out more in a real-time setting, we are on IRC at irc.freenode.org in #gentoo-science. Of course, questions may also be directed to me at afant at geekmail.cc. Thank you for your time. Please feel free to forward this information to other groups that you feel would be interested. I apologize to anyone who considered this an off-topic post. Andy Fant Andrew Fant | This | "If I could walk THAT way... Molecular Geek | Space | I wouldn't need the talcum powder!" fant at pobox.com | For | G. Marx (apropos of Aerosmith) Boston, MA USA | Hire | http://www.pharmawulf.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Feb 6 22:19:58 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 11:19:58 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: Message-ID: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Can you add GridEngine (SGE) and Torque (SPBS)? The problem with OpenPBS is not only it is broken, it is not under development these days, but also I found that Altair is not allowing new users to download OpenPBS. I went to its homepage today but it only leads me to the PBSPro page. SGE already has a FreeBSD-style "port", so adding a port for Gentoo Linux should also be easy. And I think SGE is more popluar these days too. SPBS is basically PBS, but with lots of problems fixed, and better Maui scheduler support. Also, please support the mpiexec parallel job starter, as it allows OpenPBS and SPBS to control slave MPI tasks. SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/torque/ mpiexec: http://www.osc.edu/~pw/mpiexec/ Thx :-> Andrew. --- Andrew Fant ???? > In addition, many > common scientific libraries > and HPC tools are present, including Atlas, FFTW, > gmp, LAM/MPI and > openpbs. 
The main website can be found at > http://www.gentoo.org. > > Contact information: > > The mailing-list is only starting now, and is rather > quiet, though I hope > to change that over the next couple of weeks. To > subscribe, send a blank > email to gentoo-science-subscribe at gentoo.org. You > will get a confirmation > message back. For those who want to just ask > questions or find out more > in a real-time setting, we are on IRC at > irc.freenode.org in > #gentoo-science. Of course, questions may also be > directed to me at > afant at geekmail.cc. > > Thank you for your time. Please feel free to > forward this information to > other groups that you feel would be interested. I > apologize to anyone who > considered this an off-topic post. > > Andy Fant > > Andrew Fant | This | "If I could walk > THAT way... > Molecular Geek | Space | I wouldn't need > the talcum powder!" > fant at pobox.com | For | G. Marx > (apropos of Aerosmith) > Boston, MA USA | Hire | > http://www.pharmawulf.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 7 03:18:47 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 7 Feb 2004 09:18:47 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <5.2.0.9.2.20040206094811.018e3688@mailhost4.jpl.nasa.gov> Message-ID: On Fri, 6 Feb 2004, Jim Lux wrote: > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > 8.96, even after you factor in the fact that you might need more aluminum > (because it's lower conductivity), it's still better than 2:1 weight Oh yes. Lots of telephone circuits were wired in aluminium in the 1960's in the UK. Corrosion now means these customers have difficulty getting ADSL. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sat Feb 7 06:21:19 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Sat, 7 Feb 2004 11:21:19 +0000 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> Message-ID: <20040207112119.GA5120@galactic.demon.co.uk> On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > Can you add GridEngine (SGE) and Torque (SPBS)? > > The problem with OpenPBS is not only it is broken, it > is not under development these days, but also I found > that Altair is not allowing new users to download > OpenPBS. I went to its homepage today but it only > leads me to the PBSPro page. > To clarify things a bit, I hope. In the beginning was PBS - developed in house at NASA by engineers who needed a Portable Batch System. If you understand Cray NQS syntax and concepts it's familiar :) They left / sold to Veridian who in turn sold to Altair. 
The original PBS was GPL or a close equivalent, if I understand correctly. Altair are marketing a propietary development of PBS as PBSPro. OpenPBS remains available, though you have to register with Altair for download. What they have done very recently, which is rather sneaky, is for the site to oblige you to register for an evaluation copy of PBSPro and potentially answer a questionnaire prior to providing the link to allow you to download OpenPBS. OpenPBS is not under active development and PBSPro may have stalled. Certainly the price per node that Altair are quoting has apparently dropped significantly - though their salesmen are still persistent :) The academic community and the active users forked OpenPBS to create Scalable PBS [SPBS] which is the name most widely known. They've added patches, fixes and features, though there is still an Altair licence for OpenPBS in there. In the last couple of months, SPBS changed its name initially to StORM and then to Torque. HTH other relative newbies who may be confused by trying to find the product :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Feb 7 09:19:21 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 7 Feb 2004 22:19:21 +0800 (CST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040207141921.68156.qmail@web16810.mail.tpe.yahoo.com> --- "Andrew M.A. Cater" ???? > Certainly the price per node that Altair are quoting > has apparently > dropped significantly - though their salesmen are > still persistent :) Both LSF and PBSPro dropped their price significantly. LSF used to be US $1000 per CPU is now $50, and PBSPro used to be a few hundred dollars, and now lower than $30. SGE (GridEngine) 6.0 has a lot of new enchancements and the SGE mailing lists are very popular; and SPBS is gaining a lot of OpenPBS users' acceptance; and Condor is adding another set of new features and then opensource in the next few months. See if LSF and PBSPro are going to drop their price again in the very near future. BTW, it is just like Linux vs M$, at the beginning, Linux wasn't there, and M$ could charge as much as it wanted, and then Linux slowly came, and M$ found it harder and harder to compete with Linux. Linux won't kill M$, and SGE/SPBS/Condor won't kill LSF or PBSPro, not in this few years. The only thing we will see, however, is the lower cost, more features, and better support by Platform Computing (LSF) and Altair (PBSPro) in order to fight back, so users win. Andrew. > The academic community and the active users forked > OpenPBS to create > Scalable PBS [SPBS] which is the name most widely > known. They've added > patches, fixes and features, though there is still > an Altair licence for > OpenPBS in there. In the last couple of months, > SPBS changed its name > initially to StORM and then to Torque. > > HTH other relative newbies who may be confused by > trying to find the > product :) > > Andy > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sat Feb 7 13:48:06 2004 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 7 Feb 2004 12:48:06 -0600 (CST) Subject: [Beowulf] Cluster applications. In-Reply-To: <40235BB7.6010802@myri.com> Message-ID: Hi, I am looking for a real high performance computing application to evaluate the performance of a 2-node cluster running RH9.0, connected back to back by 1GbE. Here are some characteristics of the application I am looking for: 1 Communication intensive, should not be embarassingly parallel. 2 Should be able to stress the network to the maximum. 3 Should not be a benchmark, a real application. 4 Tunable message sizes. 5 Preferably MPI 6 Free (am I greedy?). Can someone point out one/some application(s) with at least first 3 features in the above list? Thank you very much. Regards, Balaji. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Sat Feb 7 10:11:55 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Sat, 07 Feb 2004 09:11:55 -0600 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <4025003B.3020105@tamu.edu> Should we mention the problems in household wiring caused by use of aluminum wiring, then using breakers, outlets and fixtures designed for copper? I almost lost a house in Houston to that once. I spent the 8 hours after the fire department left retightening all the connections throughout. John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at >>8.96, even after you factor in the fact that you might need more aluminum >>(because it's lower conductivity), it's still better than 2:1 weight > > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Fri Feb 6 20:51:51 2004 From: clwang at csis.hku.hk (Cho Li Wang) Date: Sat, 07 Feb 2004 09:51:51 +0800 Subject: [Beowulf] CFP: 2004 IFIP International Conference on Network and Parallel Computing (NPC2004) Message-ID: <402444B7.5E50DC8B@csis.hku.hk> NPC2004 IFIP International Conference on Network and Parallel Computing October 18-20, 2004 Wuhan, China http://grid.hust.edu.cn/npc04 **************************************************************** Call For Papers The goal of IFIP International Conference on Network and Parallel Computing (NPC 2004) is to establish an international forum for engineers and scientists to present their excellent ideas and experiences in all system fields of network and parallel computing. NPC 2004, hosted by the Huazhong University of Science and Technology, will be held in the city of Wuhan, China - the "Homeland of White Clouds and the Yellow Crane." Topics of interest include, but are not limited to: -Grid-based Computing -Cluster-based Computing -Peer-to-peer Computing -Network Security -Ubiquitous Computing -Network Architectures -Advanced Web and Proxy Services -Mobile Agents -Network Storage -Multimedia Streaming Services -Middleware Frameworks and Toolkits -Parallel & Distributed Architectures and Algorithms -Performance Modeling/ Evaluation -Programming Environments and Tools for Parallel and Distributed Platforms Submitted papers may not have appeared in or be considered for another conference. Papers must be written in English and must be in PDF format. Detailed electronic submission instructions will be posted on the conference web site. The conference proceedings will be published by Springer Verlag in the Lecture Notes in Computer Science (LNCS) Series (pending). ************************************************************************** Committee General Co-Chairs: H. J. Siegel Colorado State University, USA Guo-jie Li Chinese Academy of Sciences, China Steering Committee Chair: Kemal Ebcioglu IBM T.J. Watson Research Center, USA Program Co-Chairs: Guang-rong Gao University of Delaware, USA Zhi-wei Xu Chinese Academy of Sciences, China Program Vice-Chairs: Victor K. Prasanna University of Southern California, USA Albert Y. Zomaya University of Sydney, Australia Hai Jin Huazhong University of Science and Technology, China Local Arrangement Chair: Song Wu Huazhong University of Science and Technology, China *************************************************************************** Important Dates Paper Submission March 15, 2004 Author Notification May 1, 2004 Final Camera Ready Manuscript June 1, 2004 *************************************************************************** For more information, please contact the program vice-chair at the address below: Dr. 
Hai Jin, Professor Director, Cluster and Grid Computing Lab Vice-Dean, School of Computer Huazhong University of Science and Technology Wuhan, 430074, China Tel: +86-27-87543529 Fax: +86-27-87557354 e-fax: +1-425-920-8937 e-mail: hjin at hust.edu.cn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 7 14:40:29 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 7 Feb 2004 11:40:29 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sat, 7 Feb 2004, John Hearns wrote: > On Fri, 6 Feb 2004, Jim Lux wrote: > > > This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > 8.96, even after you factor in the fact that you might need more aluminum > > (because it's lower conductivity), it's still better than 2:1 weight > > Oh yes. > Lots of telephone circuits were wired in aluminium in the 1960's in the > UK. Corrosion now means these customers have difficulty getting ADSL. yeah but that's 24-26awg twisted pair for phone a 14 12 10 or 8 awg cable for power have substantialy less surface area relative to it's volume. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Feb 7 17:21:38 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 7 Feb 2004 14:21:38 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires - al In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: hi ya On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. people install wires with al or steel cores in the wire cause its way way cheaper than copper ... copper is only needed for good conduction on the outside of the wire al corrosion ... coat it with stuff :-) or wrap it w/ copper but now you have to worry about copper corrosion - house or building wiring is different animals than high voltage transmission lines too aluminum "pixie" dust does whacky things .. c ya alvin - i've always wondered why people put massive heatsinks on top of the cpu ... air will have a harder time to cool a big mass of metal as opposed to cooling a smaller piece of metal or cooling it some other way .. - problems of getting the heat out of the cpu ( 0.25"sq metal lid) - problems of getting the heat out of the cpu heatsink - blowing air down onto the heatsink is silly too .. 
left over from the 20-30 yr old ideas i guess > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Sat Feb 7 21:36:50 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Sat, 7 Feb 2004 21:36:50 -0500 (EST) Subject: [Beowulf] Cluster applications. In-Reply-To: Message-ID: Check out: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236&mode=thread&tid=8 Also, the "Right Stuff" Column in ClusterWorld addresses some of these issues. To see the a small summary of the columns look at: http://www.clusterworld.com/issues.shtml Doug On Sat, 7 Feb 2004, Balaji Rangasamy wrote: > Hi, > I am looking for a real high performance computing application to evaluate > the performance of a 2-node cluster running RH9.0, connected back to back > by 1GbE. Here are some characteristics of the application I am looking > for: > 1 Communication intensive, should not be embarassingly parallel. > 2 Should be able to stress the network to the maximum. > 3 Should not be a benchmark, a real application. > 4 Tunable message sizes. > 5 Preferably MPI > 6 Free (am I greedy?). > Can someone point out one/some application(s) with at least first 3 > features in the above list? Thank you very much. > Regards, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From klamman.gard at telia.com Sun Feb 8 03:50:22 2004 From: klamman.gard at telia.com (Per Lindstrom) Date: Sun, 08 Feb 2004 09:50:22 +0100 Subject: [Beowulf] Cluster applications. In-Reply-To: References: Message-ID: <4025F84E.4040706@telia.com> Hi Balaji, May I suggest the use of the GNU FEA-software CALCULIX, http://calculix.de/ When will it be up to you to decide how demanding problem your cluster have to solve. Best regards Per Lindstrom Balaji Rangasamy wrote: >Hi, >I am looking for a real high performance computing application to evaluate >the performance of a 2-node cluster running RH9.0, connected back to back >by 1GbE. Here are some characteristics of the application I am looking >for: >1 Communication intensive, should not be embarassingly parallel. >2 Should be able to stress the network to the maximum. >3 Should not be a benchmark, a real application. >4 Tunable message sizes. >5 Preferably MPI >6 Free (am I greedy?). >Can someone point out one/some application(s) with at least first 3 >features in the above list? Thank you very much. >Regards, >Balaji. 
> > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Feb 8 10:52:44 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 8 Feb 2004 10:52:44 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <4025003B.3020105@tamu.edu> Message-ID: On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > Should we mention the problems in household wiring caused by use of > aluminum wiring, then using breakers, outlets and fixtures designed for > copper? I almost lost a house in Houston to that once. I spent the 8 > hours after the fire department left retightening all the connections > throughout. You mean the part where aluminum turns out to burn like magnesium, incredibly hot and impossible to quench? I would under no circumstances put aluminum wiring in, well, anything. Certainly not anything where a serious overload or arcing situation could occur, which is nearly anything. I seem to remember the government finding out about aluminum the hard way with some of their armored fighting vehicles a decade or two ago. When struck with a hot enough round, the armor itself just burned right up. rgb > > John Hearns wrote: > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > >>8.96, even after you factor in the fact that you might need more aluminum > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > Oh yes. > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 14:10:43 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 13:10:43 -0600 Subject: [Beowulf] DC Powered Chassis Message-ID: <402689B3.9070104@iwantka.com> With all the power talk on the 'HVAC and Room Cooling' subject. I've been looking for 1 or 2u chassis that support -48v DC as the main power source. Does anyone know of someone that manufactures these? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Feb 9 00:24:28 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 21:24:28 -0800 (PST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Robert G. 
Brown wrote: > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > Should we mention the problems in household wiring caused by use of > > aluminum wiring, then using breakers, outlets and fixtures designed for > > copper? I almost lost a house in Houston to that once. I spent the 8 > > hours after the fire department left retightening all the connections > > throughout. > > I seem to remember the government finding out about aluminum the hard > way with some of their armored fighting vehicles a decade or two ago. > When struck with a hot enough round, the armor itself just burned right > up. armor is supposed to burn. several armor desgins including that of the american abrams battle tank, are desgined to ablate under pressure from kinetic energy weapons. british chobham type composite armor, boron carbide, or aluminum or some conbination of those and others, protect larger armored vehicles from depleted uranium and tungsten sabot munitions. depleted uranium has similar or better pyrophoric properties (igniting at 500c and burning at 2000c) and the added nastyness of being a toxic heavy metal... in general taking a 10kg urunium slug, accelerating it to 15,000fps and slamming it into another object will cause a fire. It has been used in both armor and projectiles for more or less the same reasons. > rgb > > > > > John Hearns wrote: > > > On Fri, 6 Feb 2004, Jim Lux wrote: > > > > > > > > >>This is why aluminum wiring was popular: the density of Al at 2.7 vs Cu at > > >>8.96, even after you factor in the fact that you might need more aluminum > > >>(because it's lower conductivity), it's still better than 2:1 weight > > > > > > > > > Oh yes. > > > Lots of telephone circuits were wired in aluminium in the 1960's in the > > > UK. Corrosion now means these customers have difficulty getting ADSL. > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sun Feb 8 17:49:18 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sun, 8 Feb 2004 14:49:18 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: http://www.rackmountpro.com/productsearch.cfm?catid=118 On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 13:13:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 13:13:16 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: Message-ID: On Sun, 8 Feb 2004, Joel Jaeggli wrote: > On Sun, 8 Feb 2004, Robert G. Brown wrote: > > > On Sat, 7 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Should we mention the problems in household wiring caused by use of > > > aluminum wiring, then using breakers, outlets and fixtures designed for > > > copper? I almost lost a house in Houston to that once. I spent the 8 > > > hours after the fire department left retightening all the connections > > > throughout. > > > > I seem to remember the government finding out about aluminum the hard > > way with some of their armored fighting vehicles a decade or two ago. > > When struck with a hot enough round, the armor itself just burned right > > up. > > armor is supposed to burn. several armor desgins including that of the "supposed to burn"? Where to "burn" is to release additional heat energy into an already hot environment in a self-sustaining way? Ouch. Supposed to ablate and dissipate energy (hopefully in non-destructive ways on the outside of the vehicle) sure, but naive aluminum designs can be deadly and at various points in the past have been seriously mistrusted by the military personnel supposedly being protected. See e.g. http://www.g2mil.com/aluminum.htm where they recall the early bradley flaws, and argue that the HMS Sheffield (sunk by a single exocet missle in the falklands war) went down in large measure because it was an aluminum ship, where steel ships have been hit by more than one exocet and survived. The site also presents a counterpoint that argues that aluminum isn't THAT bad a choice (as near as I can make out) provided that all one wishes to stop is "small arms fire". It very quickly loses out to steel, though, in a variety of measures when faced with RPG's or things that actually cause fires, as it is a good conductor of heat and quickly spreads a fire and structurally collapses at a relatively low temperature. The aluminum Bradley did tolerably in the first gulf war, losing only 3 to enemy fire (compared to 17 lost to friendly fire from Abrams tanks) but it does have provisions for additional armor plates of steel to be added on outside and I imagine that it used them. Most of what it faced in the gulf war OTHER than our Abrams was its forte -- small arms fire. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sun Feb 8 17:09:36 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sun, 8 Feb 2004 14:09:36 -0800 (PST) Subject: [Beowulf] DC Powered Chassis In-Reply-To: <402689B3.9070104@iwantka.com> Message-ID: hi ya nathan On Sun, 8 Feb 2004, Nathan Littlepage wrote: > With all the power talk on the 'HVAC and Room Cooling' subject. I've > been looking for 1 or 2u chassis that support -48v DC as the main power > source. Does anyone know of someone that manufactures these? some collection of "these" http://www.Linux-1U.net/PowerSupp/DC http://www.Linux-1U.net/PowerSupp/12v problem with +12v or -48v dc inputs is you need to provide enough current to these "dc power supply" - at 12v .. we were calculating about 400A ... since we estimate 4A per mb and 100 mb per rack and double it or 50% for keeping the powersupply reasonably within its normal lifespan ( mtbf ) fun stuff... alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nathan at iwantka.com Sun Feb 8 17:31:40 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Sun, 08 Feb 2004 16:31:40 -0600 Subject: [Beowulf] DC Powered Chassis In-Reply-To: References: Message-ID: <4026B8CC.7050102@iwantka.com> Thanks! Alvin Oga wrote: >hi ya nathan > >On Sun, 8 Feb 2004, Nathan Littlepage wrote: > > > >>With all the power talk on the 'HVAC and Room Cooling' subject. I've >>been looking for 1 or 2u chassis that support -48v DC as the main power >>source. Does anyone know of someone that manufactures these? >> >> > >some collection of "these" > >http://www.Linux-1U.net/PowerSupp/DC >http://www.Linux-1U.net/PowerSupp/12v > > >problem with +12v or -48v dc inputs is you need to provide >enough current to these "dc power supply" > - at 12v .. we were calculating about 400A ... > since we estimate 4A per mb and 100 mb per rack > and double it or 50% for keeping the powersupply > reasonably within its normal lifespan ( mtbf ) > >fun stuff... >alvin > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sat Feb 7 20:24:01 2004 From: clwang at csis.hku.hk (clwang at csis.hku.hk) Date: Sun, 8 Feb 2004 09:24:01 +0800 Subject: [Beowulf] CFP: GCC2004 (3rd International Conference on Grid and Cooperative Computing) Message-ID: <1076203441.40258fb1bf1c0@intranet.csis.hku.hk> ---------------------------------------------------------------- Call for Papers 3rd International Conference on Grid and Cooperative Computing (http://grid.hust.edu.cn/gcc2004) October 21-24 2004, Wuhan, China ----------------------------------------------------------------- The Third International Conference on Grid and Cooperative Computing (GCC 2004) will be held from Oct. 21 to 24, 2004 in Wuhan. It will serve as a forum to present current work by researchers in the grid computing and cooperative computing area. GCC 2004 is the follow-up of the highly successful GCC 2003 in Shanghai, China, and GCC 2002 in Sanya, China. 
Wuhan is rich in culture and history. Its civilization began about 3,500 years ago, and is of great importance in Chinese culture, economy and politics. It shares the same culture of Chu, formed since the ancient Kingdom of Chu more than 2,000 years ago. Numerous natural and artificial attractions and scenic spots are scattered around. Famous scenic spots in Wuhan include Yellow Crane Tower, Guiyuan Temple, East Lake, and Hubei Provincial Museum with the famous chimes playing the music of different styles. GCC 2004 will emphasize the design and analysis of grid computing and cooperative computing and their scientific, engineering, and commercial applications. In addition to technical sessions of contributed paper presentations, the conference will have several workshops, a poster session, tutorials, and vendor exhibitions. GCC 2004 invites the submission of papers in grid computing, Web services and cooperative computing, including theory and applications. The conference is soliciting only original high quality research papers on all above aspects. The main topics of interest include, but not limited to: -Resource Grid and Service Grid - Information Grid and Knowledge Grid - Grid Monitoring, Management and Organization Tools - Grid Portal - Grid Service, Web Service and their QoS - Service Orchestration - Grid Middleware and Toolkits - Grid Security - Innovative Grid Applications - Advanced Resource Reservation and Scheduling - Performance Evaluation and Modeling - Computer-Supported Cooperative Work - P2P Computing, automatic computing and so on - Meta-information Management - Software glue Technologies PAPER SUBMISSION Paper submissions must present original, unpublished research or experiences. Late-breaking advances and work-in-progress reports from ongoing research are also encouraged to be submitted to GCC 2004. All papers submitted to this conference will be peer-reviewed and accepted on the basis of their scientific merit and relevance to the conference topics. Accepted papers will be published as conference proceedings, published by Springer-Verlag in the Lecture Notes in Computer Science (LNCS) Series (Pending). It is also planned that a selection of papers from GCC 2004 proceedings will be extended and published in international journals. WORKSHOPS Proposals are solicited for workshops to be held in conjunction with the main conference. Interested individuals should submit a proposal by March 1, 2004 to the Workshop Chair. TUTORIALS Proposals are solicited for tutorials to be held at the conference. Interested individuals should submit a proposal by May 30,2004. The proposal should include a brief description of the intended audience, a lecture outline, and a vita for each lecturer. EXHIBITION/VENDOR PRESENTATIONS Companies and R&D laboratories are encouraged to present their exhibits at the conference. In addition, a full day of vendor presentations is planned. IMPORTANT DATES March 1, 2004 Workshop Proposal Due May 1, 2004 Conference Paper Due May 30, 2004 Tutorial Proposal Due June 1, 2004 Notification of Acceptance/Rejection June 30, 2004 Camera-Ready Paper Due ORGANIZATION CONFERENCE Co-CHAIRS Xicheng Lu, National University of Defense Technology, China Andrew A. Chien, University of California at San Diego, USA. PROGRAM Co-CHAIRS Hai Jin, Huazhong University of Science and Technology, China. hjin at hust.edu.cn Yi Pan, Georgia State University, USA. pan at cs.gsu.edu WORKSHOP CHAIR Nong Xiao, National University of Defense Technology, China. 
xiao-n at vip.sina.com, Xiao_n at sina.com. Publicity Chair Minglu Li, Shanghai Jiao Tong University, China. li-ml at cs.sjtu.edu.cn Tutorial Chair Dan Meng, Institute of Computing Technology, Chinese Academy of Sciences, China. md at ncic.ac.cn Poster Chair Song Wu, Huazhong University of Science and Technology, China. wusong at mail.hust.edu.cn LOCAL ARRANGEMENT CHAIR Pingpeng Yuan, Huazhong University of Science and Technology, China. ppyuan at mail.hust.edu.cn. Program Committee Members (more to be added) Mark Baker (University of Portsmouth, UK) Rajkumar Buyya (The University of Melbourne, Australia) Wentong Cai (Nanyang Technological University, Singapore) Jiannong Cao (Hong Kong Polytechnic University, Hong Kong) Guihai Chen (Nanjing University, China) Minyi Guo (University of Aizu, Japan) Chun-Hsi Huang (University of Connecticut, USA) Weijia Jia (City University of Hong Kong, Hong Kong) Francis Lau (The University of Hong Kong, Hong Kong) Keqin Li (State University of New York, USA) Qing Li (City University of Hong Kong, Hong Kong) Lionel Ni (Hong Kong University of Science and Technology, Hong Kong) Hong Shen (Japan Advanced Institute of Science and Technology, Japan) Yuzhong Sun (Institute of Computing Technology, CAS, China) Huaglory Tianfield (Glasgow Caledonian University, UK) Cho-Li Wang (The University of Hong Kong, Hong Kong) Jie Wu (Florida Atlantic University, USA) Cheng-Zhong Xu (Wayne State University, USA) Laurence Tianruo Yang (St. Francis Xavier University, Canada) Qiang Yang (Hong Kong University of Science & Technology, Hong Kong) Yao Zheng (Zhejiang University, China) Wanlei Zhou (Deakin University, Australia) Jianping Zhu (The University of Akron, USA) For more information, please visit the conference web site at: http://grid.hust.edu.cn/gcc2004. ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgoornaden at intnet.mu Mon Feb 9 10:34:28 2004 From: rgoornaden at intnet.mu (roudy) Date: Mon, 9 Feb 2004 19:34:28 +0400 Subject: [Beowulf] parallel program References: <200402081701.i18H1qh28395@NewBlue.scyld.com> Message-ID: <001601c3ef22$35dd7be0$ab007bca@roudy> Hello Beowulf people, I have completed building my cluster. I have already run linpack on it and its performance is fine. Can someone help me by suggesting some very big programs to run on my cluster, so I can compare its performance with that of a stand-alone computer? Thanks Roudy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From jcookeman at yahoo.com Mon Feb 9 18:00:47 2004 From: jcookeman at yahoo.com (Justin Cook) Date: Mon, 9 Feb 2004 15:00:47 -0800 (PST) Subject: [Beowulf] parallel program In-Reply-To: <001601c3ef22$35dd7be0$ab007bca@roudy> Message-ID: <20040209230047.89106.qmail@web60510.mail.yahoo.com> http://www.mpa-garching.mpg.de/galform/gadget/index.shtml There is a serial and parallel version. Have fun... Justin --- roudy wrote: > Hello Beowulf people, > I have completed to build my cluster. I have have > already run linpack on my > cluster and it's performance is fine. > Can someone help me by giving me some very big > programs to run on my cluster > to compare the performance with a stand-alone > computer. > Thanks > Roudy > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 9 17:50:45 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 9 Feb 2004 17:50:45 -0500 (EST) Subject: [Beowulf] BWBUG meeting Tuesday Feb 10 at 3:00, Platform Computing Message-ID: --- Note that this meeting is in VA not Maryland! -- Date: February 10, 2004 Time: 3:00 PM (doors open at 2:30) Location: Northrop Grumman IT, McLean Virginia The folks from Platform Computing will be speaking about their LSF scheduler and Grid Computing for Beowulf. This event is sponsored by the Baltimore-Washington Beowulf Users Group (BWBUG) and will be held at Northrop Grumman Information Technology 7575 Colshire Drive, 2nd floor, McLean Virginia. Please register on line at http://bwbug.org As usual there will be door prizes, food and refreshments. Need to be a member?: No ( guests are welcome ) Parking: Free T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell Email michael.fitzmaurice at ngc.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 9 18:25:55 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 10 Feb 2004 10:25:55 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402101025.57234.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > argue that the HMS Sheffield (sunk by a single exocet missle in the > falklands war) went down in large measure because it was an aluminum ship A quick correction, the Sheffield was an all steel ship, as I believe were all the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was not brought under control because the exocet (which failed to explode) took out a large chunk of the fire fighting system.
She finally sank under tow on May 10th 1982, six days after being hit. The sci.military.naval FAQ has an excellent section on the role of aluminium in the loss of warships which looks at this urban legend, and gives real examples when aluminium did cause the loss, at: http://www.hazegray.org/faq/smn6.htm#F7 as well as a section on the Type 42's at: http://www.hazegray.org/navhist/rn/destroyers/type42/ cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAKBcDO2KABBYQAh8RAq2vAJdRfrlHek12hced85HGV0z1nWbYAJ9GJegr FBxjHUczDti0OXNKX5VoKA== =PA8t -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 9 18:31:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 9 Feb 2004 18:31:18 -0500 (EST) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Tue, 10 Feb 2004 05:13 am, Robert G. Brown wrote: > > > argue that the HMS Sheffield (sunk by a single exocet missle in the > > falklands war) went down in large measure because it was an aluminum ship > > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. I stand corrected. Obviously one can't believe everything one googles...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Feb 9 22:42:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 10 Feb 2004 11:42:32 +0800 (CST) Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> >From comp.arch: "One of the things that the version 8.0 of the Intel compiler included was an "Intel-specific" flag." But looks like the purpose is to slow down AMD: http://groups.google.ca/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&group=comp.arch&selm=a13e403a.0402091438.14018f5a%40posting.google.com If intel releases 64-bit x86 CPUs and compilers, then AMD may get even better benchmarks results. Again, no matter how pretty the benchmarks results look, in the end we still need to run on the real system. So, what's the point of having benchmarks? Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 10 03:10:39 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 10 Feb 2004 09:10:39 +0100 (CET) Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: <200402101025.57234.csamuel@vpac.org> Message-ID: On Tue, 10 Feb 2004, Chris Samuel wrote: > A quick correction, the Sheffield was an all steel ship, as I believe were all > the Royal Navy's Type 42 destroyers. There was a major fire, which IIRC was > not brought under control because the exocet (which failed to explode) took > out a large chunk of the fire fighting system. She finally sank under tow on > May 10th 1982, six days after being hit. Steering the argument back to computers :-) I saw a documentary about the Sheffield once. Two ships were sent out as 'goalkeepers', the Sheffield and the smaller Broadsword. The Sheffield had a longer range missile system, the Broadsword a short range one (or other way around). During a period of vulnerability (can;t remember the exact reason) the Broadsword had to reboot its ageing fire control computer. I think build by Ferranti. (No slur intended on their fine engineers, but the thing was old at the time). _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wardwe at navseadn.navy.mil Tue Feb 10 16:21:01 2004 From: wardwe at navseadn.navy.mil (Ward William E DLDN) Date: Tue, 10 Feb 2004 16:21:01 -0500 Subject: [Beowulf] Intel Compiler cheating against non-Intel CPUs? Message-ID: Has anyone seen this yet? Any comments or discussion? >From the message, it looks like the Intel Compilers are cheating against SSE and SSE2 capable non-Intel CPUs (ie, A64 especially). http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=a13e403a.040 2091438.14018f5a%40posting.google.com&rnum=1 R/William Ward _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mbanck at gmx.net Tue Feb 10 13:01:16 2004 From: mbanck at gmx.net (Michael Banck) Date: Tue, 10 Feb 2004 19:01:16 +0100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040207112119.GA5120@galactic.demon.co.uk> References: <20040207031958.71235.qmail@web16803.mail.tpe.yahoo.com> <20040207112119.GA5120@galactic.demon.co.uk> Message-ID: <20040210180116.GA27872@blackbird.oase.mhn.de> On Sat, Feb 07, 2004 at 11:21:19AM +0000, Andrew M.A. Cater wrote: > On Sat, Feb 07, 2004 at 11:19:58AM +0800, Andrew Wang wrote: > > Can you add GridEngine (SGE) and Torque (SPBS)? > > > > The problem with OpenPBS is not only it is broken, it > > is not under development these days, but also I found > > that Altair is not allowing new users to download > > OpenPBS. I went to its homepage today but it only > > leads me to the PBSPro page. > > > To clarify things a bit, I hope. > > In the beginning was PBS - developed in house at NASA by engineers > who needed a Portable Batch System. If you understand Cray NQS syntax > and concepts it's familiar :) They left / sold to Veridian who in turn > sold to Altair. 
The original PBS was GPL or a close equivalent, if I > understand correctly. > > Altair are marketing a propietary development of PBS as PBSPro. OpenPBS > remains available, though you have to register with Altair for download. > What they have done very recently, which is rather sneaky, is for the > site to oblige you to register for an evaluation copy of PBSPro and > potentially answer a questionnaire prior to providing the link to allow > you to download OpenPBS. > > OpenPBS is not under active development and PBSPro may have stalled. > Certainly the price per node that Altair are quoting has apparently > dropped significantly - though their salesmen are still persistent :) > > The academic community and the active users forked OpenPBS to create > Scalable PBS [SPBS] which is the name most widely known. They've added > patches, fixes and features, though there is still an Altair licence for > OpenPBS in there. In the last couple of months, SPBS changed its name > initially to StORM and then to Torque. Thanks for the clarification. Does anybody know whether Torque is considered to be conforming to the Open Source Definition[1]? In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software License', which seems to prohibit commercial distribution, making it non-free unfortunately. Is there some other fork of PBS with a true Open Source license perhaps? thanks, Michael [1] http://www.opensource.org/docs/definition.php _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 18:00:05 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 10:00:05 +1100 Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <20040210180116.GA27872@blackbird.oase.mhn.de> References: <20040207112119.GA5120@galactic.demon.co.uk> <20040210180116.GA27872@blackbird.oase.mhn.de> Message-ID: <200402111000.08919.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > Thanks for the clarification. Does anybody know whether Torque is > considered to be conforming to the Open Source Definition[1]? > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > License', which seems to prohibit commercial distribution, making it > non-free unfortunately. Is there some other fork of PBS with a true Open > Source license perhaps? My understanding is that they cannot alter the license as they have inherited that from the original OpenPBS sources, and as they do not hold all the copyrights to the code it cannot be changed unless Altair can be persuaded. My understanding is that the SuperCluster people picked the 2.3.12 version as a starting point as that was the most recent with the most liberal license (i.e. others could fork development from it). I've CC'd this to the SuperCluster folks so they can comment and correct. 
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU QwJlxOBwfLiUT7Y543RwiIY= =xTbA -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 17:44:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 09:44:17 +1100 Subject: [Beowulf] Re: HVAC and room cooling... wires In-Reply-To: References: Message-ID: <200402110944.21802.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 10 Feb 2004 07:10 pm, John Hearns wrote: > During a period of vulnerability (can;t remember the exact reason) > the Broadsword had to reboot its ageing fire control computer. > I think build by Ferranti. (No slur intended on their fine engineers, > but the thing was old at the time). I'm not aware of that one, but on a similar vein there was the widespread failure of the Patriot systems during the first Gulf War, including the attack on the barracks at Dhahran where 28 were killed. This was caused by the system truncating the values of the clock when written to memory, which over a long period of operation resulted in the system dismissing incoming missiles as false alarms. http://shelley.toich.net/projects/CS201/patriot.html However, this is starting to sound more like the RISKS digest than Beowulf, so I'll leave it there. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKV7EO2KABBYQAh8RApXjAJ9Gil07Z/XekN3XDSturEu2KihedQCfXBA7 aUUMVqTZuHfQ5RKsKGwnuNw= =+9RK -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Tue Feb 10 18:52:41 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 10 Feb 2004 15:52:41 -0800 Subject: [Beowulf] Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com>; from andrewxwang@yahoo.com.tw on Tue, Feb 10, 2004 at 11:42:32AM +0800 References: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <20040210155241.A29026@fileserver.internal.keyresearch.com> On Tue, Feb 10, 2004 at 11:42:32AM +0800, Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? There isn't much point at staring at a benchmark that isn't at all relevant to how you're using the system -- for example, a SPECcpu score with the Intel compiler in 32-bit mode isn't going to tell you much about an AMD64 app in 64-bit mode. If I remember correctly, a guy at Intel published a paper about a feedback optimization technique related to irregular strides that got a 22% improvement in mcf. When I get back to the office in a couple of days, I'll post a reference. 
And no, it's not at all Intel-specific. -- greg (posting from Paris. I should be asleep!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 10 19:32:47 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 11 Feb 2004 11:32:47 +1100 Subject: Fwd: Re: [Beowulf] Gentoo for Science and Engineering Message-ID: <200402111132.49119.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Forwarded at the request of SuperCluster.org - ---------- Forwarded Message ---------- Subject: Re: [Beowulf] Gentoo for Science and Engineering Date: Wed, 11 Feb 2004 11:55 am From: help at supercluster.org To: Chris Samuel Cc: beowulf at beowulf.org Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have > inherited that from the original OpenPBS sources, and as they do not hold > all the copyrights to the code it cannot be changed unless Altair can be > persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version > as a starting point as that was the most recent with the most liberal > license (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. 
> > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- - -- - -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation - ------------------------------------------------------- - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKXgvO2KABBYQAh8RAok7AKCABbnmwiYvRf4BxeFoY+Jp9F/W1gCfReKD dKc1islXxQLdTrabQglX1MU= =xfyh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From yduan at albert.chem.udel.edu Tue Feb 10 10:37:49 2004 From: yduan at albert.chem.udel.edu (Dr. Yong Duan) Date: Tue, 10 Feb 2004 10:37:49 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <20040210034232.2892.qmail@web16808.mail.tpe.yahoo.com> Message-ID: On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > Again, no matter how pretty the benchmarks results > look, in the end we still need to run on the real > system. So, what's the point of having benchmarks? > > Andrew. > A guidelines, I guess. A lot of CPUs (including some rather expensive ones and often call them HPC CPUs) perform at less than half the speed of consumer grade CPUs. You'd definitely avoid those, for instance. Also, you can look at the performance in each area and figure out the relative performance expected to your own code. In the end, the most reliable benchmark is always on your own code, of course. Whether Intel compiler has been tuned for SPEC2K is probably an open question. I tried various compilers on our code and found it is also tuned for it :), consistently 10-20% faster than others. This included performance on Opterons, strangely enough. yong _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From help at supercluster.org Tue Feb 10 19:55:22 2004 From: help at supercluster.org (help at supercluster.org) Date: Tue, 10 Feb 2004 17:55:22 -0700 (MST) Subject: [Beowulf] Gentoo for Science and Engineering In-Reply-To: <200402111000.08919.csamuel@vpac.org> Message-ID: Chris, Thanks for the cc. You will probably need to forward this message to beowulf as I don't think we are registered. OpenPBS 2.3.12 was selected because its license did allow anyone to modify/distribute the code for any reason with the only conditions being that the license be included and the original creators acknowledged. To our understanding, changing the license can only be done by the current license holders, ie Altair. 
The good news is that they are currently considering this as a possibility although we do not know which way they are leaning. As far as the Cluster Resources/Supercluster is concerned, our plans are to continue to contribute to this project, developing infrastructure changes as needed, adding scalability, security, usability, and functionality enhancements, and rolling in community patches and enhancements with no intention of creating a commercial/closed product out of it. Let us know if we can be of further assistance. Thanks, Supercluster Development Group On Wed, 11 Feb 2004, Chris Samuel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Wed, 11 Feb 2004 05:01 am, Michael Banck wrote: > > > Thanks for the clarification. Does anybody know whether Torque is > > considered to be conforming to the Open Source Definition[1]? > > > > In the Torque tarball, I was only able to find a 'OpenPBS v2.3 Software > > License', which seems to prohibit commercial distribution, making it > > non-free unfortunately. Is there some other fork of PBS with a true Open > > Source license perhaps? > > My understanding is that they cannot alter the license as they have inherited > that from the original OpenPBS sources, and as they do not hold all the > copyrights to the code it cannot be changed unless Altair can be persuaded. > > My understanding is that the SuperCluster people picked the 2.3.12 version as > a starting point as that was the most recent with the most liberal license > (i.e. others could fork development from it). > > I've CC'd this to the SuperCluster folks so they can comment and correct. > > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFAKWJ1O2KABBYQAh8RAolwAJ9RKYlBK+HvTq6TI9uTYzjkB/iC4wCfedeU > QwJlxOBwfLiUT7Y543RwiIY= > =xTbA > -----END PGP SIGNATURE----- > -- -------------------------------------------------------- Supercluster Development Group Scheduling and Resource Management of Clusters and Grids Maui Home Page - http://supercluster.org/maui Silver Home Page - http://supercluster.org/silver Documentation - http://supercluster.org/documentation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bwegner at ekt.tu-darmstadt.de Wed Feb 11 05:02:23 2004 From: bwegner at ekt.tu-darmstadt.de (Bernhard Wegner) Date: Wed, 11 Feb 2004 11:02:23 +0100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages Message-ID: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Hello, I have a really small "cluster" of 4 PC's which are connected by a normal Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board I thought I might be able to improve performance by connecting the machines via a Gigabit switch (which are really cheap nowadays). Everything seemed to work fine. The switch indicates 1000Mbit connections to the PC's and transfer rate for scp-ing large files is significantly higher now, but my software using mpich RUNS about a factor of 4-5 SLOWER NOW than with the 100 Mbit switch. I wasn't able to actually track down the problem, but it seems that there is a problem with small messages.
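For reference, small-message latency of the sort at issue here can be measured with a bare-bones MPI ping-pong. The sketch below is an illustration only, not the mpich test program itself; the repetition count, tag and file name are arbitrary. It assumes mpicc and two processes and reports the average one-way time for an empty message:

/* pingpong.c -- minimal 0-byte latency sketch (illustrative only, not the
 * mpich bshort test).  Build with "mpicc pingpong.c -o pingpong" and run
 * with one process on each of two nodes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, reps = 1000;
    char buf[1];                  /* the payload is sent with count 0 */
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("0 byte one-way latency: %.1f us\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}

Over TCP on a healthy gigabit link a few tens of microseconds would be a typical result, so times of the order of 1500 us point at the link, driver or negotiation rather than at the application itself.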
When I run the performance test provided with mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 byte message length, while for larger messages everything looks fine (linear dependancy of transfer time on message length, everything below 300 us). I have also tried mpich2 which shows exactly the same behavior. Does anyone have any idea? Here are the details of my system: - Suse Linux 9.0 (kernel 2.4.21) - mpich-1.2.5.2 - motherboard ASUS P4P800 - LAN (10/100/1000) on board (3COM 3C940 chipset) - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + 8x88E1111-BAB, AT89C2051-24PI) -- Mit besten Gr??en -- Best regards, Bernhard Wegner _______________________________________________________ ======================================================= Dipl.-Ing. Bernhard Wegner Fachgebiet Energie- und Kraftwerkstechnik Technische Universit?t Darmstadt Petersenstr. 30 64287 Darmstadt Germany phone: +49-6151-162357 fax: +49-6151-166555 e-mail: bwegner at ekt.tu-darmstadt.de _______________________________________________________ ======================================================= _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moloned at tcd.ie Wed Feb 11 12:44:59 2004 From: moloned at tcd.ie (david moloney) Date: Wed, 11 Feb 2004 17:44:59 +0000 Subject: [Beowulf] Profiling floating-point performance Message-ID: <402A6A1B.2070805@tcd.ie> I have an application written in C++ which compiles under both MSVC++ 6.0 and gcc 2.9.6 that I would like to profile in terms of floating point performance. My special requirement is that I would like not only peak and average flops numbers but also I would like a histogram of the actual x86 floating point instructions executed and their contribution to those peak and average flops numbers. Can anybody offer advice on how to do this? I tried using Vtune but it didn't seem to have this feature. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 11:44:10 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 11:44:10 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: Message-ID: On Tue, 10 Feb 2004, Dr. Yong Duan wrote: > > On Tue, 10 Feb 2004, [big5] Andrew Wang wrote: > > > Again, no matter how pretty the benchmarks results > > look, in the end we still need to run on the real > > system. So, what's the point of having benchmarks? > > > > Andrew. > > > > A guidelines, I guess. A lot of CPUs (including some rather expensive > ones and often call them HPC CPUs) perform at less than half the speed of > consumer grade CPUs. You'd definitely avoid those, for instance. > Also, you can look at the performance in each area and figure out the > relative performance expected to your own code. In the end, the most > reliable benchmark is always on your own code, of course. A short article this morning, as I'm debugging code and somewhat busy. Before discussing benchmarks in general, one needs to make certain distinctions. There are really two kinds of benchmarks. Maybe even three. Hell, more, but I'm talking broad categories. 
Let's try these three: * microbenchmarks * comparative benchmarks * application benchmarks Microbenchmarks measure very specific, highly targeted areas of system functionality. By their very nature they are "simple", not complex -- often the pseudocode is as simple as start_timer(); loop lotsatimes{ do_something_simple = dumb*operation; } stop_timer(); compute_speed(); print_result(); (To compute "how fast a multiply occurs"). Simple can also describe atomicity -- benchmarking "a single operation" where the operation might be complex but is a standard unitary building block of complex code. Microbenchmarks are undeniably not only useful, they are essential to anyone who takes systems/cluster/programming engineering seriously. Examples of microbenchmark suites that are in more or less common use are: lmbench (very full featured suite; one infamous user: Linux Torvalds:-) stream (very commonly cited on the list) cpu_rate (not so common -- wraps e.g. stream and other tests so variations with vector size can be explored) rand_rate (almost unknown, but it DOES benchmark all the gsl rands:-) netpipes (measure network speeds) netperf (ditto, but alas no longer maintained) I (and many others) USE these tools (I wrote two of them SO I could use them) to study systems that we are thinking of buying and using for a cluster, to study the kernel and see if the latest change made some critical operation faster or slower, to figure out if the NIC/switch combo we are using is why PVM code is moving like molasses. They are LESS commonly poached by vendors, fortunately - Larry Macvoy has lmbench bristling with anti-vendor-cooking requirements at the license level. The benchmarks are simple, but because one needs a lot of them to get an accurate picture of overall performance they tend to be too complex for typical mindless consumers... Comparative benchmarks are what I think you're really referring to. They aren't completely useless, but they do often become pissing contests (such as the top 500 list) and there are famous stories of Evil by corporations seeking to cook up good results on one or another (sometimes at the expense of overall system balance and performance!). Most of the Evil in these benchmarks arise because people end up using them as a naive basis for purchase decisions. "Ooo, that system has a linpork of 4 Gigacowflops so it must be better than that one which only gets 2.7 Gcf, so I'll buy 250 of them for my next cluster and be able to brag about my 1 Teracowflop supercomputer and make the top third of the top 500 list, which will impress my granting agencies and tenure board, who are just as ignorant as I am about meaningful measures of systems performance..." Never mind that your application is totally non-linpack-like, that the bus performance on the systems you got sucks, and that the 2.7 Gcf systems you rejected cost 1/2 the 4 Gcf systems you got so you could have had 500 at 2.7 Gcf for a net of 1.35 Tcf and balanced memory and bus performance (and run your application faster per dollar) if you'd bothered to do a cost benefit analysis. The bleed of dollars attracts the vendor sharks, who often can rattle off the aggregate specmarks and so forth for their most expensive boxes. 
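The cost-benefit arithmetic in the previous paragraph is worth making explicit. The figures in the sketch below are hypothetical (budget and node prices invented purely for illustration), but they reproduce the 1 Tcf versus 1.35 Tcf comparison:

/* cba.c -- back-of-the-envelope cost-benefit arithmetic with made-up
 * prices: a fixed budget spent on "4 Gcf" nodes versus "2.7 Gcf" nodes
 * that cost half as much. */
#include <stdio.h>

int main(void)
{
    double budget  = 1000000.0;  /* assumed budget in dollars        */
    double price_a = 4000.0;     /* assumed price of a 4.0 Gcf node  */
    double price_b = 2000.0;     /* assumed price of a 2.7 Gcf node  */
    double gcf_a = 4.0, gcf_b = 2.7;

    double nodes_a = budget / price_a;   /* 250 nodes */
    double nodes_b = budget / price_b;   /* 500 nodes */

    printf("option A: %.0f nodes x %.1f Gcf = %.2f Tcf\n",
           nodes_a, gcf_a, nodes_a * gcf_a / 1000.0);
    printf("option B: %.0f nodes x %.1f Gcf = %.2f Tcf\n",
           nodes_b, gcf_b, nodes_b * gcf_b / 1000.0);
    return 0;
}

Per dollar the "slower" node wins by 35%, before memory and bus balance are even taken into account.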
However, they CAN be actually useful, if one of the tests in the SPEC suite happens to correspond to your application, if you bother to read all the component results in the SPECmarks, if you bother to check the compiler used and flags and system architecture in some detail to see if they appear cooked (hand tuned or optimized, based on a compiler that is lovely but very expensive and has to be factored into your CBA). Finally, there are application benchmarks. These tend to be "atomic" but at a very high level (an application is generally very complex). These are also subject to the Evil of comparative benchmarks (in fact some of comparative benchmark suites, especially in the WinX world, are a collection of application benchmarks). They also have some evil of their own when the application in question is commercial and not open source -- you have effectively no control over how it was built and tuned for your architecture, for example, and may not even have meaningful version information. However, they are also undeniably useful. Especially when the application being benchmarked is YOUR application and under your complete control. So the answer to your question appears to be: * Microbenchmarks berry berry good. Useful. Essential. Fundamental. * Comparative benchmarks sometimes good. Sometimes a force for Evil. * Application benchmark ideal if it is your application or very similar and under your control. Pissing contests in general are not useful, and even a useful higher level benchmark divorced from an associated CBA is like shopping in a store that has no price tags -- a thing of use only to those so rich that they don't have to ask. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 13:55:01 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 12:55:01 -0600 Subject: [Beowulf] how are people doing this? Message-ID: <20040211185501.GA31590@mikee.ath.cx> I feel that in a proper cluster that the nodes are all (basically) identical. I 'own' a server environment of 20+ servers that are all dedicated to specific applications and this is not a cluster. However, I would like to manage config files (/etc/resolv.conf, etc), user accounts, patches, etc., as I would in a clustered environment. I have read the papers at infrastructures.org and agree with the principles mentioned there. I have looked extensively at cfengine, though I prefer the solution be in PERL as all my servers have PERL already (the manufacturer installs PERL as default on the boxes). How is everyone managing their cluster or what are suggestions on how I can manage my server environment. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:08:41 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed, 11 Feb 2004 13:08:41 -0500 (EST) Subject: [Beowulf] Profiling floating-point performance In-Reply-To: <402A6A1B.2070805@tcd.ie> Message-ID: On Wed, 11 Feb 2004, david moloney wrote: > I have an application written in C++ which compiles under both MSVC++ > 6.0 and gcc 2.9.6 that I would like to profile in terms of floating > point performance. > > My special requirement is that I would like not only peak and average > flops numbers but also I would like a histogram of the actual x86 > floating point instructions executed and their contribution to those > peak and average flops numbers. > > Can anybody offer advice on how to do this? I tried using Vtune but it > didn't seem to have this feature. I'm not sure how accurate it is overall, but see "man gprof" and compile with the -g -pg flags. This will give you at least some useful times and so forth. It will NOT give you (AFAIR) "histogram of actual x86 floats etc". I don't know of anything that will -- to get them you have to instrument your code, probably so horribly that a la Heisenberg your measurements would bear little resemblance to actual performance (especially if your code wants to be doing all sorts of smooth vector things in cache and register memory and you keep calling instrumentation subroutines to try to measure times that wreck state). Consider that with my best, on CPU, raw assembler based timing clock (using the onboard cycle counter) I still find the overhead of reading that clock to be in the tens of clock cycles. To microtime a single multiply is thus all but impossible -- the clock itself takes 10-40 times as long to execute as a multiply might take, depending on where the data to be multiplied is when one starts. So timing per-instruction is effectively out. Similarly, to instrument and count floating point operations requires something to "watch the machine instructions" as they stream through the CPU. Unfortunately, the only thing available to watch the instructions is the CPU itself, so you have to damn near write an assembler-interpreter to instrument this. Which in turn would be slow as molasses -- an easy 10x slower than the native code in overhead alone plus it would utterly wreck just about every code optimization known to man. Finally, there is the question of "what's a flop". The answer is, not much that's useful or consistent -- the number of floating point operations that a system does per second varies wildly depending in a complex way on system state, cache locality, whether the variable is general or register, whether the instruction is part of a complex/advanced instruction (e.g. add/multiply) or an instruction that has to be done partly in software (divide), whether or not the instruction is part of a stream of vectorized instructions, and more. That's why microbenchmarks are useful. You may not be able to extract meaningful results from your code with a simple tool (although it isn't terribly difficult to instrument major blocks or subroutines with timers and counters, which is more or less what -pg and gprof do) but you can learn at least some things about how your system executes core operations in various contexts to learn how to optimize one's code with a good microbenchmark. Just sweeping stream across vector sizes from 1 to 10^8 or so teaches you a whole lot about the system's performance in different contexts, as does doing a stream-like benchmark but working through the vector in a random order (i.e. deliberately defeating any sort of vector optimization and cache benefit).
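To make the overhead point concrete, a raw cycle-counter timing loop of the general kind described above can be written in a few lines. This is a sketch only (gcc on x86 assumed; the loop body and iteration count are arbitrary), not a production benchmark:

/* tsc.c -- read the x86 time-stamp counter around a trivial loop.
 * The tens-of-cycles cost of rdtsc itself is exactly why a single
 * multiply cannot usefully be timed this way. */
#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    volatile double x = 1.000001, y = 1.0;   /* volatile defeats dead-code removal */
    unsigned long long start, stop;
    int i, n = 1000000;

    start = rdtsc();
    for (i = 0; i < n; i++)
        y *= x;                              /* the "work" being timed */
    stop = rdtsc();

    printf("%.2f cycles per iteration (includes loop and rdtsc overhead)\n",
           (double)(stop - start) / n);
    return 0;
}

Subtracting an empty calibration loop and sweeping the working-set size is what turns a toy like this into something resembling a real microbenchmark.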
Good luck, rgb > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 13:58:39 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 13:58:39 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. Mike, this is nearly a FAQ -- the list archives should have a discussion (one of many) only a few weeks old on this very subject. There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions possible, and more. Oh, and dhcp actually pushes lots of stuff out all by itself these days -- it should handle the stuff in resolv.conf for example, and you should be using dhcp anyway for scalability reasons. rgb > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 11 14:44:02 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 11 Feb 2004 13:44:02 -0600 (CST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: Mike, we use systemimager, systemconfigurator and a custom utility called "cupdate" to maintain our clusters. In our case it works beautifully and easilly. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. 
> However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). > > How is everyone managing their cluster or what are suggestions > on how I can manage my server environment. > > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scyld at jasons.us Wed Feb 11 13:01:21 2004 From: scyld at jasons.us (scyld at jasons.us) Date: Wed, 11 Feb 2004 13:01:21 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> References: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: <20040211125741.A5961@torgo.bigbroncos.org> On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. Have you tried setting the speed and duplex of the gig NICs to 1000/full on both the system side and switch side? I've found that autonegotiate rarely does especially with 3com gear. I'm guessing, based on its size, your switch isn't managed so you may have to stick to locking it on the systems and watching the behavior to see if the switch gets the negotiation right. (if traffic is bursty you have a speed mismatch and if you get loads of errors it's more likely to be duplex problem) FWIW I have the same mobo at home but haven't hooked it to gigabit yet so I'm quite curious to see how this works out. -Jason ----- Jason K. Schechner - check out www.cauce.org and help ban spam-mail. "All HELL would break loose if time got hacked." - Bill Kearney 02-04-03 ---There is no TRUTH. There is no REALITY. There is no CONSISTENCY.--- ---There are no ABSOLUTE STATEMENTS I'm very probably wrong.--- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Wed Feb 11 15:00:15 2004 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Wed, 11 Feb 2004 14:00:15 -0600 Subject: [Beowulf] how are people doing this? In-Reply-To: References: <20040211185501.GA31590@mikee.ath.cx> Message-ID: <20040211200015.GE31590@mikee.ath.cx> On Wed, 11 Feb 2004, Robert G. Brown wrote: > On Wed, 11 Feb 2004, Mike Eggleston wrote: > > > I feel that in a proper cluster that the nodes are all (basically) > > identical. 
I 'own' a server environment of 20+ servers that are > > all dedicated to specific applications and this is not a cluster. > > However, I would like to manage config files (/etc/resolv.conf, etc), > > user accounts, patches, etc., as I would in a clustered environment. > > I have read the papers at infrastructures.org and agree with the > > principles mentioned there. I have looked extensively at cfengine, > > though I prefer the solution be in PERL as all my servers have > > PERL already (the manufacturer installs PERL as default on the boxes). > > > > How is everyone managing their cluster or what are suggestions > > on how I can manage my server environment. > > Mike, this is nearly a FAQ -- the list archives should have a discussion > (one of many) only a few weeks old on this very subject. > > There are NIS, LDAP, rsync, cfengine, and even yum/rpm-based solutions > possible, and more. Oh, and dhcp actually pushes lots of stuff out all > by itself these days -- it should handle the stuff in resolv.conf for > example, and you should be using dhcp anyway for scalability reasons. I know it's been discussed and I apologize for asking it again. I've just not found the way that seems to fit with the picture I'm trying to reach. What I'm thinking of doing is writing a perl script that can be placed into CVS. On each server a cron process checks out the current CVS repository of server (AIX) config data and script. Then the perl script starts to check permissions, update resolv.conf, hosts, login, passwd, etc., and to check that specific packages are installed or that the packages need updating. I like a lot of what cfengine did, but I really want a script that can be maintained in CVS. For installing packages I plan for the script to mount an NFS export for pulling the packages. # mkdir /tmp/nfs.$$ # mount admin:/opt/packages /tmp/nfs.$$ # installp -d /tmp/nfs.$$ package # umount /tmp/nfs.$$ # rmdir /tmp/nfs.$$ For the account management I'm thinking of something on my admin server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating a local file with new users and their passwords. Then this file is checked into CVS for distribution to other nodes/servers. Using another file to list the users that are authorized access to the local node/server keeps my user-space to a minimum. Is that any more clear what I'm trying to do? I don't have a cluster, but I want to manage all nodes as identically as I can. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 14:35:13 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 14:35:13 -0500 (EST) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. 
I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. You might look into yum. You'd have to learn python, but yum already does most of what you want for rpm packages and could likely be hacked. In fact, yum would do what you want for all the config files if you roll them into an rpm package right now -- it already has precisely what it needs to install and update according to a revision number. You can run yum update as often as you wish. It will run from NFS and can be secured a variety of ways. rgb > > For installing packages I plan for the script to mount an NFS export > for pulling the packages. > > # mkdir /tmp/nfs.$$ > # mount admin:/opt/packages /tmp/nfs.$$ > # installp -d /tmp/nfs.$$ package > # umount /tmp/nfs.$$ > # rmdir /tmp/nfs.$$ > > For the account management I'm thinking of something on my admin > server that pulls LDAP (M$ ADS) at some frequency (30-60 min) updating > a local file with new users and their passwords. Then this file > is checked into CVS for distribution to other nodes/servers. Using > another file to list the users that are authorized access to the > local node/server keeps my user-space to a minimum. > > Is that any more clear what I'm trying to do? I don't have a cluster, > but I want to manage all nodes as identically as I can. > > Mike > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Wed Feb 11 17:05:26 2004 From: canon at nersc.gov (canon at nersc.gov) Date: Wed, 11 Feb 2004 14:05:26 -0800 Subject: [Beowulf] Profiling floating-point performance In-Reply-To: Message from david moloney of "Wed, 11 Feb 2004 17:44:59 GMT." <402A6A1B.2070805@tcd.ie> Message-ID: <200402112205.i1BM5QwA011397@pookie.nersc.gov> David, You may want to look into PAPI and perfctr. It allows you query the performance counters built into most processors. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 17:13:32 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 17:13:32 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > > They also have some evil of > > their own when the application in question is commercial and not open > > source -- you have effectively no control over how it was built and > > tuned for your architecture, for example, and may not even have > > meaningful version information. > > Let's be fair here. An ISV application is not the definition of evil. 
I did not mean to imply that they were wholly evil or even evil in intent.

> Clearly, "you have effectively no control over how an application was
> built and tuned for your architecture" has no direct correspondence to
> performance.

I would have to respectfully and vehemently disagree. It has all sorts of direct correspondences. Let us make a short tally of ways that a closed source, binary only application used as a benchmark can mislead me with regard to the performance of a system.

* I don't control the compiler choice. Your compiler and mine might result in me getting a very different performance even if your application "resembles" mine (AFAICT given that I cannot read the source).

* I don't control the libraries. Your application is (probably) static linked in various places and might even use private libraries that are hand-optimized. My application would likely be linked dynamically with completely different libraries. Your libraries might be out of date. My libraries might be out of date.

* I don't have any way of knowing whether your "canned" (say) Monte Carlo benchmark is relevant to my Monte Carlo application. Maybe your code is structured to be strictly vectorized and local, but mine requires random site access. Yours might be CPU bound. Mine might be memory bound. Since I can't see the source, I'll never know.

* I have to pay money for the application to use as a benchmark before I even look at hardware. If I'm an honest soul, I probably have to buy a separate license for every platform I plan to test even before I buy the test platform OR run afoul of the Dumb Mutha Copyright Act (aka the "Intellectual Straightjacket Act"). Or maybe I can rely on vendor reports of the results. This adds costs to the engineering process.

* Even leaving aside the additional costs, there is the issue of whether the application I'm using is tuned for the hardware I'm running on. Strict i386 code will not run as fast as strict i586 code will not run as fast as i686 code will not run optimally on an Athlon will not run optimally on an Opteron. Yet the Opteron will likely RUN i386 code. I just won't know whether the result is at all relevant to how the Opteron runs Opteron code. (These effects are not necessarily small.)

* And if I thought about it hard, I could likely come up with a few more negatives...such as the entire raft of reasons that closed source software is a Bad Thing to encourage on general principles. The principles built right into the original beowulf mission statement (which IIRC has a very clear open source requirement for engineering reasons).

The point being that while closed source commercial applications don't necessarily make "evil" benchmarks in the sense that there is any intent to hide or alter performance characteristics of a given architecture, they add a number of sources of noise to an already arcane and uncertain process. They are less reliable, more likely to mislead you (quite possibly through nobody's fault or intention), less likely to accurately predict the performance of the architecture on your application suite. And they are ultimately black boxes that you have to pay people to use.

I personally am a strong proponent (in case you can't tell:-) of open source (ideally GPL) software and tools, ESPECIALLY for benchmarking. I even tried to talk Larry McVoy into GPL-ing lmbench back when it had a fairly curmudgeonly license, even though the source itself was open enough.
Note, BTW, that all of the observations above are irrelevant if the application being used as a benchmark is the application you intend to use in the form you intend to use it, purchased or not. As in: > > However, they are also undeniably useful. Especially when the > > application being benchmarked is YOUR application and under your > > complete control. > > Regardless of ownership or control, they're especially useful when > you're looking at an application being used in the way you intend on > using it. Many industrial users buy systems to run a specific list of > ISV applications. In this instance, the application benchmark can be > the most valid benchmark, as it can model the system in the way it will > be used -- and that's the most important issue. Sure. Absolutely. I'd even say that your application(s) is(are) ALWAYS the best benchmark for many or even most purposes, with the minor caveat that the microbenchmarks have a slightly different purpose and are best for the purpose for which they are intended. I doubt that Linus runs a scripted set of userspace Gnome applications to test the performance of kernel subsystems... > I'm not disagreeing with your message. I too try to make sure that > people use the right benchmarks for the right purpose; I've seen way too > many people jump to absurd conclusions based on a single data point or > completely unrelated information. I'm just trying to sharpen your > message by pointing out some too broad brush strokes... > > Well, maybe I don't put as much faith in micro benchmarks unless in the > hands of a skilled interpreter, such as yourself. My preference is for > whatever benchmarks most closely describe your use of the system. Microbenchmarks are not intended to be predictors of performance in macro-applications, although a suite of results such as lmbench can give an expert a surprisingly accurate idea of what to expect there. They are more to help you understand systems performance in certain atomic operations that are important components of many applications. A networking benchmark can easily reveal problems with your network, for example, that help you understand why this application which ran just peachy keen at one scale as a "benchmark" suddenly turns into a pig at another scale. A good CPU/memory benchmark can do the same thing wrt the memory subsystem. This is yet another major problem with an naive application benchmark or comparative benchmark (and even with microbenchmarks) -- they are OFTEN run at a single scale or with a single set of parameters. On system A, that scale might be one that lets the application remain L2-local. On system B it might not be. You might then conclude that B is much slower. On the scale that you intend to run it, both might be L2-local or both might be running out of memory. B might have a faster processor, or a better overall balance of performance and might actually be faster at that scale. I don't put much faith in benchmarks, period. With the exception of your application(s), of course. Faith isn't the point -- they are just rulers, stopwatches, measuring tools. Some of them measure "leagues per candle", or "furlongs per semester" and aren't terribly useful. Others are just what you need to make sense of a system. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nixon at nsc.liu.se Wed Feb 11 16:26:44 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed, 11 Feb 2004 22:26:44 +0100 Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211200015.GE31590@mikee.ath.cx> (Mike Eggleston's message of "Wed, 11 Feb 2004 14:00:15 -0600") References: <20040211185501.GA31590@mikee.ath.cx> <20040211200015.GE31590@mikee.ath.cx> Message-ID: Mike Eggleston writes: > I know it's been discussed and I apologize for asking it again. I've > just not found the way that seems to fit with the picture I'm trying > to reach. What I'm thinking of doing is writing a perl script that > can be placed into CVS. On each server a cron process checks out the > current CVS repository of server (AIX) config data and script. Then > the perl script starts to check permissions, update resolv.conf, hosts, > login, passwd, etc., and to check that specific packages are installed > or that the packages need updating. I like a lot of what cfengine > did, but I really want a script that can be maintained in CVS. Well, if it comes to that, surely you can place cfengine's configuration files in CVS and let cron run a script that updates the config files from CVS and then launches cfengine? You don't have to run cfd, you know; you can start cfengine any way you want. I'd really think twice before starting to re-implement cfengine's existing functionality. cfengine helped me keep my sanity in an earlier life while single-handedly adminning a heterogenous Unix environment ranging from SunOS 4.1.3_u1 through Solaris 7, diverse Tru64:s and a hodge-podge of Linuxen. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 16:58:48 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 13:58:48 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0CD@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 8:44 AM [deletia] > > Finally, there are application benchmarks. These tend to be "atomic" > but at a very high level (an application is generally very complex). > These are also subject to the Evil of comparative benchmarks (in fact > some of comparative benchmark suites, especially in the WinX world, are > a collection of application benchmarks). True. I cringe to think how many systems were bought for scientific and technical computations based on UT2003 "benchmarks". > They also have some evil of > their own when the application in question is commercial and not open > source -- you have effectively no control over how it was built and > tuned for your architecture, for example, and may not even have > meaningful version information. Let's be fair here. An ISV application is not the definition of evil. 
Clearly, "you have effectively no control over how an application was built and tuned for your architecture" has no direct correspondence to performance. Having been on the ISV side of the fence, and spent a tremendous amount of energy making sure that each port of the application performed as well as it could, I'm quite confident in saying we generally succeeded in maximizing performance. Realize that we had day after day to spend on performance, usually with the attention of one or more experts from the platform vendor at our beck and call -- and those experts would spend even more time on even more narrow aspects of performance. Having said that, there are some notable ISV applications that simply do not perform as well as they should. This can occur for a host of reasons, such as they, did not care, didn't know how, could/would not to make the effort, didn't have the time, were ignored by the vendor, &etc -- basically the very same reasons that some people who don't work for ISVs fail to make their own applications perform as well as they could. > However, they are also undeniably useful. Especially when the > application being benchmarked is YOUR application and under your > complete control. Regardless of ownership or control, they're especially useful when you're looking at an application being used in the way you intend on using it. Many industrial users buy systems to run a specific list of ISV applications. In this instance, the application benchmark can be the most valid benchmark, as it can model the system in the way it will be used -- and that's the most important issue. I'm not disagreeing with your message. I too try to make sure that people use the right benchmarks for the right purpose; I've seen way too many people jump to absurd conclusions based on a single data point or completely unrelated information. I'm just trying to sharpen your message by pointing out some too broad brush strokes... Well, maybe I don't put as much faith in micro benchmarks unless in the hands of a skilled interpreter, such as yourself. My preference is for whatever benchmarks most closely describe your use of the system. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Wed Feb 11 18:16:22 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Wed, 11 Feb 2004 15:16:22 -0800 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) Message-ID: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> From: Robert G. Brown; Wednesday, February 11, 2004 2:14 PM > On Wed, 11 Feb 2004, Lombard, David N wrote: > > > > They also have some evil of > > > their own when the application in question is commercial and not open > > > source -- you have effectively no control over how it was built and > > > tuned for your architecture, for example, and may not even have > > > meaningful version information. > > > > Let's be fair here. An ISV application is not the definition of evil. > > I did not mean to imply that they were wholly evil or even evil in > intent. > > > Clearly, "you have effectively no control over how an application was > > built and tuned for your architecture" has no direct correspondence to > > performance. 
> > I would have to respectfully and vehemently disagree. It has all sorts > of direct correspondances. Let us make a short tally of ways that a > closed source, binary only application used as a benchmark can mislead > me with regard to the performance of a system. > [deletia] > > Note, BTW, that all of the observations above are irrelevant if the > application being used as a benchmark is the application you intend to > use in the form you intend to use it, purchased or not. OK. So there's our difference. I only consider an application benchmark useful in this scenario. I can't imagine using an application benchmark of any sort if it isn't; you enumerated all the reasons for this in the bits I just snipped. We agree completely on this. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 18:07:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 10:07:18 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402121007.30002.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 09:13 am, Robert G. Brown wrote: > * Even leaving side the additional costs, there is the issue of > whether the application I'm using is tuned for the hardware I'm running > on. Such as ISV's including IA32 executables as part of their IA64 version. It wasn't all IA32, just bits. Very odd. We only spotted it when it failed to work on Rocks 3.1.0, which doesn't supply the IA32 compatability libraries (which Rocks 3.0.0 did). No, I'm not going to name names, but the "file" and "ldd" are your friends. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKrWmO2KABBYQAh8RAu9JAJ41djUEj+6zEZYrY9IuPG4E9s9qugCeKhJd 2pf/pnDftPMs0zCLYb7IaRM= =t/c6 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:34:06 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:34:06 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <200402121007.30002.csamuel@vpac.org> Message-ID: On Thu, 12 Feb 2004, Chris Samuel wrote: > No, I'm not going to name names, but the "file" and "ldd" are your friends. ...and with that, I'm going to quit for the day and take my nameless friends out for a beer somewhere... (Sorry, revenge for the lies, damned lies...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 11 17:23:12 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 12 Feb 2004 09:23:12 +1100 Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: References: Message-ID: <200402120923.19328.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 12 Feb 2004 03:44 am, Robert G. Brown wrote: > There are really two kinds of benchmarks. Maybe even > three. Lies, damn lies and statistics ? - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAKqtTO2KABBYQAh8RAg+TAJ4uLkrC7zOUDlK8OYVxBuwKY/GXuQCeJFvj vd9nT5nkEuUY/3Myv0IROaU= =8pIh -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Feb 11 18:32:28 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 11 Feb 2004 18:32:28 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <187D3A7CAB42A54DB61F1D05F012572201D4C0D2@orsmsx402.jf.intel.com> Message-ID: On Wed, 11 Feb 2004, Lombard, David N wrote: > OK. So there's our difference. I only consider an application benchmark > useful in this scenario. I can't imagine using an application benchmark > of any sort if it isn't; you enumerated all the reasons for this in the > bits I just snipped. > > We agree completely on this. I figured that we did -- I'm getting verbose on it because I think it is an important issue to be precise on. "What's a FLOP?" is a perfectly reasonable question with a perfectly unintelligible and meaningless answer, in spite of it being cited again and again over decades to sell systems. At the same time, benchmarks are certainly useful. I think the confusion is probably my fault -- my age/history showing again. I can remember fairly clearly when awk was cited as a benchmark. Quake too, and not for people who were USING awk or necessarily going to play quake. This is what I meant by an "application benchmark" -- some sort of application that somebody thinks is a good measure of general systems performance and manage to get people to take seriously. Stuff like this is still fairly commonly used in many WinXX "benchmarks" that you'll see "published" both on the web and in real paper magazine articles. How fast can Excel update a spreadsheet that computes lunar orbital trajectories, that sort of thing. Sometimes they are almost a joke -- applications that do a lot of disk I/O (apparently, who knows) are used as a "disk performance benchmark". I won't even get started on this sort of thing and the number of variables left completely uncontrolled (for example, the disk caching subsystems both hardware and software) compared to, say, bonnie or lmbench. 
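For instance, the typical naive "disk benchmark" boils down to little more than a timed read loop, something like the following (a rough sketch typed in rather than tested; the file name and chunk size are arbitrary):

  #include <stdio.h>
  #include <stdlib.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/time.h>

  #define CHUNK (64*1024)

  /* time one sequential pass over a file and return MB/sec */
  static double read_pass(const char *path)
  {
      static char buf[CHUNK];
      struct timeval t0, t1;
      ssize_t n;
      long long total = 0;
      double secs;
      int fd = open(path, O_RDONLY);
      if (fd < 0) { perror(path); exit(1); }
      gettimeofday(&t0, NULL);
      while ((n = read(fd, buf, CHUNK)) > 0)
          total += n;
      gettimeofday(&t1, NULL);
      close(fd);
      secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
      if (secs <= 0.0) secs = 1e-9;   /* tiny files finish "instantly" */
      return (total / (1024.0 * 1024.0)) / secs;
  }

  int main(int argc, char **argv)
  {
      const char *path = (argc > 1) ? argv[1] : "testfile";
      /* the first pass may actually touch the disk; the second is
         usually served straight out of the page cache */
      printf("pass 1: %8.1f MB/sec\n", read_pass(path));
      printf("pass 2: %8.1f MB/sec\n", read_pass(path));
      return 0;
  }

Unless the file is much larger than RAM (or the cache is flushed between passes), the second number measures the memory subsystem, not the disk -- exactly the kind of uncontrolled variable that bonnie and lmbench go to some lengths to account for.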
I also won't comment on just how much crap there is out there with stuff like this in it, sometimes from supposedly "reputable" testing companies that ought to know better or be more honest. That's why I "trust" GPL/Open microbenchmarks the most, because I can look at their sources, understand just what they are doing and how it compares to what I want to do, maybe even hack them if I need to because it isn't QUITE right, and get numbers with some meaning. Stuff like SPEC and linpack (where linpack should probably be considered micro) isn't horrible but (in the case of SPEC) isn't GPL or terribly straightforward to understand microscopically or macroscopically -- it takes experience to know how the profile it generates compares to features in your own code. Great for sales-speak, though -- "Our system gets 2301.124 specoloids/second, while THEIR system is a laughable 1721.564." Quake isn't a useful benchmark -- it is a game, and one that generally runs as fast as it needs to whereever it runs...but it is a GREAT benchmark for how a system plays quake:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Wed Feb 11 19:31:43 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed, 11 Feb 2004 19:31:43 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > Hello, > > I have a really small "cluster" of 4 PC's which are connected by a normal > Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > I thought I might be able to improve performance by connecting the machines > via a Gigabit switch (which are really cheap nowadays). > > Everything seemed to work fine. The switch indicates 1000Mbit connections to > the PC's and transfer rate for scp-ing large files is significantly higher > now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > with the 100 Mbit switch. > > I wasn't able to actually track down the problem, but it seems that there is > a problem with small messages. When I run the performance test provided with > mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > byte message length, while for larger messages everything looks fine (linear > dependancy of transfer time on message length, everything below 300 us). I > have also tried mpich2 which shows exactly the same behavior. > > Does anyone have any idea? First, I assume you were running the 100BT through the same onboard NICs and got reasonable performance. So some possible things: - the switch is a dog or it is broken - your cables may be old or bad (but worked fine for 100BT) - negotiation problem Some things to try: Use a cross over cable (cat5e) and see if you get the same problem. You might try using a lower level benchmark (of the micro variety) like netperf and netpipe. The Beowulf Performance Suite: http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 has these tests. Also, the December and January issues of ClusterWorld show how to test a network connection using netpipe. 
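If you want an even quicker sanity check first, a minimal MPI ping-pong along these lines will reproduce the zero-byte latency number directly (a rough, untested sketch -- compile with mpicc and run with mpirun -np 2 across two of the machines):

  #include <stdio.h>
  #include <mpi.h>

  #define REPS 1000

  int main(int argc, char **argv)
  {
      int rank, i;
      char dummy = 0;
      double t0, t1;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* ranks 0 and 1 bounce a zero-byte message back and forth;
         half the average round-trip time is the small-message latency */
      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < REPS; i++) {
          if (rank == 0) {
              MPI_Send(&dummy, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(&dummy, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
          } else if (rank == 1) {
              MPI_Recv(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
              MPI_Send(&dummy, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)
          printf("average one-way latency: %.1f usec\n",
                 (t1 - t0) * 1e6 / (2.0 * REPS));

      MPI_Finalize();
      return 0;
  }

On a healthy gigabit link this should come out well under a couple of hundred microseconds; if it is back up around 1500 us, the problem is more likely the NIC/switch negotiation or interrupt settings than mpich itself.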
At some point this content will be showing up on the web-page.

Also, the MPI Link-checker from Microway (www.microway.com)

http://www.clusterworld.com/article.pl?sid=04/02/09/1952250

May help.

Doug

> > Here are the details of my system:
> - Suse Linux 9.0 (kernel 2.4.21)
> - mpich-1.2.5.2
> - motherboard ASUS P4P800
> - LAN (10/100/1000) on board (3COM 3C940 chipset)
> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M +
> 8x88E1111-BAB, AT89C2051-24PI)
>
>

-- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From landman at scalableinformatics.com Wed Feb 11 20:19:26 2004
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 11 Feb 2004 20:19:26 -0500
Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?)
In-Reply-To: <200402121007.30002.csamuel@vpac.org>
References: <200402121007.30002.csamuel@vpac.org>
Message-ID: <1076548766.3950.91.camel@protein.scalableinformatics.com>

On Wed, 2004-02-11 at 18:07, Chris Samuel wrote:

> No, I'm not going to name names, but the "file" and "ldd" are your friends.

... and strace. Amazing how useful that one is.

-- Joe Landman Scalable Informatics LLC

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From joachim at ccrl-nece.de Thu Feb 12 03:44:12 2004
From: joachim at ccrl-nece.de (Joachim Worringen)
Date: Thu, 12 Feb 2004 09:44:12 +0100
Subject: [Beowulf] Profiling floating-point performance
In-Reply-To: <402A6A1B.2070805@tcd.ie>
References: <402A6A1B.2070805@tcd.ie>
Message-ID: <200402120944.12719.joachim@ccrl-nece.de>

david moloney:
> Can anybody offer advice on how to do this? I tried using Vtune but it
> didn't seem to have this feature.

Try PAPI: http://icl.cs.utk.edu/papi/

It gives you access to all the information the CPU has to offer for this; how you gather it is up to you. However, for an instruction-level histogram, a simulator will probably be more useful. And you should think about whether you *really* need this - whether the information you get is worth the effort.

Joachim

-- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at clustervision.com Thu Feb 12 03:31:21 2004
From: john.hearns at clustervision.com (John Hearns)
Date: Thu, 12 Feb 2004 09:31:21 +0100 (CET)
Subject: [Beowulf] Profiling floating-point performance
In-Reply-To: <402A6A1B.2070805@tcd.ie>
Message-ID:

On Wed, 11 Feb 2004, david moloney wrote:
>
> My special requirement is that I would like not only peak and average
> flops numbers but also I would like a histogram of the actual x86
> floating point instructions executed and their contribution to those
> peak and average flops numbers.
>
> Can anybody offer advice on how to do this? I tried using Vtune but it
> didn't seem to have this feature.
> Can't help directly, but you could look at Oprofile http://oprofile.sourceforge.net/about/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:17:29 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:17:29 +0100 (CET) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <2A1D2986D33@hrz4.hrz.tu-darmstadt.de> Message-ID: On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > Does anyone have any idea? > > Here are the details of my system: > - Suse Linux 9.0 (kernel 2.4.21) > - mpich-1.2.5.2 > - motherboard ASUS P4P800 > - LAN (10/100/1000) on board (3COM 3C940 chipset) > - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M + > 8x88E1111-BAB, AT89C2051-24PI) You might look at the P4_GLOBMEMSIZE parameter in the MPI job. export P4_GLOBMEMSIZE=20194344 (say) Try stepping through various values for this parameter, and run the Pallas benchmark. Let us know what the results are! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 12 03:24:10 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 12 Feb 2004 09:24:10 +0100 (CET) Subject: [Beowulf] how are people doing this? In-Reply-To: <20040211185501.GA31590@mikee.ath.cx> Message-ID: On Wed, 11 Feb 2004, Mike Eggleston wrote: > I feel that in a proper cluster that the nodes are all (basically) > identical. I 'own' a server environment of 20+ servers that are > all dedicated to specific applications and this is not a cluster. > However, I would like to manage config files (/etc/resolv.conf, etc), > user accounts, patches, etc., as I would in a clustered environment. > I have read the papers at infrastructures.org and agree with the > principles mentioned there. I have looked extensively at cfengine, > though I prefer the solution be in PERL as all my servers have > PERL already (the manufacturer installs PERL as default on the boxes). Alternatives you might look at are: LCFG http://www.lcfg.org/ The European Datagrid people have the Quattor project http://quattor.org/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Thu Feb 12 05:37:54 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 12 Feb 2004 18:37:54 +0800 Subject: [Beowulf] Master-Slave Problems Message-ID: <1076582124.5002.20.camel@mikhail> Good day everyone, I've just finished implementing and testing a master-slave prime number finder as a test problem for my thesis on heterogeneous cluster load balancing for parallel applications. Test results show anomalies which may be tied to work chunk size allocations to the slaves, but to test whether it will hold true for other applications and is not directly tied to the parallel prime number finder, I am in need of other problems that may be solved using a master-slave architecture. 
Sure it is easy to come up with just any problem and implement a solution in a master-slave model, but I'm looking for computationally intensive problems wherein the computation necessary for parts of the problem are not equal. What I mean by this is similar to the case of the parallel number finder, seeing whether 11 is prime requires less computation compared to seeing whether 9999991 is prime. Any insights or pointers to documentations or papers that have had similar problems are most welcome. TIA PS: Are ther any cluster admins there willing to spare some cycles and cluster time for a cluster needy BS Undergraduate student in the Philippines? :D -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From meetsunil80x86 at yahoo.co.in Thu Feb 12 06:58:41 2004 From: meetsunil80x86 at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=) Date: Thu, 12 Feb 2004 11:58:41 +0000 (GMT) Subject: [Beowulf] Math Coprocessor Message-ID: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Hello everybody, I am a newbie in the Linux world.I would like to know know to... 1) program the 80x87 using C/C++/Fortran95 in linux platform. 2) program the 80x86 using C/C++/Fortran95 in linux platform. 3) link a C function into a fortran95 program or vice versa. Thanks in advance, sunil ________________________________________________________________________ Yahoo! India Education Special: Study in the UK now. Go to http://in.specials.yahoo.com/index1.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 12 09:25:30 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 12 Feb 2004 22:25:30 +0800 (CST) Subject: [Beowulf] IA64 & AMD64 binary SPBS and SGE download Message-ID: <20040212142530.32328.qmail@web16807.mail.tpe.yahoo.com> Just FYI only. AMD64 binary from offical GridEngine homepage: http://gridengine.sunsource.net/project/gridengine/download.html (IA64 is supported but you need to build from source) IA64 and AMD64 binary rpm for Torque: http://www-user.tu-chemnitz.de/~kapet/torque/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:18:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:18:31 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. 
an easy if hackneyed one is a mandelbrot-family fractal zoomer. depending on what chunk of the space you look at, I'd guess you could find pretty much any distribution of work-per-point. if your master-slave model does smart domain decomp, this might be just the thing. true, some people will roll their eyes when they find out you're doing fractals. I certainly did, when someone here used them. but they do have nice properties, and nice pictures always help ;) > PS: Are ther any cluster admins there willing to spare some cycles and > cluster time for a cluster needy BS Undergraduate student in the > Philippines? :D send me some email. regards, mark hahn. hahn at sharcnet.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 09:35:33 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 09:35:33 -0500 (EST) Subject: [Beowulf] Master-Slave Problems In-Reply-To: <1076582124.5002.20.camel@mikhail> Message-ID: On 12 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > I've just finished implementing and testing a master-slave prime number > finder as a test problem for my thesis on heterogeneous cluster load > balancing for parallel applications. Test results show anomalies which > may be tied to work chunk size allocations to the slaves, but to test > whether it will hold true for other applications and is not directly > tied to the parallel prime number finder, I am in need of other problems > that may be solved using a master-slave architecture. > > Sure it is easy to come up with just any problem and implement a > solution in a master-slave model, but I'm looking for computationally > intensive problems wherein the computation necessary for parts of the > problem are not equal. What I mean by this is similar to the case of the > parallel number finder, seeing whether 11 is prime requires less > computation compared to seeing whether 9999991 is prime. > > Any insights or pointers to documentations or papers that have had > similar problems are most welcome. Two remarks. One, lots of problems (e.g. descent into a Mandelbrot set) have widely variable compute times for chunks of work divvied out in a master/slave model with very short and uniform messages distributing the work. Two, why not just simulate work? You're studying something in computer science, not trying to compute prime numbers or random numbers or mandelbrot sets or julia sets. Set up your master to distribute times for slaves to sleep and then reply. Select the times to distribute from the distribution (random or otherwise) of your choice, and scale a return "results" packet accordingly. This yields you complete control over the statistics of the "work" distribution and network load and lets you explore distributions that you might not easily find in the real world. It also lets you CONNECT the results of your simulations with "known" distributions to the results you obtain with real problems, which may help you identify or even categorically classify problems in terms of work-load complexity. This would doubtless make your thesis still more powerful. 
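For concreteness, the skeleton of that simulated-work setup might look something like the following (a rough MPI sketch rather than PVM, untested as typed; the flat random() distribution, task count, and time scale are just placeholders for whatever distribution you actually want to study):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <mpi.h>

  #define NTASKS   200          /* total simulated work units (arbitrary) */
  #define TAG_WORK 1
  #define TAG_STOP 2

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) {          /* master */
          int sent = 0, done = 0, worker, usec = 0, result;
          srandom(12345);
          /* prime every slave with one task (or a stop if none are left) */
          for (worker = 1; worker < size; worker++) {
              if (sent < NTASKS) {
                  usec = random() % 100000;  /* stand-in for your distribution */
                  MPI_Send(&usec, 1, MPI_INT, worker, TAG_WORK, MPI_COMM_WORLD);
                  sent++;
              } else {
                  MPI_Send(&usec, 1, MPI_INT, worker, TAG_STOP, MPI_COMM_WORLD);
              }
          }
          /* hand out the remaining work as results come back */
          while (done < sent) {
              MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                       MPI_COMM_WORLD, &st);
              done++;
              if (sent < NTASKS) {
                  usec = random() % 100000;
                  MPI_Send(&usec, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                           MPI_COMM_WORLD);
                  sent++;
              } else {
                  MPI_Send(&usec, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                           MPI_COMM_WORLD);
              }
          }
          printf("master: %d simulated tasks completed on %d slaves\n",
                 done, size - 1);
      } else {                  /* slave */
          int usec, result = 0;
          for (;;) {
              MPI_Recv(&usec, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
              if (st.MPI_TAG == TAG_STOP)
                  break;
              usleep(usec);     /* "do" the work by sleeping for it */
              result++;         /* dummy result packet */
              MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
          }
      }

      MPI_Finalize();
      return 0;
  }

Swap the random() call for draws from whatever distribution you like (the GSL makes this easy), then watch how long the run takes and how evenly the slaves stay busy as you vary it -- that gives you exactly the kind of controlled work-load sweep described above.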
This is what I've been doing in my Cluster World column -- simulating work (or nearly so) with a trivial master-slave computation of random numbers (the return) accompanied by an adjustable "sleep time" that permits me to effectively sweep the granularity of the computation to demonstrate at least simple Amdahlian scaling properties of this sort of computation. In fact, I can likely give you a PVM program to do this that could easily be hacked into precisely what you'd need to implement this with little effort (a few days, INCLUDING learning how to generate distributions with e.g. the GSL). Let me know. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Feb 12 10:25:03 2004 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 12 Feb 2004 10:25:03 -0500 Subject: [Beowulf] Virginia Tech upgrade In-Reply-To: References: Message-ID: <402B9ACF.2040502@lmco.com> In case anyone hasn't read slashdot in the last few hours, http://apple.slashdot.org/apple/04/02/12/0613255.shtml?tid=107&tid=126&tid=181&tid=187 Now, everyone face Doug's house and say, "Doug is always right. Doug is always right" :) Jeff > > The first thought I had was "what will they do with all the old systems?" > > Then it hit me. They put a fancy sticker on each box that says > "This machine was part of the third fastest supercomputer on the planet > Nov. 2003" or something similar. Also put a serial number on the tag and > provide a "certificate of authenticity" from VT. My guess is they can > make > a little bit on the whole deal. I wager they would sell rather quickly. > Alumni eat this kind of thing up. > > For those interested, my old www.cluster-rant.com site has morphed into > the new www.clusterworld.com site. You can check out issue contents, > submit stories, check out the polls, and rant about clusters. > > Doug > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 12 10:04:29 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 12 Feb 2004 10:04:29 -0500 (EST) Subject: [Beowulf] Math Coprocessor In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com> Message-ID: On Thu, 12 Feb 2004, sunil kumar wrote: > > > Hello everybody, > I am a newbie in the Linux world.I would like to > know > know to... > 1) program the 80x87 using C/C++/Fortran95 in linux Why? 
As one of the relatively few humans on the planet to ever actually write 8087 code (back when it was the ONLY way to use the coprocessor with the various compilers available at the time) I can authoritatively say that it isn't horribly difficult -- the x87 is sort of an RPN HPC calculator for your PC with its own stack and internal floating point commands -- but all the compilers available already use it when they can and it is appropriate, and in MANY cases their code will be as or more efficient and robust than what you could hand code. There are doubtless exceptions, but are they worth the considerable amount of work required to realize them? Are you planning to join the GCC project or something?

> platform.
> 2) program the 80x86 using C/C++/Fortran95 in linux
> platform.

This is straightforward. But I'm not going to explain inlining of assembler here (I can give you an example/code fragment of inlined code if you want it, though). Instead...

...Google is your friend. Try e.g. "86 assembler reference gnu"

http://linux.maruhn.com/cat/Development/Languages.html
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/pdf/rhel-as-en.pdf
http://www.linuxgazette.com/issue94/ramankutty.html

or "gnu assembler manual"

http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_toc.html

...

> 3) link a C function into a fortran95 program or
> vice

or "gnu fortran manual"

http://gcc.gnu.org/onlinedocs/g77/

(and that's just the beginning!) Try other search strings. Consider buying a book or two if you're unfamiliar with assembler altogether -- I don't think it is taught much anymore in CPS departments unless you are a really serious major and select the right courses. And they still have somebody who can teach them -- one thing about upper level languages is that they make assembler level programming so difficult by comparison that it has become a vanishing and highly arcane art. Well, not really vanishing, but I'll bet that no more than 10% of all programmers have a clue about what registers are and how to manipulate them with assembler commands...maybe more like 1-2%. And mostly Old Guys at that. And the serious, I mean really serious, programmers and hackers.

Basically, all of this is thoroughly documented at gnu.org, and much of it is REdocumented, explained, tutorialized, and hashed over many times in many other places, all on the web.

rgb

-- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rgb at phy.duke.edu Thu Feb 12 10:28:37 2004
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 12 Feb 2004 10:28:37 -0500 (EST)
Subject: [Beowulf] Math Coprocessor
In-Reply-To: <20040212115841.30858.qmail@web8307.mail.in.yahoo.com>
Message-ID:

On Thu, 12 Feb 2004, sunil kumar wrote:
>
>
> Hello everybody,
> I am a newbie in the Linux world.I would like to
> know
> know to...
> 1) program the 80x87 using C/C++/Fortran95 in linux
>
> platform.
> 2) program the 80x86 using C/C++/Fortran95 in linux
> platform.
> 3) link a C function into a fortran95 program or
> vice

One last reference:

man as86

(it even has a list of the supported x86 and x87 instructions at the bottom, although it does NOT teach you to program in assembler in the first place).

rgb

> versa.
> > Thanks in advance, > sunil > > ________________________________________________________________________ > Yahoo! India Education Special: Study in the UK now. > Go to http://in.specials.yahoo.com/index1.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Feb 12 09:12:31 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 12 Feb 2004 09:12:31 -0500 (EST) Subject: [Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?) In-Reply-To: <1076548766.3950.91.camel@protein.scalableinformatics.com> Message-ID: > ... and strace. Amazing how useful that one is. true, but I've also fallen in love with ltrace, which does both syscalls and lib calls. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ao8215 at wayne.edu Thu Feb 12 08:13:48 2004 From: ao8215 at wayne.edu (Robson Pablo Sobradiel Peguin) Date: Thu, 12 Feb 2004 08:13:48 -0500 Subject: [Beowulf] Message Error Message-ID: <813143f0.10fc5818.81a9100@mirapointms3.wayne.edu> Hi I would like to know the meanings of these errors during the compilation with MPICH in the cluster: [root at master source]# make beowulf-WSU-INTEL cp /usr/local/mpich/mpich-1.2.5_intel/include/mpif.h mpif.h make FC=/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 FFLAGS="-O3 -tpp7 -xW -axW -c"\ CPFLAGS="-DSTRESS -D'POINTER=integer'" \ make LD="/usr/local/mpich/mpich-1.2.5_intel/bin/mpif90 -tpp7 -xW -axW -o" \ FFLAGS="-O3 -tpp7 -xW -axW -c" \ CPFLAGS="-DSTRESS -DMPI -D'POINTER=integer'" \ EX=DLPOLY.MBE BINROOT=../execute 3pt make[1]: Entering directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make[1]: *** No rule to make target `make'. Stop. make[1]: Leaving directory `/home/sdr/DL_POLY/dl_poly_2.13/source' make: *** [beowulf-WSU-INTEL] Error 2 Thank you very much ________________________________________________________ Robson P. S. Peguin, Graduate Student Wayne State University Department of Chemical Engineering and Materials Science 4815 Fourth Street, 2015 MBE,Detroit - MI 48201 phone: (313)577-1416 fax: (313)577-3810 e-mail: robson_peguin at wayne.edu http://chem1.eng.wayne.edu/~sdr/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Feb 11 23:13:12 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 11 Feb 2004 22:13:12 -0600 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: Message-ID: <402AFD58.9060402@tamu.edu> Realize that not all switches are created equal when working with small (and, overall, 0-byte == small) packets. A number of otherwise decent network switches are less than stellar performers with small packets. 
We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test system running under the RFC-2544 testing suite... There are switches that perform well with small packets, but it's been our experience that most switches, especially your lower cost switches (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some others I can't recall right now) didn't perform well with smaller packets but did fine when the packet size was about 1500 bytes. Going with cheap switches is usually not a good way to improve performance. gerry Douglas Eadline, Cluster World Magazine wrote: > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > >>Hello, >> >>I have a really small "cluster" of 4 PC's which are connected by a normal >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board >>I thought I might be able to improve performance by connecting the machines >>via a Gigabit switch (which are really cheap nowadays). >> >>Everything seemed to work fine. The switch indicates 1000Mbit connections to >>the PC's and transfer rate for scp-ing large files is significantly higher >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than >>with the 100 Mbit switch. >> >>I wasn't able to actually track down the problem, but it seems that there is >>a problem with small messages. When I run the performance test provided with >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 >>byte message length, while for larger messages everything looks fine (linear >>dependancy of transfer time on message length, everything below 300 us). I >>have also tried mpich2 which shows exactly the same behavior. >> >>Does anyone have any idea? > > > First, I assume you were running the 100BT through the same > onboard NICs and got reasonable performance. So some possible > things: > > - the switch is a dog or it is broken > - your cables may be old or bad (but worked fine for 100BT) > - negotiation problem > > Some things to try: > > Use a cross over cable (cat5e) and see if you get the same problem. > You might try using a lower level benchmark (of the micro variety) > like netperf and netpipe. > > The Beowulf Performance Suite: > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > has these tests. Also, the December and January issues of ClusterWorld > show how to test a network connection using netpipe. At some point this > content will be showing up on the web-page. > > Also, the MPI Link-checker from Microway (www.microway.com) > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > May help. > > > Doug > > >>Here are the details of my system: >> - Suse Linux 9.0 (kernel 2.4.21) >> - mpich-1.2.5.2 >> - motherboard ASUS P4P800 >> - LAN (10/100/1000) on board (3COM 3C940 chipset) >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > + > >> 8x88E1111-BAB, AT89C2051-24PI) >> >> > > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Thu Feb 12 22:22:02 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Thu, 12 Feb 2004 21:22:02 -0600 (CST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> References: <402AFD58.9060402@tamu.edu> Message-ID: The best switch that we have found both in price and speed are the GigE Switches from Dell. We use them in a few of our test clusters and smaller clusters. They are actually pretty good performers and top even some of the cisco switches. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. > >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. 
> > > > > > Doug > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- > Gerry Creager -- gerry.creager at tamu.edu > Network Engineering -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > Page: 979.228.0173 > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 12 22:35:51 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 13 Feb 2004 14:35:51 +1100 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: References: <402AFD58.9060402@tamu.edu> Message-ID: <200402131435.54453.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches. That's bizarre, the GigE switches I've seen in a Dell cluster *were* rebadged Cisco switches. Even had to do the usual "PortFast" routine in IOS to get PXE booting to work. Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFALEYXO2KABBYQAh8RAm81AJoDHOfMZ+hrIyLVoBIr1lsESi70KACfcnYu C1JcJ3iYX22Tm99gTvKlfOs= =XWYZ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:17:26 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:17:26 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: on various revs of their code I've had regular (once a week) management stack crashes on our dell switches which doesn't make it easy to collect statistics, but they continue to forward packets just fine... the switches are actually made by accton and they are also sold by smc... depending on who has better deals the dell 5212/5224 or smc 8612t/8624t may be cheaper at any given time... the cisco cat-ios style cli and ssh support are a plus. On Thu, 12 Feb 2004, Brent M. Clements wrote: > The best switch that we have found both in price and speed are the GigE > Switches from Dell. We use them in a few of our test clusters and smaller > clusters. They are actually pretty good performers and top even some of > the cisco switches.
> > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > Realize that not all switches are created equal when working with small > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > network switches are less than stellar performers with small packets. > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > system running under the RFC-2544 testing suite... > > > > There are switches that perform well with small packets, but it's been > > our experience that most switches, especially your lower cost switches > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > others I can't recall right now) didn't perform well with smaller > > packets but did fine when the packet size was about 1500 bytes. > > > > Going with cheap switches is usually not a good way to improve performance. > > > > gerry > > > > Douglas Eadline, Cluster World Magazine wrote: > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > >>Hello, > > >> > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > >>I thought I might be able to improve performance by connecting the machines > > >>via a Gigabit switch (which are really cheap nowadays). > > >> > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > >>with the 100 Mbit switch. > > >> > > >>I wasn't able to actually track down the problem, but it seems that there is > > >>a problem with small messages. When I run the performance test provided with > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > >>byte message length, while for larger messages everything looks fine (linear > > >>dependancy of transfer time on message length, everything below 300 us). I > > >>have also tried mpich2 which shows exactly the same behavior. > > >> > > >>Does anyone have any idea? > > > > > > > > > First, I assume you were running the 100BT through the same > > > onboard NICs and got reasonable performance. So some possible > > > things: > > > > > > - the switch is a dog or it is broken > > > - your cables may be old or bad (but worked fine for 100BT) > > > - negotiation problem > > > > > > Some things to try: > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > You might try using a lower level benchmark (of the micro variety) > > > like netperf and netpipe. > > > > > > The Beowulf Performance Suite: > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > show how to test a network connection using netpipe. At some point this > > > content will be showing up on the web-page. > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > May help. 
> > > > > > > > > Doug > > > > > > > > >>Here are the details of my system: > > >> - Suse Linux 9.0 (kernel 2.4.21) > > >> - mpich-1.2.5.2 > > >> - motherboard ASUS P4P800 > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > + > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > >> > > >> > > > > > > > > > > -- > > Gerry Creager -- gerry.creager at tamu.edu > > Network Engineering -- AATLT, Texas A&M University > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > Page: 979.228.0173 > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Feb 12 23:19:16 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 12 Feb 2004 20:19:16 -0800 (PST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: Message-ID: Also they support jumbo (9k) frames which is a plus for us since we do nfs over them. joelja On Thu, 12 Feb 2004, Joel Jaeggli wrote: > on varius revs of their code I've had regular (once a week) managment > stack crash on our dell switches which doesn't make it easy to collect > statistics, but they continue to forward packets just fine... the switches > are actually made by accton and they are also sold by smc... depending > one who has better deals the dell 5212/5224 or smc 8612t/8624t may be > cheaper at any given time... the cisco cat-ios style cli and ssh support > are a plus. > > On Thu, 12 Feb 2004, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > > > > > Realize that not all switches are created equal when working with small > > > (and, overall, 0-byte == small) packets. A number of otherwise decent > > > network switches are less than stellar performers with small packets. > > > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > > > system running under the RFC-2544 testing suite... 
> > > > > > There are switches that perform well with small packets, but it's been > > > our experience that most switches, especially your lower cost switches > > > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > > > others I can't recall right now) didn't perform well with smaller > > > packets but did fine when the packet size was about 1500 bytes. > > > > > > Going with cheap switches is usually not a good way to improve performance. > > > > > > gerry > > > > > > Douglas Eadline, Cluster World Magazine wrote: > > > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > > > > > > > >>Hello, > > > >> > > > >>I have a really small "cluster" of 4 PC's which are connected by a normal > > > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > > > >>I thought I might be able to improve performance by connecting the machines > > > >>via a Gigabit switch (which are really cheap nowadays). > > > >> > > > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > > > >>the PC's and transfer rate for scp-ing large files is significantly higher > > > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > > > >>with the 100 Mbit switch. > > > >> > > > >>I wasn't able to actually track down the problem, but it seems that there is > > > >>a problem with small messages. When I run the performance test provided with > > > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > > > >>byte message length, while for larger messages everything looks fine (linear > > > >>dependancy of transfer time on message length, everything below 300 us). I > > > >>have also tried mpich2 which shows exactly the same behavior. > > > >> > > > >>Does anyone have any idea? > > > > > > > > > > > > First, I assume you were running the 100BT through the same > > > > onboard NICs and got reasonable performance. So some possible > > > > things: > > > > > > > > - the switch is a dog or it is broken > > > > - your cables may be old or bad (but worked fine for 100BT) > > > > - negotiation problem > > > > > > > > Some things to try: > > > > > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > > > You might try using a lower level benchmark (of the micro variety) > > > > like netperf and netpipe. > > > > > > > > The Beowulf Performance Suite: > > > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > > > > > has these tests. Also, the December and January issues of ClusterWorld > > > > show how to test a network connection using netpipe. At some point this > > > > content will be showing up on the web-page. > > > > > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > > > > > May help. 
> > > > > > > > > > > > Doug > > > > > > > > > > > >>Here are the details of my system: > > > >> - Suse Linux 9.0 (kernel 2.4.21) > > > >> - mpich-1.2.5.2 > > > >> - motherboard ASUS P4P800 > > > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > > > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > > > > > + > > > > > > > >> 8x88E1111-BAB, AT89C2051-24PI) > > > >> > > > >> > > > > > > > > > > > > > > -- > > > Gerry Creager -- gerry.creager at tamu.edu > > > Network Engineering -- AATLT, Texas A&M University > > > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 > > > Page: 979.228.0173 > > > Office: 903A Eller Bldg, TAMU, College Station, TX 77843 > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 13 03:40:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 13 Feb 2004 09:40:09 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Thu, 12 Feb 2004, Robert G. Brown wrote: > not really vanishing, but I'll bet that no more than 10% of all > programmers have a clue about what registers are and how to manipulate > them with assembler commands...maybe more like 1-2%. And mostly Old > Guys at that. And the serious, I mean really serious, programmers and > hackers. > Sigh. I was first taught assembler in the physics department (being as you in the States would say a physics major). The lab had Motorola 68000 trainer boards. I still have a copy of "68000 Assembly Language" by Kane, Hawkins, Leventhal kicking around. Such a nice architecture. But then again I may be the only person to own "Fortran 77: A Structured Approach". Such perversity originating from being taught Pascal by computer scientists then learning Fortran. I also remember being taught about self-modifying code by the then professor of computing science. Do they still teach that? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rmiguel at usmp.edu.pe Fri Feb 13 09:24:11 2004 From: rmiguel at usmp.edu.pe (Richard Miguel) Date: Fri, 13 Feb 2004 09:24:11 -0500 Subject: [Beowulf] problmes with MPICH References: Message-ID: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Hi, i have problems with mpich, i have installed OSCAR with mpich 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, i want to use rsh and i dont want reinstall OSCAR. then i change the line in mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes but mpich not use rsh. 
Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help in this point. I have mpich-1.2.5.2 and fortran pgi and rsh. Thanks R. Miguel _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 09:53:38 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 14:53:38 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: Message-ID: On Fri, 13 Feb 2004, John Hearns wrote: > But then again I may be the only person to own "Fortran 77: > A Structured Approach". Wow! Bleeding edge stuff. On the subject of pure perversity, my Fortran notes stop with a roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? Punched card paradise? I believe some guy in France has got one back together and working, but I don't remember where.) [Weeps sadly into Wincarnis as the memories flood back.] -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joshh at cs.earlham.edu Fri Feb 13 10:25:31 2004 From: joshh at cs.earlham.edu (joshh at cs.earlham.edu) Date: Fri, 13 Feb 2004 10:25:31 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment Message-ID: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Here is an irregular question. I am profiling a software package that runs over LAM-MPI on 16 node clusters [Details Below]. I would like to measure the effect of increased latency on the run time of the program. It would be nice if I could quantify the added latency in the process to create some statistics. If possible, I do not want to alter the code line of the program, or buy new hardware. I am looking for a software solution/idea. Bazaar Cluster: 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM 1 100Mbps NIC card in each machine 2 100Mbps Full-Duplex switches Cairo Cluster: 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM 2 1Gbps NIC cards in each machine (only one in use) 2 1Gbps Full-Duplex switches For more details on these clusters follow the link below: http://cluster.earlham.edu/html/ Thank you, Josh Hursey Earlham College Cluster Computing Group _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:30:25 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:30:25 -0800 Subject: [Beowulf] problmes with MPICH Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5563@orsmsx402.jf.intel.com> I'm forwarding this to the OSCAR-users list, a more appropriate venue for this question. -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
> -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Richard Miguel > Sent: Friday, February 13, 2004 6:24 AM > Cc: beowulf at beowulf.org > Subject: [Beowulf] problmes with MPICH > > Hi, i have problems with mpich, i have installed OSCAR with mpich > 1.2.5.10-ch_p4-gcc using ssh for communicate with nodes.. that is ok. but, > i > want to use rsh and i dont want reinstall OSCAR. then i change the line in > mpirun RSHCOMMAND=""ssh" by rsh these change was replicated on the nodes > but > mpich not use rsh. > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need > help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > > Thanks > > R. Miguel > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Fri Feb 13 11:57:00 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Fri, 13 Feb 2004 08:57:00 -0800 Subject: [Beowulf] Math Coprocessor Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf > Of Martin WHEELER > Sent: Friday, February 13, 2004 6:54 AM > To: John Hearns > Cc: Robert G. Brown; sunil kumar; beowulf at beowulf.org > Subject: Re: [Beowulf] Math Coprocessor > > On Fri, 13 Feb 2004, John Hearns wrote: > > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". > > Wow! Bleeding edge stuff. > On the subject of pure perversity, my Fortran notes stop with a > roneotyped rip-off copy of Fortran IV Self-Taught *in French* dating > from 1968. (Anyone else remember the 60s workhorse, the IBM 1130? > Punched card paradise? I believe some guy in France has got one back > together and working, but I don't remember where.) > > [Weeps sadly into Wincarnis as the memories flood back.] Ah, another 1130 veteran! Group hug! There's an active 1130 group, and you too can run R2V12 on your very own 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. IIRC, APL may even be available. http://ibm1130.org One of my hobby tasks is to port the simulator GUI to Tcl/Tk or Perl/Tk... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 13:15:28 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 12:15:28 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. 
> > It would be nice if I could quantify the added latency in the process to > create some statistics. If possible, I do not want to alter the code line > of the program, or buy new hardware. I am looking for a software > solution/idea. > > Bazaar Cluster: > 16 Node Red Hat Linux machines running 500MHz PIII, 512MB RAM > 1 100Mbps NIC card in each machine > 2 100Mbps Full-Duplex switches > > Cairo Cluster: > 16 Node YellowDog Linux machines running 1GHz PPC G4, 1GB RAM > 2 1Gbps NIC cards in each machine (only one in use) > 2 1Gbps Full-Duplex switches > > For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ > > Thank you, > > Josh Hursey > Earlham College Cluster Computing Group > Not an irregular question at all. I tried something like this a couple of years ago to investigate the bandwidth and latency sensitivity of an application which was using MPICH over Myrinet. One of D.K.Panda's students from Ohio State University had a modified version of the "mcp" for Myrinet which added quality of service features, tunable per connection. The "mcp" is the code which runs on the LANai microprocessor on the Myrinet interface card. The modifications on top of the OSU modifications to gm used a hardware timer on the interface card to add a fixed delay per packet for bandwidth tuning, and a fixed delay per message (i.e., a delay added to only the first packet of a new connection) for latency tuning. Via netpipe, I verified that I could independently tune the bandwidth and latency. Lots of fun to play with - for example, by plotting the difference in message times for two different latency setting, the eager-rendezvous threshold was easily identified. All in all a very useful experiment which told us a lot about our application. Clearly, you want to delay the sending of a message, or the processing of a received communication, without otherwise interfering with what the system is doing. Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send call is going to perturb your results because the processor won't be doing useful work during that time. That's obviously not the same as running on a network with a switch that adds the same 50 microseconds latency; in that case, the processor could be doing useful work during the delay, happily overlapping computations with communications. Nevertheless, adding busy loops might still give you useful results. You might want to look into using a LD_PRELOAD library to intercept MPI calls of interest, assuming you're using a shared library for MPI. In your version, do the busy loop, then fall into the normal call. A quick google search on "LD_PRELOAD" or "library interposers" will return a lot of examples, such as: http://uberhip.com/godber/interception/index.html http://developers.sun.com/solaris/articles/lib_interposers.html The advantage of this approach is that no modifications to your source code or compiled binaries are necessary. You'll have to think carefully about whether the added latency is slowing your application simply because the processor is not doing work during the busy loop. If I were you, I'd modify your source code and time your syncronizations (eg, MPI_Wait). If your code is cpu-bound, these will return right away, and adding latency via a busy loop is going to give you the wrong answer. If your code is communications bound, these will have a variable delay depending upon the latency and bandwidth of the network. You are likely interested in delays of 10's of microseconds. 
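To make the interposer idea concrete, here is a minimal sketch of such an LD_PRELOAD library (untested against any particular MPI build; the 50 microsecond figure, the file names and the assumption that the application links MPI as a shared library are all just placeholders):

/* delay_preload.c - sketch of an LD_PRELOAD interposer that busy-waits
 * before every MPI_Send.  Build/run, e.g.:
 *   gcc -shared -fPIC delay_preload.c -o libdelay.so -ldl
 *   LD_PRELOAD=./libdelay.so mpirun ...
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/time.h>
#include <mpi.h>

static double delay_us = 50.0;            /* injected latency per send, usec */

static void busy_wait(double us)          /* burns CPU; see the caveat above */
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec) < us);
}

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
             MPI_Comm comm)
{
    static int (*real_send)(void *, int, MPI_Datatype, int, int, MPI_Comm);
    if (!real_send)                       /* look up the real MPI_Send once */
        real_send = (int (*)(void *, int, MPI_Datatype, int, int, MPI_Comm))
                        dlsym(RTLD_NEXT, "MPI_Send");
    busy_wait(delay_us);
    return real_send(buf, count, dtype, dest, tag, comm);
}

MPI_Recv, MPI_Isend and friends would need the same treatment, and a Fortran code calls the Fortran bindings (mpi_send_ or similar, depending on the compiler), which need their own wrappers; getting the LD_PRELOAD variable to reach the remote processes is its own small adventure, depending on how mpirun starts them.  The busy wait above just polls gettimeofday() for simplicity; the cycle counter is finer grained.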
The most accurate busy loops for this sort of thing use the processor hardware timers, which tick every clock on x86. On a G5 PPC running OS-X, the hardware timer ticks every 60 cpu cycles. I'm not sure what a PPC does under Linux. On x86, you can read the cycle timer via: #include unsigned long long timerVal; rdtscll(timerVal); A crude delay loop example: rdtscll(timeStart); do { rdtscll(timeEnd); } while ((timeEnd - timeStart) < latency * usecPerTick); where latency is in microseconds, and usecPerTick is your calibration. There have been other recent postings to this mailing list about using inline assembler macros to read the time stamp counter. Injecting small latencies w/out busy loops and without disturbing your source code is going to be very difficult (though I'd love to be contradicted on that statement!). A couple of far fetched ideas in kernel land: - some ethernet interfaces have very sophisticated processors aboard. IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Perhaps the firmware can be modified similarly to the modified mcp for gm discussed above. Obviously this has the huge disadvantage of being specific to particular network chips. - the local APIC on x86 processors has a programmable interval timer with better than microsecond granularity which can be used to generate an interrupt. Perhaps in the communications stack, or in the network device driver, a wait_queue could be used to postpone processing until after an interrupt from this timer. I would worry about considerable jitter, though. For a sample driver using this feature, see http://www.oberle.org/apic_timer-timers.html The various realtime Linux folks talk about this as well: http://www.linuxdevices.com/articles/AT6105045931.html Unfortunately, IIRC this timer is now used (since 2.4 kernel) for interprocessor interrupts on SMP systems. On uniprocessor systems it may still be available. I hope there's something useful for you in this response. I'm hoping even more that there are other responses to your question - I would love a facility which would allow me to "turn the dial" on latency and/or bandwidth. There's a substantial cost difference between a gigE cluster and a Myrinet/Infiniband/Quadrix/SCI cluster, and it would be great to simulate performance of different network architectures on specific applications. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lusk at mcs.anl.gov Fri Feb 13 13:31:08 2004 From: lusk at mcs.anl.gov (Rusty Lusk) Date: Fri, 13 Feb 2004 12:31:08 -0600 (CST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <20040213.123108.12267444.lusk@localhost> > Suggestions: > - modify the routines that make MPI calls to call instead some wrapper > routines that do some thumb twiddling before making the MPI call; this > requires modification of the program source > - modify the MPI routines (well, if you use an open-source MPI > implementation) to insert some delay, then relink your binary if static With any standard-conforming MPI implementation, open-source or not, you can use the MPI "profiling" interface to provide any kind of wrapper at all. Basically, you write your own MPI_Send, etc., which does whatever you want and also calls PMPI_Send (required to be there) to do the real work. 
Then you link your routines in front of the MPI library, and voila! Cheers, Rusty Lusk _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From modus at pr.es.to Thu Feb 12 23:53:04 2004 From: modus at pr.es.to (Patrick Michael Kane) Date: Thu, 12 Feb 2004 20:53:04 -0800 Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <200402131435.54453.csamuel@vpac.org>; from csamuel@vpac.org on Fri, Feb 13, 2004 at 02:35:51PM +1100 References: <402AFD58.9060402@tamu.edu> <200402131435.54453.csamuel@vpac.org> Message-ID: <20040212205304.A16115@pr.es.to> * Chris Samuel (csamuel at vpac.org) [040212 20:42]: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 13 Feb 2004 02:22 pm, Brent M. Clements wrote: > > > The best switch that we have found both in price and speed are the GigE > > Switches from Dell. We use them in a few of our test clusters and smaller > > clusters. They are actually pretty good performers and top even some of > > the cisco switches. > > That's bizzare, the GigE switches I've seen in a Dell cluster *were* rebadged > Cisco switches. Even had to do the usual "PortFast" routine in IOS to get > PXE booting to work. They used to be, I believe. Now they appear to be something else (for their latest 24 port layer-2 model). I've had good luck with them with the latest firmware, before that they were fairly flakey. Check the dell forums for all the yammering and howling on the PowerEdge 5224. Best, -- Patrick Michael Kane _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 12:44:33 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 18:44:33 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > Here is an irregular question. I am profiling a software package that runs > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > the effect of increased latency on the run time of the program. It appears that in your setup MPI uses TCP/IP as underlying protocol. Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a fuzzy result. So there :-) Suggestions: - modify the routines that make MPI calls to call instead some wrapper routines that do some thumb twiddling before making the MPI call; this requires modification of the program source - modify the MPI routines (well, if you use an open-source MPI implementation) to insert some delay, then relink your binary if static - modify the kernel source to insert some delays in the TCP path - pretty hard as TCP is very complex - modify the network driver to insert some delays in the Tx or Rx packet path; not very difficult, but might be leveled by the delays of TCP. The kernel modifications have the disadvantage that they also require some way to change the delay value, so adding a /proc entry, an ioctl, etc. unless you want to recompile the kernel and reboot after each delay change. 
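A middle road between the first two suggestions, needing neither source changes nor a modified MPI, is the profiling interface described above; a rough sketch (the 50 microsecond figure and the file name are only placeholders, and only MPI_Send is shown):

/* delay_wrap.c - profiling-interface wrapper: our MPI_Send shadows the
 * library's, twiddles its thumbs, then lets PMPI_Send do the real work.
 */
#include <mpi.h>
#include <sys/time.h>

static double delay_us = 50.0;            /* injected latency, usec */

static void spin(double us)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec) < us);
}

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
             MPI_Comm comm)
{
    spin(delay_us);
    return PMPI_Send(buf, count, dtype, dest, tag, comm);
}

Relink the application with the wrapper object listed ahead of the MPI library (with mpich, something like "mpicc myapp.o delay_wrap.o -o myapp"); no change to the application source or to MPI itself is needed.  Collectives are trickier, since the point-to-point pattern they use underneath is up to the implementation.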
> For more details on these clusters follow the link below: > http://cluster.earlham.edu/html/ Please tell to whoever coded that page that Opera doesn't display it properly. And I use Opera all the time ;-) The page also doesn't specify an important detail: the network cards/chips used in the clusters. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Fri Feb 13 14:01:25 2004 From: gropp at mcs.anl.gov (William Gropp) Date: Fri, 13 Feb 2004 13:01:25 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <6.0.0.22.2.20040213125745.0266bbc0@localhost> At 11:44 AM 2/13/2004, Bogdan Costescu wrote: >On Fri, 13 Feb 2004 joshh at cs.earlham.edu wrote: > > > Here is an irregular question. I am profiling a software package that runs > > over LAM-MPI on 16 node clusters [Details Below]. I would like to measure > > the effect of increased latency on the run time of the program. > >It appears that in your setup MPI uses TCP/IP as underlying protocol. >Latency is a fuzzy parameter in TCP/IP. Adding to something fuzzy gives a >fuzzy result. So there :-) > >Suggestions: >- modify the routines that make MPI calls to call instead some wrapper >routines that do some thumb twiddling before making the MPI call; this >requires modification of the program source Actually, this is not necessary, as long as you have the object files, not just the executable. The MPI profiling interface could be used to add latency to every send and receive operation; adding latency to collectives will require some care, as the exact set of communication operations that an MPI implementation uses is up to the implementation. Simply write your own MPI routine and call the PMPI version (e.g., for MPI_Send, call PMPI_Send) after adding some latency. Note also that MPI may use any communication mechanism. Even on small clusters, it may use something besides TCP (e.g., when the network is Infiniband). MPI on SMPs often uses a collection of communication approaches. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Fri Feb 13 15:39:14 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Fri, 13 Feb 2004 20:39:14 +0000 (UTC) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > There's an active 1130 group, and you too can run R2V12 on your very own > 1130 simulator, complete w/ Fortran (not EMU, sigh) and other tools. > IIRC, APL may even be available. http://ibm1130.org Thanks for the link -- didn't know about that. As arts faculty post-grads (applied linguistics) we were only allowed to play with Fortran (and even then were regarded with deep suspicion by the physics wallahs). Now -- where did I put that stack of cards...? Off to the attic to dig out more stuff. 
-- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 13 16:54:28 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 13 Feb 2004 16:54:28 -0500 (EST) Subject: [Beowulf] gigabit ethernet: horrible performance for 0 byte messages In-Reply-To: <402AFD58.9060402@tamu.edu> Message-ID: I wondered about your low cost switch statement. I had done this test before, but I thought I would redo it anyway. I have an SMC 8 port GigE EasySwitch 8508T (PriceGrabber $140 to my door). I should say that the switch is not loaded, so it may fall down if the load were higher. This is just two nodes running netpipe through the switch. Latency: 0.000034 Now starting main loop 0: 1 bytes 7287 times --> 0.22 Mbps in 0.000034 sec 1: 2 bytes 7338 times --> 0.46 Mbps in 0.000033 sec 2: 3 bytes 7469 times --> 0.68 Mbps in 0.000034 sec 3: 4 bytes 4923 times --> 0.90 Mbps in 0.000034 sec 4: 6 bytes 5545 times --> 1.36 Mbps in 0.000034 sec 5: 8 bytes 3711 times --> 1.81 Mbps in 0.000034 sec 6: 12 bytes 4637 times --> 2.67 Mbps in 0.000034 sec My opinion: If you get a switch that can not "switch" then it is broken by design. The original poster noted that his results seem to go from OK to "really bad" for basic MPI tests. If a switch does this it is "really broken". Of course it may not be the switch. BTW, the results were for a $30 NIC (netgear GA302T) running in a 66MHz slot. Top throughput was 800 Mbits/sec. Doug On Wed, 11 Feb 2004, Gerry Creager N5JXS wrote: > Realize that not all switches are created equal when working with small > (and, overall, 0-byte == small) packets. A number of otherwise decent > network switches are less than stellar performers with small packets. > We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test > system running under the RFC-2544 testing suite... > > There are switches that perform well with small packets, but it's been > our experience that most switches, especially your lower cost switches > (Cisco 2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some > others I can't recall right now) didn't perform well with smaller > packets but did fine when the packet size was about 1500 bytes. > > Going with cheap switches is usually not a good way to improve performance. > > gerry > > Douglas Eadline, Cluster World Magazine wrote: > > On Wed, 11 Feb 2004, Bernhard Wegner wrote: > > > > > >>Hello, > >> > >>I have a really small "cluster" of 4 PC's which are connected by a normal > >>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board > >>I thought I might be able to improve performance by connecting the machines > >>via a Gigabit switch (which are really cheap nowadays). > >> > >>Everything seemed to work fine. The switch indicates 1000Mbit connections to > >>the PC's and transfer rate for scp-ing large files is significantly higher > >>now, but my software unsing mpich RUNS about a factor of 4-5 SLOWER NOW than > >>with the 100 Mbit switch. 
> >> > >>I wasn't able to actually track down the problem, but it seems that there is > >>a problem with small messages. When I run the performance test provided with > >>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 > >>byte message length, while for larger messages everything looks fine (linear > >>dependancy of transfer time on message length, everything below 300 us). I > >>have also tried mpich2 which shows exactly the same behavior. > >> > >>Does anyone have any idea? > > > > > > First, I assume you were running the 100BT through the same > > onboard NICs and got reasonable performance. So some possible > > things: > > > > - the switch is a dog or it is broken > > - your cables may be old or bad (but worked fine for 100BT) > > - negotiation problem > > > > Some things to try: > > > > Use a cross over cable (cat5e) and see if you get the same problem. > > You might try using a lower level benchmark (of the micro variety) > > like netperf and netpipe. > > > > The Beowulf Performance Suite: > > http://www.clusterworld.com/article.pl?sid=03/03/17/1838236 > > > > has these tests. Also, the December and January issues of ClusterWorld > > show how to test a network connection using netpipe. At some point this > > content will be showing up on the web-page. > > > > Also, the MPI Link-checker from Microway (www.microway.com) > > > > http://www.clusterworld.com/article.pl?sid=04/02/09/1952250 > > > > May help. > > > > > > Doug > > > > > >>Here are the details of my system: > >> - Suse Linux 9.0 (kernel 2.4.21) > >> - mpich-1.2.5.2 > >> - motherboard ASUS P4P800 > >> - LAN (10/100/1000) on board (3COM 3C940 chipset) > >> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M > > > > + > > > >> 8x88E1111-BAB, AT89C2051-24PI) > >> > >> > > > > > > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Feb 13 17:46:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 13 Feb 2004 23:46:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Fri, 13 Feb 2004, Don Holmgren wrote: > I tried something like this a couple of years ago to investigate the > bandwidth and latency sensitivity of an application which was using > MPICH over Myrinet. ... which is pretty different from the setup of the original poster :-) But I'd like to see it discussed in general, so let's go on. > a modified version of the "mcp" for Myrinet which added ... Is this publicly available ? I'd like to give it a try. > The modifications on top of the OSU modifications to gm Well, that's a very important point: using GM, which doesn't try to make too many things like TCP does. I haven't used GM directly nor looked at its code, but I think that it doesn't introduce delays, like TCP does in some cases. Moreover, based on the description in the GM docs, GM is not needed to be optimized by the compiler as it's not in the fast path. Obviously, in such conditions, the results can be relied upon. 
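The time stamp counter is the exception here: on x86 it is read into registers with a single instruction, so there is no PCI round trip to pay for.  The rdtscll() macro quoted earlier comes from the kernel headers (asm/msr.h on the 2.4 kernels I have looked at; check your own tree), and a self-contained alternative, plus a crude check of what one read costs, might look like this (no serializing instruction is used, so treat the number as a rough estimate):

#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long) hi << 32) | lo;
}

int main(void)
{
    unsigned long long t0, t1;
    t0 = rdtsc();
    t1 = rdtsc();                 /* cost of a single back-to-back read */
    printf("back-to-back rdtsc: %llu cycles\n", t1 - t0);
    return 0;
}

Whatever the exact figure, it is far below the 10's of microseconds being discussed, so the counter read itself should not distort a busy loop much.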
> Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > call is going to perturb your results because the processor won't be > doing useful work during that time. In the case of TCP, the processor doesn't appear to be doing anything useful for "long" times, as it spends time in kernel space. So, a 50 microseconds busy loop might not make a difference. And given the somehow non-deterministic behaviour of TCP in this respect, it might be that adding the delay before the PMPI_* or after PMPI_* calls might make a difference. The delays don't have to be busy-loops. Busy-loops are probably precise, but might have some side-effects; for example, reading some hardware counter (even more as it is on a PCI device, which is "far" from the CPU and might be even "farther" if it has any PCI bridge(s) in between) repeatedly will generate lots of "in*" operations during which the CPU is stalled waiting for data. Especially with today's CPU speeds, I/O operations are expensive in terms of CPU cycles... > You are likely interested in delays of 10's of microseconds. Well, it depends :-) The latencies for today's HW+SW seem to be in a range of about 2 orders of magnitude, so giving absolute figures doesn't make much sense IMHO. Apart from this I would rather suggest an exponential increase in the delay value. > - some ethernet interfaces have very sophisticated processors aboard. > IIRC there were gigE NICs (Broadcom, maybe???) which had a MIPS cpu. Well, if the company releases enough documentation about the chip, then yes ;-) 3Com has the 990 line which is still FastE but has a programmable processor, so it's not only GigE. > Obviously this has the huge disadvantage of being specific to > particular network chips. But there aren't so many programmable network chips these days. Those Ethernet chips might even be in wider use than Myrinet[1] and more people might benefit from such development. If I'd have to choose for the next cluster purchase the GigE network cards and I'd know that one offers such capabilities while not having significant flaws compared to the others, I'd certainly buy it. Another hardware approach: the modern 3Com cards driven by 3c59x, Cyclone and Tornado, have the means to delay a packet in their (hardware) Tx queue. There is however a catch: there is not guarantee that the packet will be sent at the exact time specified, it can be delayed; the only guarantee is that the packet is not sent before that time. However, I somehow think that this is true for most other approaches, so it's not so bad as it sounds :-) The operation is pretty simple, as the packet is "stamped" with the time when it should be transmitted, expressed as some internal clock ticks. Only one "in" operation to read the current clock is needed per packet, so this is certainly much less intrusive as the busy-loop. [ I'm too busy (but not busy-looping :-)) to try this at the moment. If somebody feels the urge, I can provide some guidance :-) ] However, anything that still uses TCP (as both your Broadcom approach and my 3Com one do) will likely generate unreliable results... > it would be great to simulate performance of different network > architectures on specific applications. Certainly ! 
Especially as this would provide means to justify spending money on fast interconnect ;-) [1] I don't want this to look like I'm saying "compared with Myrinet as it's the most widely used high-performance interconnect" and neglect Infiniband, SCI, etc; I have no idea about "market share" of the different interconnects. I compare with Myrinet because the original message talked about it and because I'm ignorant WRT programmable processors on other interconnect NICs. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Fri Feb 13 19:49:05 2004 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 13 Feb 2004 18:49:05 -0600 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: References: Message-ID: On Fri, 13 Feb 2004, Bogdan Costescu wrote: > On Fri, 13 Feb 2004, Don Holmgren wrote: > > > I tried something like this a couple of years ago to investigate the > > bandwidth and latency sensitivity of an application which was using > > MPICH over Myrinet. > > ... which is pretty different from the setup of the original poster :-) > But I'd like to see it discussed in general, so let's go on. > > > a modified version of the "mcp" for Myrinet which added ... > > Is this publicly available ? I'd like to give it a try. I'm afraid not, sorry, since the modified code base from OSU isn't publically available. IIRC it was part of a project for a masters degree; if it's OK with them, it's OK with me (we can take this offline). The modified MCP had a bug I never fixed which required me to reset the card and reload the driver when some counter overflowed, at something like a gigabyte of messages. Long enough to get very good statistics, though. > > > The modifications on top of the OSU modifications to gm > > Well, that's a very important point: using GM, which doesn't try to make > too many things like TCP does. I haven't used GM directly nor looked at > its code, but I think that it doesn't introduce delays, like TCP does in > some cases. Moreover, based on the description in the GM docs, GM is not > needed to be optimized by the compiler as it's not in the fast path. > Obviously, in such conditions, the results can be relied upon. I miswrote a bit; to be precise, this was a modification to the MCP, which is the NIC firmware, rather than to GM, which is the user space code that interacts with the NIC hardware. The modification caused the NIC itself to introduce interpacket delays of a configurable value. To the application (well, to MPICH and to GM) it simply looked like the external Myrinet network had a different bandwidth and/or latency. There were tiny code changes to MPICH and to GM to allow modification of the interpacket delay values in the MCP; otherwise I would have had to recompile or patch the firmware image and reload that image for each new value. You are absolutely correct that GM, like all good OS-bypass software, doesn't introduce the delays that you'd encounter with communications protocols like TCP that have to pass through the kernel/user space boundary. Much more deterministic. 
> > > Adding a 50 microsecond busy loop, say, to the beginning of an MPI_*Send > > call is going to perturb your results because the processor won't be > > doing useful work during that time. > > In the case of TCP, the processor doesn't appear to be doing anything > useful for "long" times, as it spends time in kernel space. So, a 50 > microseconds busy loop might not make a difference. And given the somehow > non-deterministic behaviour of TCP in this respect, it might be that > adding the delay before the PMPI_* or after PMPI_* calls might make a > difference. TCP processing is likely a significant component of the natural latency, and, as you point out, during that time the CPU is busy in kernel space and isn't doing useful work. But the goal here is to add additional artificial latency in a manner that mimics a slower physical network, i.e., so that during this artificial delay the application can still be crunching numbers. In user space I don't see how to accomplish this goal (adding latency, yes; adding latency during which the cpu can do calculations, no). If delay code is added correctly in kernel space, say in the TCP/IP stack (sounds like a nasty bit of careful work!), then during that 50 usec period the CPU could certainly be doing useful work in user space. Small delays, relative to the timer tick, are very difficult to do accurately in non-realtime kernels unless you have a handy source of interrupts, like the local APIC. Assuming that LAM MPI isn't multithreaded (I have no idea), then adding a delay in the user space code in the MPI call, whether it's a sleep or a busy loop, guarantees that no useful application work can done during the delay. I'm confess to be totally ignorant of the PMPI_* calls (time for homework!) and defer humbly to the MPI masters from ANL. I'm definitely curious as to how these added latencies are implemented. > > The delays don't have to be busy-loops. Busy-loops are probably precise, > but might have some side-effects; for example, reading some hardware > counter (even more as it is on a PCI device, which is "far" from the CPU > and might be even "farther" if it has any PCI bridge(s) in between) > repeatedly will generate lots of "in*" operations during which the CPU is > stalled waiting for data. Especially with today's CPU speeds, I/O > operations are expensive in terms of CPU cycles... Agreed, though I'd hope on x86 that reading the time stamp counter is very quick and with minimal impact - it's got to be more like a register-to-register move than an I/O access. Hopefully on a modern superscalar processor this doesn't interfere with the other execution units. [As I write this, I just ran a program that reads the time stamp counter back to back to different registers, multiple times. The difference in values was a consistent 84 counts or 56 nsec on this 1.5 GHz Xeon - so, definitely minimal impact.] Without busy loops, achieving accurate delays of the order of 10's to 100's of microseconds with little jitter is a real trick in user space, (and kernel space as well!). nanosleep() won't work, delivering order 10 or 20 msec (i.e., the next timer tick) instead of the 50 usec request. > > > You are likely interested in delays of 10's of microseconds. > > Well, it depends :-) The latencies for today's HW+SW seem to be in a range > of about 2 orders of magnitude, so giving absolute figures doesn't make > much sense IMHO. Apart from this I would rather suggest an exponential > increase in the delay value. True. 
I was really thinking of my specific problem, not his! The relevant latency range for deciding between Infiniband and switched ethernet is ~ 6 usec to ~ 100+ usec, and the bandwidth range is ~ 100 MB/sec (gigE) to ~ 700 MB/sec (I.B.). It would be really useful to be able to inject latencies in that latency range with a precision of 5 usec or so, and to dial the bandwidth with a precision of ~ 50 MB/sec. Of course, if latency really matters, one would drop TCP/IP and use an OS-bypass, like GAMMA or MVIA. > ... > > > it would be great to simulate performance of different network > > architectures on specific applications. > > Certainly ! Especially as this would provide means to justify spending > money on fast interconnect ;-) What we need is some kind corporate soul to put up a large public cluster with the lowest latency, highest bandwidth network fabric available. Then, we can add our adjustable firmware and degrade that fabric to mimic less expensive networks, and figure out what we should really buy. Works for me! Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 14 04:47:30 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 14 Feb 2004 10:47:30 +0100 (CET) Subject: [Beowulf] Math Coprocessor In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722025F5564@orsmsx402.jf.intel.com> Message-ID: On Fri, 13 Feb 2004, Lombard, David N wrote: > > Ah, another 1130 veteran! Group hug! > Talking about 'mature' computer systems, I was at the ATLAS centre at RAL yesterday, where they display the console of the IBM 360 in the front hall. Plenty of blinkenlights and switches to toggle. The notice beside it said it was a 15 MIPS machine. Seems impressive for a machine of this vintage. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Sat Feb 14 04:43:37 2004 From: john.hearns at clustervision.com (John Hearns) Date: Sat, 14 Feb 2004 10:43:37 +0100 (CET) Subject: [Beowulf] problmes with MPICH In-Reply-To: <000d01c3f23d$0e2af910$1101000a@cpn.senamhi.gob.pe> Message-ID: On Fri, 13 Feb 2004, Richard Miguel wrote: > Now i have download mpich-1.2.5.2 and i want compile it for rsh, i need help > in this point. > > I have mpich-1.2.5.2 and fortran pgi and rsh. > ./configure -rsh=RSHCOMMAND From the configure.in: "The environment variable 'RSHCOMMAND' allows you to select an alternative remote shell command (by default, configure will use 'rsh' or 'remsh' from your 'PATH'). If your remote shell command does not support the '-l' option (some AFS versions of 'rsh' have this bug), also give the option '-rshnol'. These options are useful only when building a network version of MPICH (e.g., '--with-device=ch_p4'). The configure option '-rsh' is supported for backward compatibility." So rsh is the default behaviour. You can compile with the rsh command set to the rsh under $SGE_HOME/mpi also.
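For example, from the unpacked mpich-1.2.5.2 tree, something along the lines of ./configure --with-device=ch_p4 -rsh=rsh followed by make should give an rsh-based build (ch_p4 is the same TCP/IP device the OSCAR-installed mpich above uses).  Treat that as a sketch rather than a recipe: the PGI compiler settings and an install prefix that does not clobber the OSCAR copy still have to be chosen.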
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 14 11:31:51 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 14 Feb 2004 11:31:51 -0500 (EST) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: given the difficulty of accurately adding a small amount of latency to a message passing interface, how about this: hack the driver to artificially pre/append a constant number of bytes to each message. they will appear to take longer to process, giving high-resolution added delays. course, this will also saturate earlier, but that's only the upper knee of the curve: you can still learn what you want... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From konstantin_kudin at yahoo.com Sat Feb 14 14:28:22 2004 From: konstantin_kudin at yahoo.com (Konstantin Kudin) Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) Subject: [Beowulf] S.M.A.R.T usage in big clusters Message-ID: <20040214192822.35170.qmail@web21203.mail.yahoo.com> I am curious if anyone is using SMART monitoring of ide drives in a big cluster. Basically, the question is in what percentage of the situations when a drive fails SMART is able to give some kind of a reasonable warning beforehand, let's say more than 24 hours. And how often it does not predict failure at all? The reason I am asking is that recently I had a drive that started getting bunch of I/O errors on certain sectors, yet SMART seemed to indicate that things were fine. Thanks! Konstantin __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Sat Feb 14 18:12:38 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sun, 15 Feb 2004 00:12:38 +0100 (CET) Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message-ID: On Sat, 14 Feb 2004, Mark Hahn wrote: > hack the driver to artificially pre/append a constant number of > bytes to each message. I thought of this as well, but I dsmissed it because: - if the higher level protocol uses fragmentation and checksums, I think that it's pretty hard for the driver to mess with the messages. - a side effect might be faster filling up of some FIFO buffers on the receiver side, which might influence in unexpected ways the latency that we want to measure. Another side effect might be on the switch (assuming a network that uses switches) where data might be kept longer in buffers or peak bandwidth might be reached for short times, but enough to make a difference... - for networks that offer a very low latency, simulating a large latency might require adding a big lot of junk data, many times larger than the original message. 
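A quick back-of-the-envelope on that last point: the padding needed to mimic a given delay grows linearly with wire speed, so on a fast fabric it rapidly dwarfs typical message sizes. A trivial sketch, using round illustrative link rates rather than measured ones:

/* Bytes of padding corresponding to a target added delay at wire speed.
 * Link rates are illustrative round numbers, not measurements. */
#include <stdio.h>

int main(void)
{
    const char *name[]  = { "Fast Ethernet", "Gigabit Ethernet", "Infiniband-class" };
    double mbytes_sec[] = { 12.5, 100.0, 700.0 };   /* approximate wire rates */
    double delay_usec   = 50.0;                     /* target artificial delay */
    int i;

    for (i = 0; i < 3; i++) {
        double pad = delay_usec * 1e-6 * mbytes_sec[i] * 1e6;
        printf("%-17s: ~%.0f bytes of padding for %.0f usec\n",
               name[i], pad, delay_usec);
    }
    return 0;
}

At 50 usec that works out to roughly 625 bytes on Fast Ethernet, 5 KB on gigabit and 35 KB on an Infiniband-class link, i.e. many times a typical small message, which is exactly the last objection above.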
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Mon Feb 16 09:08:54 2004 From: timm at fnal.gov (Steven Timm) Date: Mon, 16 Feb 2004 08:08:54 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <20040214192822.35170.qmail@web21203.mail.yahoo.com> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: We are using the SMART monitoring on our cluster. It depends on the drive model how much predictive power you will get. On the drives where we have had the most failures we've kept track of how well SMART predicted it pretty well.. it finds an error in advance about half the time. Steve Timm ------------------------------------------------------------------ Steven C. Timm, Ph.D (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Sat, 14 Feb 2004, Konstantin Kudin wrote: > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. > > Basically, the question is in what percentage of the > situations when a drive fails SMART is able to give > some kind of a reasonable warning beforehand, let's > say more than 24 hours. And how often it does not > predict failure at all? > > The reason I am asking is that recently I had a drive > that started getting bunch of I/O errors on certain > sectors, yet SMART seemed to indicate that things were > fine. > > Thanks! > > Konstantin > > > > __________________________________ > Do you Yahoo!? > Yahoo! Finance: Get your refund fast by filing online. > http://taxes.yahoo.com/filing.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Mon Feb 16 10:47:01 2004 From: camm at enhanced.com (Camm Maguire) Date: 16 Feb 2004 10:47:01 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <54brnzrpqi.fsf@intech19.enhanced.com> Greetings! The subject line says it all -- where can one get the most bang per watt among systems currently available? Take care, -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Mon Feb 16 11:19:53 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Mon, 16 Feb 2004 11:19:53 -0500 (EST) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <54brnzrpqi.fsf@intech19.enhanced.com> References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: On Mon, 16 Feb 2004 at 10:47am, Camm Maguire wrote > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? I have no numbers or benchmarks, but my search for a quiet but powerful set of nodes led me to buy Dell Optiplex SX270s. They've got the Intel 865G chipset (800MHz FSB, 400MHz dual channel memory), P4 HT up to 3.2GHz, onboard e1000, laptop-style HDD, a 150W power supply, and little else. They're sweet little systems. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Mon Feb 16 12:10:43 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Mon, 16 Feb 2004 17:10:43 +0000 (GMT) Subject: Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> References: <200402151704.i1FH4Vh21871@NewBlue.scyld.com> Message-ID: > Message: 1 > Date: Sat, 14 Feb 2004 11:28:22 -0800 (PST) > From: Konstantin Kudin > To: beowulf at beowulf.org > Subject: [Beowulf] S.M.A.R.T usage in big clusters > > I am curious if anyone is using SMART monitoring of > ide drives in a big cluster. Yes. We use smartmon tools http://smartmontools.sourceforge.net/ Hard drive failures are by far the most common hardware failure we see on our systems. We've hooked smartmontools into the batch queueing system we use, so that if drives are flagged as failing, the host gets closed to new jobs. (You could extend this to do checkpoint/migration if your code supports it, ours doesn't.) Our cluster typically runs fairly short jobs (less than 1 hour or so) so jobs usually finish before the drive finally fails. I haven't collected any hard statistics on how many failures we catch before it impacts on a user's work, but my gut feeling is that it catches over 80% of the cases, and certainly enough for it to be worthwhile implementing. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 16:00:34 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 13:00:34 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> Message-ID: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> This is an exceedingly sophisticated question.. Do you count: Wall plug watts to flops? or CPU watts to flops? does the interconnect count? (just the power in the line drivers and terminations is a big power consumer for spaceflight hardware... 
why LVDS is overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. I'll bet that gigabit backplane in the switch burns a fair amount of power... does the memory count? This would drive more vs less cache decisions, which affect algorithm partitioning and data locality of reference. Is there a constraint on a "total minimum speed" or "maximum number of nodes"? The interesting tradeoff in speed of nodes vs number of nodes manifests itself in many ways: more interconnects, bigger switches, etc. More nodes means Larger physical size means longer cables means more cable capacitance to charge and discharge on each bit means more power in the line drivers. What's your message latency requirement? Can you do store and forward through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but adding some power in the CPU to shuffle messages around) Can free space optical interconnects be used? (power hungry Tx and Rx, but no cable length issues) Anyway.. this is an issue that is very near and dear to my heart (since I'm designing power constrained systems). One problem you'll find is that reliable and comparable (across processors/architectures) numbers are very hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than a 200 MIPS PowerPC 750 running at 133 MHz. Jim Lux Spacecraft Telecommunications Section Jet Propulsion Lab ----- Original Message ----- From: "Camm Maguire" To: Sent: Monday, February 16, 2004 7:47 AM Subject: [Beowulf] Max flops to watts hardware for a cluster > Greetings! The subject line says it all -- where can one get the most > bang per watt among systems currently available? > > Take care, > -- > Camm Maguire camm at enhanced.com > ========================================================================== > "The earth is but one country, and mankind its citizens." -- Baha'u'llah > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Mon Feb 16 18:11:50 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Mon, 16 Feb 2004 23:11:50 +0000 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> Message-ID: <20040216231150.GA3060@galactic.demon.co.uk> On Mon, Feb 16, 2004 at 01:00:34PM -0800, Jim Lux wrote: > This is an exceedingly sophisticated question.. > > Do you count: > Wall plug watts to flops? or CPU watts to flops? Via Eden / Nehemiah chips at 1GHz for 7W or Acorn ARM e.g. Simtec evaluation boards ? > does the interconnect count? (just the power in the line drivers and > terminations is a big power consumer for spaceflight hardware... why LVDS is > overtaking RS-422 ... 300mV into 100 ohms is a lot better than 12-15V into > 100 ohms. Too bad LVDS parts don't have the common mode voltage tolerance. > Cheap slow ASICs and serial port type speeds? Low power Bluetooth devices? 
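For scale, the LVDS versus RS-422 point quoted above is easy to put numbers on with P = V^2/R into the same 100 ohm termination; a one-liner sketch:

/* Static power dissipated in a 100 ohm termination, P = V*V/R.
 * Voltages and termination are the figures quoted in the post above. */
#include <stdio.h>

int main(void)
{
    double r = 100.0;   /* ohms */
    printf("LVDS    0.3 V: %7.1f mW per line\n",  0.3 *  0.3 / r * 1000.0);
    printf("RS-422 12.0 V: %7.1f mW per line\n", 12.0 * 12.0 / r * 1000.0);
    printf("RS-422 15.0 V: %7.1f mW per line\n", 15.0 * 15.0 / r * 1000.0);
    return 0;
}

That is under a milliwatt per line against 1.4-2.25 W, a ratio of well over a thousand before switching losses are even considered.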
> I'll bet that gigabit backplane in the switch burns a fair amount of > power... > > does the memory count? This would drive more vs less cache decisions, which > affect algorithm partitioning and data locality of reference. > The early Seymour Cray model - minimum numbers of standard parts that are ultra fast? > Is there a constraint on a "total minimum speed" or "maximum number of > nodes"? The interesting tradeoff in speed of nodes vs number of nodes > manifests itself in many ways: more interconnects, bigger switches, etc. > Buckyball of PDA's anyone ? :) > More nodes means Larger physical size means longer cables means more cable > capacitance to charge and discharge on each bit means more power in the line > drivers. > Xilinx FPGA type architecture? Inmos transputer-style? Node on chip? AVR Atmel-type chips? > What's your message latency requirement? Can you do store and forward > through the nodes (a'la iPSC/1 hypercubes) (saving you the switch, but > adding some power in the CPU to shuffle messages around) > > Can free space optical interconnects be used? (power hungry Tx and Rx, but > no cable length issues) > ThinkGeek do an _ultra cool_ looking green pumped laser pointer which will reach low cloudbases :) > > Anyway.. this is an issue that is very near and dear to my heart (since I'm > designing power constrained systems). One problem you'll find is that > reliable and comparable (across processors/architectures) numbers are very > hard to come by. I've spent a fair amount of time explaining why 40 MFLOPs > in a 20 MHz DSP can actually be a lot more "crunch" at a lot less power than > a 200 MIPS PowerPC 750 running at 133 MHz. > If 5W of power goes to/from Mars - then the JPL are the ones to beat on this [makes QRP radio hams look positively profligate] :) Andy _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Feb 16 20:45:49 2004 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 17 Feb 2004 12:45:49 +1100 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> Message-ID: <200402171245.51746.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 12:22 pm, Jim Lux wrote: > For those interested, all the deep space comm stuff is documented in CCSDS > specs at http://www.ccsds.org/ Cool. http://www1.ietf.org/mail-archive/ietf-announce/Current/msg27294.html This document describes how to encapsulate Internet Protocol version 4 and version 6 packets can be encapsulated in Consultative Committee for Space Data Systems (CCSDS) Space Data Link Protocols. That's going to be one hell of a round trip time for pings.. What about distributed processing between spacecraft ? OK, maybe interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? 
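The "hell of a round trip time" is easy to quantify with nothing more than light time; a rough sketch using approximate Earth-Mars distance extremes (the only physics here is t = 2d/c):

/* Rough interplanetary ping estimate: RTT = 2 * distance / c.
 * Distances are approximate Earth-Mars extremes. */
#include <stdio.h>

int main(void)
{
    double c_km_s = 299792.458;             /* speed of light, km/s */
    double d_km[] = { 55.0e6, 400.0e6 };    /* ~closest and ~farthest approach */
    int i;

    for (i = 0; i < 2; i++) {
        double rtt = 2.0 * d_km[i] / c_km_s;
        printf("distance %4.0f Mkm: RTT ~ %4.0f s (%.0f min)\n",
               d_km[i] / 1e6, rtt, rtt / 60.0);
    }
    return 0;
}

Call it six minutes at closest approach and around three quarters of an hour near conjunction, before any store-and-forward or coding overhead, so interactive protocols are out and even keepalives need rethinking.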
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMXJNO2KABBYQAh8RAkTUAKCDfbAaswt3oWYDrEzXecdrqPfIPACff5cS UUAVTMwPAR3XA3lHjjf9lYc= =+LJH -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Mon Feb 16 20:22:51 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 17:22:51 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <001c01c3f4cf$ecd25200$36a8a8c0@LAPTOP152422> <20040216231150.GA3060@galactic.demon.co.uk> Message-ID: <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> > > > If 5W of power goes to/from Mars - then the JPL are the ones to beat on > this [makes QRP radio hams look positively profligate] :) that 15W from Mars, on the omni antenna, only gets you 7-8 bits/second, working into a 70 meter diameter dish and a cryogenically cooled receiver front end. A bit beyond the typical ham's rig or budget. Going the other way, it's hundreds of kW into the dish. Beyond QRO. More realistically, they get a hundred kbps or so on the UHF link to the orbiter from a basically omni antenna on the rover. I can't recall what the max rate on the "direct to earth" X-band high gain antenna (which is about 20 cm in diameter) is, but it's probably in the same ballpark. That's the actual signalling rate, also... there's some coding going on as well, so the "data rate" is lower, after you take out framing, error correction etc. For those interested, all the deep space comm stuff is documented in CCSDS specs at http://www.ccsds.org/ --- Actually, the low power per function (or more accurately, low energy per function) champs are probably the cellphone folks.. Battery life is a real selling point. The little GPS receivers for cellphones are actually spec'd in milliJoules/fix, for instance. That said, I don't see anyone building a big crunching cluster out of cellphones... It's all those other issues you have to deal with.. interconnects, cluster management, memory, etc. They all require energy. Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 00:34:30 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Mon, 16 Feb 2004 21:34:30 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <54brnzrpqi.fsf@intech19.enhanced.com> <20040216231150.GA3060@galactic.demon.co.uk> <000401c3f4f4$90a49a90$36a8a8c0@LAPTOP152422> <200402171245.51746.csamuel@vpac.org> Message-ID: <001a01c3f517$b7bbffb0$36a8a8c0@LAPTOP152422> > > This document describes how to encapsulate Internet Protocol > version 4 and version 6 packets can be encapsulated in > Consultative Committee for Space Data Systems (CCSDS) Space > Data Link Protocols. > > That's going to be one hell of a round trip time for pings.. > > > > What about distributed processing between spacecraft ? 
OK, maybe > interplanetary would be a bit much, but what about lander(s) and orbiter(s) ? > > > Such ideas are being contemplated, and not only by me. There are distributed computing/ cooperative robotics sorts of things, and also "formation flying" sorts of things, not to mention "sensor webs". Probably the biggest problem is not a technology one but a philosophical one. Spacecraft and mission design is exceedingly conservative, and you'd have to show that it would enable something that's needed, that can't be done by conventional approaches. It's sufficiently unusual that it doesn't fit well with the usual analysis models for spacecraft; which tend to push towards "one big X" supplied by power from "one big Y" using "one big Z" to talk to home, etc. The costing spreadsheets used in speculative mission planning don't have cells for "number of processors in cluster" and "power per node" You need a fairly straightforward model that says, in effect, you can process "x" amount of data with "y" mass and "z" watts/joules. That model must be backed up by credible analysis and experience ("heritage" in space speak). In general, the perception is that "more parts = more potential failure points = higher risk" so it's gotta be a "this is the ONLY way to make the measurement" or it's not going to fly. You're going to spend years and years getting ready to go, and you can't go fix it if it breaks. Spaceflight is a very, very, very different conceptual and planning model. (we won't even get into what you have to do if it's connected to human space flight in any way...). The time from "great idea" to "mission launch" is probably in the area of 5-7 years. The CPU flying on the Mars Rovers is a Rad6000, which is based on an old MIPS processor. Current missions in planning and development use things like PowerPC750's (derated) and Sparc7s and 8's (aka ERC32 and/or LEON) and ADSP21020 clones. Nobody is thinking about flying ARMs or Transmetas or even Pentiums. The popular scheme these days is various and sundry microcores (6502, 8051, PPC604s) in Xilinx megagate FPGAs. Actually, though, the fact that only these relatively low powered (computationally) processors are what are flying is what makes clusters attractive. If you need hundreds of megaflops to do your measurement, you're only going to get it with multiple processors. Jim Lux JPL _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Tue Feb 17 06:56:34 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 17 Feb 2004 19:56:34 +0800 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <1077018992.18450.21.camel@mikhail> Good day everyone, I have just a 5 node cluster networked together with a 100 Mbps Ethernet hub (well, not the best setup). The master acts as a NAT host for the internal hosts, and only the master node has 2 nics, one facing the internet and another facing the internal net. The master node is accessible from the internet, and I login to it to run jobs in the background (using screen). I've been reading a lot about OpenPBS and the Maui scheduler, but as mentioned in the list and also evident in the website, the OpenPBS system is not readily downloadable/distributable. Are there any alternatives to OpenPBS which does most of the same thing (batch scheduling of jobs for clusters)? 
Interfaceability using a GUI frontend (without having to make one of my own) is definitely a plus. TIA -- Dean Michael C. Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:47:41 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:47:41 +0100 (CET) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: On 17 Feb 2004, Dean Michael C. Berris wrote: > Good day everyone, > > > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Are there any > alternatives to OpenPBS which does most of the same thing (batch > scheduling of jobs for clusters)? Interfaceability using a GUI frontend > (without having to make one of my own) is definitely a plus. Gridengine is probably a good bet for you. http://gridengine.sunsource.net The GUI is called qmon (I don't use it much) There are binaries available, and clear instructions on how to install it. If you have problems, join the Gridengine list where we'll help. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 17 08:40:46 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 17 Feb 2004 14:40:46 +0100 (CET) Subject: [Beowulf] Linux-HA conference and tutorial, UK Message-ID: If anyone is interested in Linux-HA, the UKUUG are having a tutorial and conference in Bournemouth. The people leading the tutorial are Alan Robertson and Lars Markowsky-Bree, who head up the Linux-HA project. http://www.ukuug.org/events/winter2004/ (ps. I won't be there) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From camm at enhanced.com Tue Feb 17 11:41:19 2004 From: camm at enhanced.com (Camm Maguire) Date: 17 Feb 2004 11:41:19 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: References: Message-ID: <547jylk6a8.fsf@intech19.enhanced.com> Greetings, and thanks for the fascinating discussion! I'm mostly interested in dram flops, and also not the absolute maximum, mars-rover level technology, but say within 10% of the best available options on a more or less commodity basis. Take care, Mark Hahn writes: > > Greetings! The subject line says it all -- where can one get the most > > bang per watt among systems currently available? > > depends on which kind of flops: cache-friendly or dram-oriented? > > > > -- Camm Maguire camm at enhanced.com ========================================================================== "The earth is but one country, and mankind its citizens." 
-- Baha'u'llah _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Tue Feb 17 14:59:52 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Tue, 17 Feb 2004 14:59:52 -0500 Subject: [Beowulf] ECC RAM or not? Message-ID: <20040217195952.GA50999@piskorski.com> For a low-cost cluster, would you insist on ECC RAM or not, and why? My inclination would be to always use ECC for anything, but it looks as if there is no such thing as an inexpensive motherboard which also supports ECC RAM. Either you can have a cheap motherboard (well under $100) with no ECC, or a pricey (well over $100) motherboard with ECC. Am I mistaken about this, are are there really no exceptions to this seeming "ECC motherboads are always expensive" rule? Also at least some large production clusters out there, KASY0, for example, doesn't use ECC RAM, do not use ECC - I wonder why: http://aggregate.org/KASY0/cost.html -- Andrew Piskorski http://www.piskorski.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Feb 17 18:20:12 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 17 Feb 2004 18:20:12 -0500 (EST) Subject: [Beowulf] ECC RAM or not? In-Reply-To: <20040217195952.GA50999@piskorski.com> Message-ID: > For a low-cost cluster, would you insist on ECC RAM or not, and why? how low-cost, and what kind of code? technically, the chances of seeing dram corruption depends on how much ram you have, and how much you use it (as well as environmental factors, such as altitude, of course!) for a sufficiently low-cost cluster, you'd expect to have relatively little ram, and little CPU power to churn it, and therefore low rate of bit-flips. otoh, you can bet that the recent ECC upgrade of the VT cluster had a significant real cost (probably eaten by vendors for PR reasons...) some kinds of codes are "rad hard", in the sense that if a failure gives you a possibly-wront answer, you can just check the answer. that definition pretty much excludes traditional supercomputing, and certainly all physics-based simulations. searching/optimization stuff might work well in that mode, though rechecking only catches false positives, doesn't recover from false negatives. I suspect that doing ECC is cheaper than messing around with this kind of uncertainty, even for these specialized codes. > My inclination would be to always use ECC for anything, but it looks > as if there is no such thing as an inexpensive motherboard which also > supports ECC RAM. Either you can have a cheap motherboard (well under > $100) with no ECC, or a pricey (well over $100) motherboard with ECC. well, you're really pointing out the difference between desktop and workstation/server markets. for instance, there's not much physical difference between the i875 and i865 chipsets, but the former shows up in $200 boards that need a video card, and the latter in $100 ones that have integrated video. > Am I mistaken about this, are are there really no exceptions to this > seeming "ECC motherboads are always expensive" rule? it's a marketing/market-driven phenomenon. it's useful to work out the risks when you make this kind of decision. 
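One way to put numbers on that risk, under the usual constant-failure-rate assumption, is simply expected failures = node count x hours of operation / MTBF. A sketch of the arithmetic, using the power-supply figures quoted in the next paragraph as input:

/* Expected failures over an interval, assuming a constant failure rate:
 * failures = nodes * hours / MTBF. Inputs are the figures quoted below. */
#include <stdio.h>

static double expected_failures(double nodes, double hours, double mtbf_hours)
{
    return nodes * hours / mtbf_hours;
}

int main(void)
{
    /* 32 nodes, 20,000-hour power supplies, one month of wall time (~730 h) */
    printf("expected PSU replacements per month: %.2f\n",
           expected_failures(32.0, 730.0, 20000.0));
    return 0;
}

That comes out at roughly 1.2 per month, consistent with the "replacement per month" figure below; rated MTBFs tend to be optimistic, so field numbers are usually worse.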
if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll need to think about doing a replacement per month. if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked to get a couple failures a week. if 1100 nodes with 4G but no ECC see a two undetected corruptions a day, then 32 nodes with 1G will go a couple months between events... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Tue Feb 17 18:18:20 2004 From: dieter at engr.uky.edu (William Dieter) Date: Tue, 17 Feb 2004 18:18:20 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <200402171701.i1HH13h07766@NewBlue.scyld.com> Message-ID: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Try the cluster design tool at . You can enter your basic memory, memory bandwidth, etc requirements, then set the metric weighting to choose designs with the least power consumption first. For example, for the default requirements (minimal memory, disk, and network requirements, at least 50 GFLOPS, and a $10,000 budget), and weighting power consumption first then memory bandwidth, followed by GFLOPS I get the following as the best design: 23 Generic Fast Ethernet NIC $8.00 $184.00 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 23 Pentium 4 2.4GHz $166.00 $3818.00 23 Generic Socket 478 $56.00 $1288.00 69 Generic PC3200 256MB DDR $44.00 $3036.00 23 Generic Mid-Tower Case $50.00 $1150.00 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 Total $9748.00 The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 Amps (you get to convert Amps to Watts.) Everything else in the design is pretty minimal, but you can adjust the requirements on the form to get what you need (or if you can't let me know why not :-) The CGI tries all designs with the parts in its database to find the ones that meet your requirements and metric weighting. The model includes current consumption for switches and compute nodes based on the power supply. The parts database is a bit out of date right now... let me know what you think. Bill Dieter. dieter at engr.uky.edu On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > Greetings, and thanks for the fascinating discussion! > > I'm mostly interested in dram flops, and also not the absolute > maximum, mars-rover level technology, but say within 10% of the best > available options on a more or less commodity basis. > > Take care, > > Mark Hahn writes: > >>> Greetings! The subject line says it all -- where can one get the >>> most >>> bang per watt among systems currently available? >> >> depends on which kind of flops: cache-friendly or dram-oriented? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Feb 17 21:38:39 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 18 Feb 2004 13:38:39 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> References: <1077018992.18450.21.camel@mikhail> Message-ID: <200402181338.50678.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 17 Feb 2004 10:56 pm, Dean Michael C. 
Berris wrote: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. There is a forked version of OpenPBS called 'Torque' (it was called ScalablePBS, but Altair requested it changed its name) which includes a whole host of bug fixes and enhancements (including massive scalability) and is freely downloadable under an earlier, more free, OpenPBS license. It's under active development and has an active user community, though the mailing list is moderated for some bizzare reason, which means posts can take a little while to get through. The website is at: http://www.supercluster.org/projects/torque/ Good luck! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAMtAvO2KABBYQAh8RAp8cAJsHNJuoCmIxYMNUWguwpoueopKUxACdHJiq p0nGW3X3ATurlzaV+Iw5jtg= =xwcU -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:20:37 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:20:37 +0800 (CST) Subject: [Beowulf] SLURM - newest (and greatest?) batch system Message-ID: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> One of the new features of SGE 6.0 is the parallelized job container (qmaster). Another batch system called SLURM (Simple Linux Utility for Resource Management) will be releasing soon. http://www.llnl.gov/linux/slurm/slurm.html - Like SGE 6.0, it also uses threads to parallelize the job container. - licensed under the GPL!! - developed by the US gov - uses Maui - designed to be simple :) - supports lots of interconnect switches. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Tue Feb 17 23:02:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 18 Feb 2004 12:02:54 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> Message-ID: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> You can choose between SGE and SPBS. SGE has more features, better fault tolerance, better documentation, and better user support. http://gridengine.sunsource.net SPBS is closer to what you have now, so you and your users (BTW, are you the only one?) don't need to learn something new. http://www.supercluster.org/ Andrew. --- "Dean Michael C. Berris" ????> Good day everyone, > > I have just a 5 node cluster networked together with > a 100 Mbps Ethernet > hub (well, not the best setup). The master acts as a > NAT host for the > internal hosts, and only the master node has 2 nics, > one facing the > internet and another facing the internal net. The > master node is > accessible from the internet, and I login to it to > run jobs in the > background (using screen). 
> > I've been reading a lot about OpenPBS and the Maui > scheduler, but as > mentioned in the list and also evident in the > website, the OpenPBS > system is not readily downloadable/distributable. > Are there any > alternatives to OpenPBS which does most of the same > thing (batch > scheduling of jobs for clusters)? Interfaceability > using a GUI frontend > (without having to make one of my own) is definitely > a plus. > > TIA > > -- > Dean Michael C. Berris > http://mikhailberis.blogspot.com > mikhailberis at free.net.ph > +63 919 8720686 > GPG 08AE6EAC > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:37:00 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:37:00 -0800 Subject: [Beowulf] ECC RAM or not? References: Message-ID: <003101c3f5d8$d9b03250$36a8a8c0@LAPTOP152422> > some kinds of codes are "rad hard", in the sense that if a failure gives > you a possibly-wront answer, you can just check the answer. My practical experience with DRAM designs has been that bit errors are more likely due to noise/design issues than radiation induced single event upsets. Back in the 80's I worked on a Multibus system where we used to get double bit errors in 11/8 ecc several times a week. Everyone just said "well, that's why we have ECC" until I did some quick statistics on what the ratio between single bit (corrected but counted) and double bit errors should have been. Such high rates defied belief, and it turned out to be a bus drive problem. that definition > pretty much excludes traditional supercomputing, and certainly all > physics-based simulations. searching/optimization stuff might work well > in that mode, though rechecking only catches false positives, doesn't > recover from false negatives. I suspect that doing ECC is cheaper than > messing around with this kind of uncertainty, even for these specialized codes. There are a number of algorithms which have inherent self checking built in. In the accounting business, this is why there's double entry, and/or checksums. In the signal processing world, there are checks you can do on things like FFTs, where total power in should equal total power out. > > > if you have 32 low-overhead nodes containing 20K-hour power supplies, you'll > need to think about doing a replacement per month. > > if you have a 1M-hour disk in each of 1100 nodes, you shouldn't be shocked > to get a couple failures a week. Shades of replacing tubes in Eniac or the Q-7A MIL-HDBK-217A is the "bible" on these sorts of computations. 
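That kind of quick statistics check is worth sketching, if only under the crude assumption of independent, uniformly distributed bit flips (this is an illustration, not the original calculation, and the word width, array size and error rate below are invented): the corrected single-bit count pins down a per-bit upset probability, and the expected double-bit rate then scales with its square, so it should be vanishingly small; a measured double-bit rate anywhere near the single-bit rate points at a systematic problem such as the bus-drive fault described above.

/* Sanity check: given an observed single-bit (corrected) error rate,
 * what double-bit rate would independent random flips predict?
 * Word width, array size and single-bit rate are invented examples;
 * assumes words are rewritten/scrubbed often enough that hits do not
 * accumulate across days. */
#include <stdio.h>

int main(void)
{
    double bits_per_word   = 39.0;       /* e.g. 32 data + 7 SEC-DED check bits */
    double words           = 1048576.0;  /* words in the array (example) */
    double singles_per_day = 10.0;       /* observed corrected errors per day */

    /* Per-bit upset probability per day implied by the single-bit rate. */
    double p_bit = singles_per_day / (words * bits_per_word);

    /* P(two independent hits land in the same word) ~ C(n,2) * p^2. */
    double p_word2 = 0.5 * bits_per_word * (bits_per_word - 1.0) * p_bit * p_bit;
    double doubles_per_day = p_word2 * words;

    printf("implied per-bit upsets/day:   %.3g\n", p_bit);
    printf("expected double-bit errs/day: %.3g (one every %.0f years)\n",
           doubles_per_day, 1.0 / (doubles_per_day * 365.0));
    printf("single:double ratio:          %.3g\n",
           singles_per_day / doubles_per_day);
    return 0;
}

With these made-up numbers the predicted single-to-double ratio is on the order of 10^5, so seeing double-bit errors "several times a week" against a modest corrected-error count is far outside what random upsets would explain.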
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Tue Feb 17 23:28:53 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 20:28:53 -0800 Subject: [Beowulf] Max flops to watts hardware for a cluster References: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> This kind of thing is way cool.. Have you published the algorithm behind the page in a concise form somewhere? It would be handy to be able to point mission/system planners to it. ----- Original Message ----- From: "William Dieter" To: Sent: Tuesday, February 17, 2004 3:18 PM Subject: Re: [Beowulf] Max flops to watts hardware for a cluster > Try the cluster design tool at > . You can enter your basic > memory, memory bandwidth, etc requirements, then set the metric > weighting to choose designs with the least power consumption first. > > For example, for the default requirements (minimal memory, disk, and > network requirements, at least 50 GFLOPS, and a $10,000 budget), and > weighting power consumption first then memory bandwidth, followed by > GFLOPS I get the following as the best design: > > 23 Generic Fast Ethernet NIC $8.00 $184.00 > 23 Cat5 Cable for Fast Ethernet $2.00 $46.00 > 1 Generic 24 Port Fast Ethernet Switch $76.00 $76.00 > 23 Pentium 4 2.4GHz $166.00 $3818.00 > 23 Generic Socket 478 $56.00 $1288.00 > 69 Generic PC3200 256MB DDR $44.00 $3036.00 > 23 Generic Mid-Tower Case $50.00 $1150.00 > 3 Generic 2x2 Shelving Unit with Wheels $50.00 $150.00 > Total $9748.00 > > The above design gets you 50 GFLOPS and 2.67 bytes/FLOP for about 30 > Amps (you get to convert Amps to Watts.) Everything else in the design > is pretty minimal, but you can adjust the requirements on the form to > get what you need (or if you can't let me know why not :-) > > The CGI tries all designs with the parts in its database to find the > ones that meet your requirements and metric weighting. The model > includes current consumption for switches and compute nodes based on > the power supply. The parts database is a bit out of date right now... > > let me know what you think. > > Bill Dieter. > dieter at engr.uky.edu > > On Tuesday, February 17, 2004, at 12:01 PM, Camm Maguire wrote: > > Greetings, and thanks for the fascinating discussion! > > > > I'm mostly interested in dram flops, and also not the absolute > > maximum, mars-rover level technology, but say within 10% of the best > > available options on a more or less commodity basis. > > > > Take care, > > > > Mark Hahn writes: > > > >>> Greetings! The subject line says it all -- where can one get the > >>> most > >>> bang per watt among systems currently available? > >> > >> depends on which kind of flops: cache-friendly or dram-oriented? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Feb 18 01:26:38 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Tue, 17 Feb 2004 22:26:38 -0800 Subject: [Beowulf] ECC RAM or not? 
References: Message-ID: <000601c3f5e8$2a8430f0$36a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Mark Hahn" To: "Jim Lux" Sent: Tuesday, February 17, 2004 9:36 PM Subject: Re: [Beowulf] ECC RAM or not? > > > some kinds of codes are "rad hard", in the sense that if a failure gives > > > you a possibly-wront answer, you can just check the answer. > > > > My practical experience with DRAM designs has been that bit errors are more > > likely due to noise/design issues than radiation induced single event > > upsets. > > understood. then again, you're using deliberately selected rad-hard-ware, no? Nope... that was off the shelf DRAMs in a commercial environment (in 1980ish time frame, so they were none too dense DRAMs, either.. 256kB on a board I think, many, many, pieces.. probably 64kbit parts..) > I was mostly thinking about a talk I saw by the folks who care for ASCI-Q, > which is in Los Alamos. they say that the altitude alone is worth a 14x > increase in particle flux, and that this caused big problems for them with > a particular register on the ES40 data path that was not ecc'ed. Indeed.. ECC on memory is only part of the problem.. you really need ECC on address and data lines for full coverage (or, more properly EDAC).. The classic paper on altitude effects was done by folks at IBM, where they ran boards in NY and in Denver and, underground in Denver. Good experimental technique, etc. > > > Back in the 80's I worked on a Multibus system where we used to get > > double bit errors in 11/8 ecc several times a week. Everyone just said > > "well, that's why we have ECC" until I did some quick statistics on what the > > ratio between single bit (corrected but counted) and double bit errors > > should have been. Such high rates defied belief, and it turned out to be a > > bus drive problem. > > makes sense. to be honest, I don't see many single-bit errors even, > but today we've only < 200 GB ram online. inside a year, it'll probably > be more like 2TB, so maybe things will get more exciting ;) It's a very mixed bag, depending on what's causing the errors. If it's radiation, smaller feature sizes mean that there's a smaller target to hit, and the amount of energy transferred is less (of course, less energy is stored in the memory cell, too) > we're also pretty much at sealevel, with lots of building over us. > reactor next door, though ;) Type of particle, and it's energy, has a huge effect on the SEU effects. I would maintain, though, that run of the mill timing margin effects, particularly over temperature; and EMI/EMC effects are probably a more important source of bit hits in modern computers. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at free.net.ph Wed Feb 18 05:25:22 2004 From: mikhailberis at free.net.ph (Dean Michael C. Berris) Date: 18 Feb 2004 18:25:22 +0800 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> Message-ID: <1077099918.4818.15.camel@mikhail> Thanks sir, and to everyone else that responded. I'm currently reading on SGE, and am going to be choosing as soon as I get the full picture. Currently my preference is still towards SPBS (Torque) mainly because it doesn't seem as complicated to set up. 
However, as a Debian user, I did an apt-cache search on batch system and a couple of packages were Queue and DQS (Distributed Queueing System). I went over to the DQS website, and I'm reading on it right now. What I'd like to know would be how different DQS (and/or Queue) is with regards to SPBS and SGE? It would seem like from what I've been reading, SGE and SPBS are really for clusters (and grids), and DQS is for a collection of computers that really don't work as a cluster (or as a parallel computer). How accurate is this assessment of mine? Are there any articles written by people in the group regarding comparisons between SGE and SPBS with regards to effectivity and reliability? Scalability is also a factor because the cluster may grow as more funding and problems get into the cluster project. I hope I never cease to get enlightened from posts in the group, and insights would be most appreciated. Thanks very much and have a nice day! :) On Wed, 2004-02-18 at 12:02, Andrew Wang wrote: > You can choose between SGE and SPBS. > > SGE has more features, better fault tolerance, better > documentation, and better user support. > > http://gridengine.sunsource.net > > SPBS is closer to what you have now, so you and your > users (BTW, are you the only one?) don't need to learn > something new. > > http://www.supercluster.org/ > > Andrew. > > > --- "Dean Michael C. Berris" > ????> Good day > everyone, > > > > I have just a 5 node cluster networked together with > > a 100 Mbps Ethernet > > hub (well, not the best setup). The master acts as a > > NAT host for the > > internal hosts, and only the master node has 2 nics, > > one facing the > > internet and another facing the internal net. The > > master node is > > accessible from the internet, and I login to it to > > run jobs in the > > background (using screen). > > > > I've been reading a lot about OpenPBS and the Maui > > scheduler, but as > > mentioned in the list and also evident in the > > website, the OpenPBS > > system is not readily downloadable/distributable. > > Are there any > > alternatives to OpenPBS which does most of the same > > thing (batch > > scheduling of jobs for clusters)? Interfaceability > > using a GUI frontend > > (without having to make one of my own) is definitely > > a plus. > > > > TIA > > > > -- > > Dean Michael C. Berris > > http://mikhailberis.blogspot.com > > mikhailberis at free.net.ph > > +63 919 8720686 > > GPG 08AE6EAC > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > ----------------------------------------------------------------- > ??? Yahoo!?? > ?????????????????????? > http://tw.promo.yahoo.com/mail_premium/stationery.html > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Dean Michael C. 
Berris http://mikhailberis.blogspot.com mikhailberis at free.net.ph +63 919 8720686 GPG 08AE6EAC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Feb 18 05:47:27 2004 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Wed, 18 Feb 2004 10:47:27 +0000 Subject: [Beowulf] Howto setup jobs using MPI In-Reply-To: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> References: <1077018992.18450.21.camel@mikhail> Message-ID: <5.1.1.6.0.20040218104616.02a89e00@imap.hermes.cam.ac.uk> Hi How best to setup jobs using MPI? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mack.joseph at epa.gov Wed Feb 18 07:22:07 2004 From: mack.joseph at epa.gov (Joseph Mack) Date: Wed, 18 Feb 2004 07:22:07 -0500 Subject: [Beowulf] S.M.A.R.T usage in big clusters References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> Message-ID: <403358EF.7F0BDE75@epa.gov> Steven Timm wrote: > > On the drives where we have had the most failures we've kept track > of how well SMART predicted it pretty well.. it finds an error > in advance about half the time. How do you get your information out of smartd? I've found output in syslog - presumably I can grep for this. I can get e-mail if I want (from the docs). To look at the output of the long and short tests it appears that I have to interactively use smartctl. Is there anyway to have a flag that can be looked at periodically to say "this disk is about to fail"? Thanks Joe -- Joseph Mack PhD, High Performance Computing & Scientific Visualization SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Wed Feb 18 09:16:48 2004 From: timm at fnal.gov (Steven Timm) Date: Wed, 18 Feb 2004 08:16:48 -0600 (CST) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> References: <20040214192822.35170.qmail@web21203.mail.yahoo.com> <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > Steven Timm wrote: > > > > > On the drives where we have had the most failures we've kept track > > of how well SMART predicted it pretty well.. it finds an error > > in advance about half the time. > > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. At the moment we are not using smartd. I was running an older version that didn't have it as part of the package. I wrote some cron scripts that do a short test every night and capture the output to a file. But we are going to transition and use smartd and use an agent we already have that is grepping /var/log/messages for other purposes. Steve Timm > > I can get e-mail if I want (from the docs). > > To look at the output of the long and short tests it appears that > I have to interactively use smartctl. > > Is there anyway to have a flag that can be looked at periodically to > say "this disk is about to fail"? 
> > Thanks Joe > -- > Joseph Mack PhD, High Performance Computing & Scientific Visualization > SAIC, Supporting the EPA Research Triangle Park, NC 919-541-0007 > Federal Contact - John B. Smith 919-541-1087 - smith.johnb at epa.gov > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 09:35:55 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 09:35:55 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <002701c3f5d7$ceae2980$36a8a8c0@LAPTOP152422> Message-ID: On Tuesday, February 17, 2004, at 11:28 PM, Jim Lux wrote: > This kind of thing is way cool.. > Have you published the algorithm behind the page in a concise form > somewhere? It would be handy to be able to point mission/system > planners to > it. We just submitted the paper to IEEE Computer for review last week. If you want to look at the source code, it is available through . I haven't made an official tarball release yet, but you can get the latest code through CVS. If you want to make your own parts database on our website you can do that, too. It copies one of the existing databases into a new one, so if you just want to update a few prices, or add a few new parts, it doesn't take too much effort. Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hanzl at noel.feld.cvut.cz Wed Feb 18 11:28:25 2004 From: hanzl at noel.feld.cvut.cz (hanzl at noel.feld.cvut.cz) Date: Wed, 18 Feb 2004 17:28:25 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> References: <20040218040254.96478.qmail@web16808.mail.tpe.yahoo.com> <1077099918.4818.15.camel@mikhail> Message-ID: <20040218172825E.hanzl@unknown-domain> > However, as a Debian user, I did an apt-cache search on batch system and > a couple of packages were Queue and DQS (Distributed Queueing System). I > went over to the DQS website, and I'm reading on it right now. What I'd > like to know would be how different DQS (and/or Queue) is with regards > to SPBS and SGE? DQS is SGE's grandfather, the genealogy goes somehow like this: DQS(Florida State Univ.) -> CODINE(Genias) -> SGE(Sun) so you can expect DQS to be much simpler but also you can expect SGE to be much improoved. (My personal choice is SGE and I am quite happy with it.) Regards Vaclav Hanzl _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Feb 18 09:35:23 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 18 Feb 2004 15:35:23 +0100 (CET) Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: <92F43F63-619F-11D8-B4A2-000393BF25C6@engr.uky.edu> Message-ID: On Tue, 17 Feb 2004, William Dieter wrote: > 23 Generic Fast Ethernet NIC $8.00 $184.00 How much in terms of power have you assigned to this item ? If you really buy a cheap low-end FE NIC, you'll most probably end up with a RTL8139 based card. This chip by design puts quite a load on the main CPU especially if you use it in a cluster context (=lots of network activity). 
This might increase significantly the power consumption or reduce the available flops... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dieter at engr.uky.edu Wed Feb 18 10:24:31 2004 From: dieter at engr.uky.edu (William Dieter) Date: Wed, 18 Feb 2004 10:24:31 -0500 Subject: [Beowulf] Max flops to watts hardware for a cluster In-Reply-To: Message-ID: <8CD39EDD-6226-11D8-B4A2-000393BF25C6@engr.uky.edu> On Wednesday, February 18, 2004, at 09:35 AM, Bogdan Costescu wrote: > On Tue, 17 Feb 2004, William Dieter wrote: > >> 23 Generic Fast Ethernet NIC $8.00 $184.00 > > How much in terms of power have you assigned to this item ? The tool is not perfect. We have not broken down the power to that level of detail. There is a tradeoff between how much work you have to do for each component and how much detail the model has. > If you really buy a cheap low-end FE NIC, you'll most probably end up > with a RTL8139 based card. This chip by design puts quite a load on > the main CPU especially if you use it in a cluster context (=lots of > network activity). This might increase significantly the power > consumption or reduce the available flops... To get really accurate power consumption numbers we would have to measure for many different CPU/Motherboard/NIC combinations. OTOH, there are some really cheap cards based on the Davicom 9102 chipset, (newegg.com has at least two different brands for $4.00 to $6.00). The Davicom 9102 is enough of a tulip clone that the Ethernet HOWTO recommends trying the tulip driver before the manufacturer supplied driver... Bill. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Wed Feb 18 13:27:55 2004 From: bclem at rice.edu (Brent M. Clements) Date: Wed, 18 Feb 2004 12:27:55 -0600 (CST) Subject: [Beowulf] Best or standard hpc kernel sysctl settings. Message-ID: As part of our standards documentation, I'd like to set a good starting point for tuning various kernel parameters for clusters on Rice's campus. We have a few sysctl settings that we do based on the requirements of certain codes, but I'd like to know how everyone else is tuning their linux systems in their clusters. Can I get from you guys the sysctl parameter, it's value, and the reason why you set it that way? Thanks, Brent Clements _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Wed Feb 18 15:54:58 2004 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed, 18 Feb 2004 15:54:58 -0500 Subject: [Beowulf] 2nd call for speakers -- Bioclusters 2004 Workshop -- March 30 Boston, MA Message-ID: <4033D122.4080008@sonsorol.org> { Apologies for the cross-posting } Enclosed is a meeting announcement for a 1 day workshop we are organizing alongside the much larger 'BioITWorld Expo' in Boston, Ma. 
The goals are two-fold -- recreating the vibe from the OReilly Bioinformatics Technology conference series that was recently cancelled as well as providing a forum where folks involved at the intersection of life science research and high performance IT can come together to talk shop. Feel free to pass along the enclosed announcement as appropriate. We are actively seeking technical talks and presentations focusing on how challenging problems were solved or overcome. Regards, Chris {on behalf of the organizing committee} Email: bioclusters04 at open-bio.org -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: bioclusters-workshop.txt URL: From nixon at nsc.liu.se Tue Feb 17 09:16:39 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Tue, 17 Feb 2004 15:16:39 +0100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077018992.18450.21.camel@mikhail> (Dean Michael C. Berris's message of "17 Feb 2004 19:56:34 +0800") References: <1077018992.18450.21.camel@mikhail> Message-ID: "Dean Michael C. Berris" writes: > I've been reading a lot about OpenPBS and the Maui scheduler, but as > mentioned in the list and also evident in the website, the OpenPBS > system is not readily downloadable/distributable. Torque (a.k.a Storm, a.k.a. Scalable PBS) is a fork of the OpenPBS source tree, with active maintenance and a reasonable license. http://www.supercluster.org/projects/torque It plays nicely with Maui. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.giesen at kodak.com Tue Feb 17 15:58:57 2004 From: david.giesen at kodak.com (David J Giesen) Date: Tue, 17 Feb 2004 15:58:57 -0500 Subject: [Beowulf] Cluster questions for Quantum Chemistry Message-ID: <40328091.A0200730@kodak.com> Hello- (Apologies to those who have seen a similar question on the CCL mailing list) We may be in the market for a new Linux cluster these days. Unfortunately, I haven't kept up on all the latest issues, and I'd appreciate any answers you all have for any of these questions. We want to run mainly QM codes such as Gaussian 98/Gaussian 03, Jaguar and PQS on these machines with linux. We'd likely be running in parallel, typically across 3-4 dual-processor nodes. 1) Xeon vs P4: [a] At the same GHz and front-side bus speed is there a difference in performance between these chips? [b] Is there a difference in reliability? 2) AMD Opteron vs Athlon: [a] Does any QM code actually take advantage of Opteron's 64-bit technology? [b] Have people moved away from Athlon boxes because of heat problems? 3) AMD vs Intel: How to compare speeds between these two different types of processors for QM codes? Does an Athlon 2800 (2.08 GHz) run more like a 2.0 GHz P4 or a 2.8 GHz P4? 3) How important is front-side bus speed these days for quantum chemistry problems? 4) How important are 100 MHz ethernet versus 1 Gb ethernet connections between the nodes for quantum chemistry problems? Thanks in advance! Dave Any questions which highlight out my extreme stupidity are a result of exactly that (my own stupidity) rather than a reflection on the positions of the Eastman Kodak Company. -- Dr. David J. 
Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 -- Dr. David J. Giesen Eastman Kodak Company david.giesen at kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 18 22:09:53 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 19 Feb 2004 11:09:53 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <1077099918.4818.15.camel@mikhail> Message-ID: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> --- "Dean Michael C. Berris" > I'm currently reading > on SGE, and am going to be choosing as soon as I get > the full picture. > Currently my preference is still towards SPBS > (Torque) mainly because it > doesn't seem as complicated to set up. To install SGE, you don't even need to compile the source, just download the pre-compiled binary package, or grab the rpm. And also, SGE doesn't require root access, it can untar the package in your home directory, run the install scripts, and start playing with it. > What I'd > like to know would be how different DQS (and/or > Queue) is with regards > to SPBS and SGE? Debian is planning to replace DQS with SGE, but the maintainer of DQS was gone (he left the university). DQS and SGE are very similar. And PBS and SPBS are very similar too. > It would seem like from what I've been reading, SGE > and SPBS are really > for clusters (and grids), and DQS is for a > collection of computers that > really don't work as a cluster (or as a parallel > computer). How accurate > is this assessment of mine? Are you talking about compute farms? SGE is also used in compute farms as well, where people run EDA simulations, graphic rendering jobs, BLAST jobs, etc. SGE has quite a lot of resource management features. SPBS/PBS are used in HPC clusters, since before SGE was opensource, PBS was free/opensource, so more people used it in those environments. > Are there any articles written by people in the > group regarding > comparisons between SGE and SPBS with regards to > effectivity and > reliability? SGE vs PBS on the rocks cluster mailing list: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-September/002980.html SPBS has lots of patches integrated, but still if your SPBS master node crashes, your cluster is gone. In SGE, the admin can config 1 or more shadow masters, so in theory as long as any one machine in the cluster is running, your cluster is not dead. > Scalability is also a factor because > the cluster may grow > as more funding and problems get into the cluster > project. Both SGE and SPBS can scale to thousands of nodes, the question is, do you have the funding? :-) (SGE 6.0 will scale even further) > I hope I never cease to get enlightened from posts > in the group, and > insights would be most appreciated. I think you should try to install both, it is better to feel it than to just listen to other people. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 18 22:38:18 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 19 Feb 2004 14:38:18 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> References: <20040219030953.36721.qmail@web16809.mail.tpe.yahoo.com> Message-ID: <200402191438.19333.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > SPBS has lots of patches integrated, but still if your > SPBS master node crashes, your cluster is gone. Well, depends on your definition of "gone" really. People can't queue new jobs, jobs waiting to run won't be started, but as long as your filestore is elsewhere then running jobs won't be interrupted. However, if your filestore server disappears then you're stuffed. :-) Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q jHWTp4HmlzO8CnmObbFarWA= =PrTq -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Thu Feb 19 10:41:26 2004 From: raysonlogin at yahoo.com (Rayson Ho) Date: Thu, 19 Feb 2004 07:41:26 -0800 (PST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402191438.19333.csamuel@vpac.org> Message-ID: <20040219154126.56423.qmail@web11411.mail.yahoo.com> I think it is one of the biggest problems with *PBS, especially in the compute farm environment. The more advanced batch systems (SGE and LSF) have this feature for years, not sure why *PBS still don't have it. (AFAIK, PBSPro 5.4 will include it, but isn't it late??) Rayson --- Chris Samuel wrote: > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be started, > but as long > as your filestore is elsewhere then running jobs won't be > interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.brookes at quadrics.com Thu Feb 19 10:50:43 2004 From: john.brookes at quadrics.com (john.brookes at quadrics.com) Date: Thu, 19 Feb 2004 15:50:43 -0000 Subject: [Beowulf] Best Setup for Batch Systems Message-ID: <30062B7EA51A9045B9F605FAAC1B4F6234EB15@tardis0.quadrics.com> If you keep the db on a separate filestore then - if your pbs server goes down - you can just have a failover server that 'becomes' (takes over the ipaddr and hostname - the other nodes won't even notice the difference) the original server if the original gets screwed. 
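A very rough sketch of what such a takeover can look like on the standby box, purely illustrative and not the customers' actual setup: probe the pbs_server TCP port a few times, and only if it stays unreachable bring up the service address on an interface alias and start a local server. The port number (15001 is the usual pbs_server default), the service IP, the interface and the init script path are all assumptions here.

#!/usr/bin/perl -w
# Illustrative standby watchdog, not the setup described above.
# Assumptions: pbs_server answers on TCP 15001 at the service address,
# eth0:0 is free for an alias, and a normal init script exists.
use strict;
use IO::Socket::INET;

my $service_ip = '192.168.1.10';     # address the compute nodes talk to
my $port       = 15001;              # usual pbs_server port

my $alive = 0;
for my $try (1 .. 3) {               # several probes before giving up
    my $sock = IO::Socket::INET->new(PeerAddr => $service_ip,
                                     PeerPort => $port,
                                     Proto    => 'tcp',
                                     Timeout  => 5);
    if ($sock) { $alive = 1; close $sock; last; }
    sleep 10;
}
exit 0 if $alive;                    # master is fine, do nothing

# Master looks dead: take over the service address, start a local server.
system("ifconfig eth0:0 $service_ip netmask 255.255.255.0 up") == 0
    or die "could not bring up alias: $?";
system("/etc/init.d/pbs_server start") == 0
    or die "could not start pbs_server: $?";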
We've got a couple of customers that do this, but YMMV as they use: a) a somewhat non-standard PBS; b) out-of-band management to ensure that the node isn't just temporarily unresponsive. Cheers, John Brookes Quadrics > -----Original Message----- > From: Chris Samuel [mailto:csamuel at vpac.org] > Sent: 19 February 2004 03:38 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Best Setup for Batch Systems > > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, 19 Feb 2004 02:09 pm, Andrew Wang wrote: > > > SPBS has lots of patches integrated, but still if your > > SPBS master node crashes, your cluster is gone. > > Well, depends on your definition of "gone" really. > > People can't queue new jobs, jobs waiting to run won't be > started, but as long > as your filestore is elsewhere then running jobs won't be interrupted. > > However, if your filestore server disappears then you're stuffed. :-) > > Chris > - -- > Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing http://www.vpac.org/ > Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQFANC+qO2KABBYQAh8RApu3AKCET1tayR/fx4dStcQXO+AXJgThUACdE+3q > jHWTp4HmlzO8CnmObbFarWA= > =PrTq > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From radams at csail.mit.edu Thu Feb 19 14:03:38 2004 From: radams at csail.mit.edu (Ryan Adams) Date: Thu, 19 Feb 2004 14:03:38 -0500 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC Message-ID: <1077217418.4982.35.camel@localhost> Please forgive the length of this email, as I'm going to try to be comprehensive: I have a problem that divides nicely (embarrassingly?) into parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to complete and requires no communication during that time. Essentially there is a piece of data, around 500KB that must be processed and a result returned. I'd like to process as many of these pieces of data as possible. I am considering building a small heterogeneous cluster to do this (at home, basically), and am trying to decide exactly how to architect the task distribution. The network will probably be Fast Ethernet. Initially there will be four machines processing the data, but I could imagine as many as ten in the near term. My current back-of-the-envelope math puts an aggregate load (assuming 2.0s per job, 500KB transferred each, with ten nodes) of 2.5MB/s on the network, so it would seem that 100BT can get the job done without introducing much delay compared to the 2.0s execution time. Perhaps I am doing this math wrong, but I was also thinking that since the download of the data is such an I/O-intensive task that it would be reasonable to place that in a separate thread from the floating point calculations. This way, I could hope to work on data while my socket read is blocking. My question is basically this: is 2-5 seconds too small of a job to justify a batching system like *PBS or Gridengine? 
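(The replies further down mostly converge on bundling: pay the scheduler overhead once per few hundred tasks instead of once per 2-5 second task. A rough sketch of such a wrapper is below; the shared directory, the bundle size and the /shared/bin/process_chunk worker are made-up placeholders, not anything from PBS or SGE itself.)

#!/usr/bin/perl -w
# Illustrative bundling wrapper: group the small tasks into bundles and
# submit one batch job per bundle, so scheduler overhead is paid per
# bundle rather than per task.  The shared directory, bundle size and
# /shared/bin/process_chunk worker are made-up placeholders.
use strict;

my $chunk = 200;                                  # tasks per submitted job
my @files = glob('/shared/incoming/*.dat');
my $n     = 0;

while (my @bundle = splice(@files, 0, $chunk)) {
    my $script = sprintf('/tmp/bundle.%d.%03d.sh', $$, ++$n);
    open my $fh, '>', $script or die "open $script: $!";
    print $fh "#!/bin/sh\n";
    print $fh "/shared/bin/process_chunk $_\n" for @bundle;  # run sequentially
    close $fh;
    system("qsub $script") == 0 or warn "qsub failed for $script: $?";
}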
It would seem that the overhead for a job that requires a few hours would be very insignificant, but what about a few seconds? Certainly, one option would be to bundle sets of these chunks together for a larger effective job. Am I wasting my time thinking about this? I've been considering rolling my own scheduling system using some kind of RPC, but I've been around software development long enough to know that it is better to use something off-the-shelf if at all possible. Thanks in advance... Ryan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 19 14:20:04 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 19 Feb 2004 14:20:04 -0500 (EST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? It would seem that > the overhead for a job that requires a few hours would be very > insignificant, but what about a few seconds? Certainly, one option > would be to bundle sets of these chunks together for a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. > > Thanks in advance... I personally think that it is too small a task to use a batching system, especially since you're likely not going to architect it as a true batching system. I think you have three primary options for ways to develop your code. Well, four if you count NFS. The SIMPLEST way is to put your data blocks in files on an NFS crossmounted filesystem, and start jobs inside e.g. a simple perl script loop that grabs "the next data file" and runs on it and writes out its results back to the NFS file system for dealing with or accruing later. You're basically using NFS as your transport mechanism. Now, NFS isn't horribly efficient relative to raw peak network speed, but neither is it completely horrible -- at 100 BT (say 9-10 MB/sec peak BW) you ought to be able to get at least half of that on an NFS read of a big file. At 5 MB/sec, your 1/2 MB file should take a 0.1 seconds to be read (plus a latency hit) which is "small" (as you note) compared to a run time of 2-5 seconds so you should be able to get nice parallel speedup on four or five hosts. You can test your combined latency and bandwidth with a simple perl script or binary that opens a dozen (different!) files inside a loop. Beware caching, which will give you insane numbers if you aren't careful (as in don't run the test twice on the files without modifying them on the server). The other three ways do it "properly" and permit you both finer control (with the NFS method you'll have to work out file locking and work distribution to make sure two nodes don't try to work on the same file at the same time) and higher BW, close to the full bandwidth of the network. They'll ALSO require more programming. a) PVM b) MPI c) raw networking. PVM is a message passing library. 
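Before the message-passing options get fleshed out, here is a minimal sketch of the NFS-crossmount worker loop described a couple of paragraphs up. It claims a data file by rename()ing it into a per-node directory, which dodges most of the file-locking headache because the rename within one NFS filesystem succeeds for exactly one claimant; the directory layout and the /shared/bin/crunch program are placeholders, not anything from the original post.

#!/usr/bin/perl -w
# Minimal sketch of one worker in the NFS-as-transport scheme above.
# Directory names and the /shared/bin/crunch command are placeholders.
# A node claims a file by renaming it into its own directory; the rename
# either succeeds for exactly one claimant or fails, so two nodes never
# end up working on the same file.
use strict;
use Sys::Hostname;

my $queue   = '/shared/queue';                   # master drops *.dat here
my $results = '/shared/results';
my $mine    = "$queue/claimed-" . hostname();    # this node's claim area
mkdir $mine unless -d $mine;

while (1) {
    my @work = glob("$queue/*.dat");
    unless (@work) { sleep 5; next; }            # nothing queued yet

    for my $file (@work) {
        my ($name) = $file =~ m{([^/]+)$};
        my $claimed = "$mine/$name";
        next unless rename $file, $claimed;      # another node got it first
        system("/shared/bin/crunch $claimed > $results/$name.out") == 0
            or warn "crunch failed on $name: $?";
        unlink $claimed;
    }
}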
There is a PVM program template on my personal GPL software website: http://www.phy.duke.edu/~rgb/General/general.php that might suffice to get you started -- it should just compile and run a simple master/slave program, and you should be able to modify it fairly simply to have the master distribute the next block of work to the first worker/slave to finish. If your CPUs are well balanced the I/O transactions will antibunch and communications will be very efficient. MPI is another message passing library. I don't have an MPI template, but there are example programs in the MPI distributions and on many websites, and there are books (on both PVM and MPI) from e.g. MIT press that are quite excellent. There is also a regular MPI column in Cluster World Magazine that has been working through intro level MPI applications, and old columns by Forrest Hoffman in Linux Magazine ditto. At least -- google is your friend. Both PVM and MPI are likely to be similar in ease of programming, hassle of setting up a parallel environment, and speed, and both of them should give you a very healthy fraction of wirespeed while shielding you from having to directly manipulate the network. Finally there are raw sockets (which it sounds like you are inclined towards). Now, I have nothing against raw socket programming (he says having spent the day on xmlsysd/wulflogger/libwulf, a raw socket-based monitoring program:-). However, it is NOT trivial -- you have to invent all sorts of wheels that are already invented for you and wrapped up in simple library calls with PVM or MPI. Its advantages are maximal speed -- you can't get faster than a point to point network connection -- the ability to thread the connection/I/O component and MAYBE take advantage of letting the NIC do some of the work via DMA while the CPU is doing other work, and complete control. The disadvantages are that you'll be responsible for determining e.g. message length, dealing with a dropped connection without crashing everything, debugging your server daemon and worker clients (or worker daemons and master client) in parallel when they are running on different machines, and so forth. I >>might<< be able to provide you with some applications that aren't exactly templates but that illustrate how to get started on this approach (and refer you to some key books) but if you really are a networking novice you'll need to want to do this as an excuse to stop being a novice by writing your own application or it isn't worth it. You'll need to be a much better and more skilled programmer altogether in order to debug everything and check for the myriad of error conditions that can occur and deal with them robustly. There are really a few other approaches -- perl now supports threads so you CAN use a perl script and ssh as a master/work distribution system -- but raw sockets aren't much easier to manage in perl than they are in C and using ssh as a transport layer adds overhead at least equal to or in excess to NFS, so you'd probably want to use NFS as transport and the perl script to just manage task distribution (for which it is ideally suited in this simple a context). I have a nice example threaded perl task distribution script (which I wrote for MY Cluster Magazine column some months ago:-) which I can put somewhere if this interests you. HTH, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dcs at et.byu.edu Thu Feb 19 15:19:56 2004 From: dcs at et.byu.edu (Dave Stirling) Date: Thu, 19 Feb 2004 13:19:56 -0700 (MST) Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Message-ID: Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 19 17:22:38 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 20 Feb 2004 09:22:38 +1100 Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <20040219154126.56423.qmail@web11411.mail.yahoo.com> References: <20040219154126.56423.qmail@web11411.mail.yahoo.com> Message-ID: <200402200922.39632.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 20 Feb 2004 02:41 am, Rayson Ho wrote: [No failover support in the pbs_server] > I think it is one of the biggest problems with *PBS, especially in the > compute farm environment. Torque (formerly SPBS) is very stable, especially since we helped the SuperCluster folks clobber the various memory leaks in the server. Our pbs_server has been running for almost a month now since I last restarted it (because I was doing a bit of system maintenance, not because of PBS problems, I think it'd been running for about 2 months before that) and it's only VSZ 3148 and RSS 2136. :-) NB: I'm still running an SPBS release from early November as that's when we fixed the last memory leak and it's worked like a dream since then. > The more advanced batch systems (SGE and LSF) have this feature for > years, not sure why *PBS still don't have it. I believe it's on the SuperCluster folks list of things to do, but they've been busy working on the stability front (as well as MAUI and Silver). CC'd to the SuperCluster folks so they can respond. > (AFAIK, PBSPro 5.4 will include it, but isn't it late??) No idea, don't use it. 
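For anyone who wants to track the pbs_server VSZ/RSS figures mentioned above without checking by hand, a few lines from cron are enough. pidof and "ps -o vsz=,rss=" are standard procps; the log path is just an assumption.

#!/usr/bin/perl -w
# Tiny cron helper: append a timestamped VSZ/RSS line for pbs_server to a
# log so slow leaks show up over weeks.  The log path is an assumption;
# pidof and "ps -o vsz=,rss=" are standard procps invocations.
use strict;

chomp(my $pids = `pidof pbs_server`);
my ($pid) = split ' ', $pids;
exit 0 unless defined $pid && $pid =~ /^\d+$/;   # server not running here
chomp(my $mem = `ps -o vsz=,rss= -p $pid`);
open my $log, '>>', '/var/log/pbs_server-mem.log' or die "open log: $!";
printf $log "%s pid=%s vsz/rss(kB)=%s\n", scalar(localtime), $pid, $mem;
close $log;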
- -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFANTcuO2KABBYQAh8RAk8AAJ0ZGx3+qLPHWMjFkG7PGD8pPzwBWwCeKnUQ u1aXnixvHrknKTqtNVDRVhM= =28y0 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Thu Feb 19 18:13:20 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Thu, 19 Feb 2004 23:13:20 +0000 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: References: Message-ID: <200402192313.20932.daniel.kidger@quadrics.com> Dave, > While performance (latency, bandwidth) usually comes to the fore in > discussions about high performance interconnects for MPI clusters, I'm > curious as to what your experiences are from the standpoint of > manageability -- NIC's and spines and switches all fail at one time or > another, but I'd like input as to how individual products (Myrinet, > Quadrics, Infiniband, etc) handle this. In your clusters does the > hardware replacement involve simple steps (swap out the NIC, rerun some > config utilities) or something more complex (such as bringing down the > entire high speed network to reconfigure it so all the nodes can talk to > the new hardware); i.e., How painful is it to replace a single failed NIC? > > I'd imagine that most cluster admins are reluctant to interrupt running > jobs in order to re-initialize the equipment after hardware replacement. > Any information about how your clusters running high-speed interconnects > handle interconnect hardware failure/replacement would be very helpful. AFAIK all interconnects would allow the swap of a NIC without bringing down the whole network - but in all cases any parallel job running on that node would need to be aborted since in general high-speed interconect PCI cards are not hot-swappable - that node woudl need to be power-cycled. As for the cables and switches, I can't speak for other vendors - but for example a line card in a Quadrics Switch can be hot-swapped even while there are running MPI jobs that are sending data through that line card at the time - the jobs simply pause until the cables are reconnected. I would expect that other interconnects are the same in this respect? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Thu Feb 19 18:07:43 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 00:07:43 +0100 (CET) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: On Thu, 19 Feb 2004, Ryan Adams wrote: > Please forgive the length of this email, as I'm going to try to be > comprehensive: > There was a discussion on the Gridengine user list recently, regarding submitting lots and lots of short jobs in a bank in London. It developed into quite an interesting discussion, and I learned lots. Sorry - I tried to find the thread, but can't quite get the correct keywords. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:04:32 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:04:32 +0800 (CST) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> Message-ID: <20040220010432.88699.qmail@web16802.mail.tpe.yahoo.com> --- Ryan Adams ???? > My question is basically this: is 2-5 seconds too > small of a job to > justify a batching system like *PBS or Gridengine? Yes, 10 minutes or greater sound more reasonable. May be you can chunk 100 or more of those tasks into a job and submit it into a batch system. Also, from the "Tuning guide" HOWTO on the GridEngine website, SGE has a feature called "scheduling-on-demand" -- seems like it will help a lot since the scheduler is activated whenever a job arrives or a machine becomes available. Andrew. > Certainly, one option > would be to bundle sets of these chunks together for > a larger effective > job. Am I wasting my time thinking about this? > > I've been considering rolling my own scheduling > system using some kind > of RPC, but I've been around software development > long enough to know > that it is better to use something off-the-shelf if > at all possible. > > Thanks in advance... > > Ryan > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Feb 19 20:13:18 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 20 Feb 2004 09:13:18 +0800 (CST) Subject: [Beowulf] Best Setup for Batch Systems In-Reply-To: <200402200922.39632.csamuel@vpac.org> Message-ID: <20040220011318.63921.qmail@web16804.mail.tpe.yahoo.com> --- Chris Samuel ????> ----- > Torque (formerly SPBS) is very stable, especially > since we helped the > SuperCluster folks clobber the various memory leaks > in the server. It's not whether PBS itself is stable or not. There are human errors, machine problems, network problems, etc... And besides, the master machine also needed to be taken offline for OS upgrade. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tobeveryhonest at hotmail.com Fri Feb 20 03:55:22 2004 From: tobeveryhonest at hotmail.com (Salman Guy) Date: Fri, 20 Feb 2004 08:55:22 +0000 Subject: [Beowulf] want to implement a Beowulf cluster Message-ID: hi all, I want to learn Beowulf cluster implementation practically and for this purpose i need some help from u ppl.....I need reading material and ebooks so if anyone of u has done some practical work on Beowulf clusters then plz guide me or send me information regarding this, help will be appreciated ...thanx in advance _________________________________________________________________ MSN 8 helps eliminate e-mail viruses. Get 2 months FREE*. http://join.msn.com/?page=features/virus&pgmarket=en-ca&RU=http%3a%2f%2fjoin.msn.com%2f%3fpage%3dmisc%2fspecialoffers%26pgmarket%3den-ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 20 06:13:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 20 Feb 2004 12:13:02 +0100 (CET) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, Salman Guy wrote: > hi all, > I want to learn Beowulf cluster implementation practically and for this > purpose i need some help from u ppl.....I need reading material and ebooks > so if anyone of u has done some practical work on Beowulf clusters then plz > guide me or send me information regarding this, > I think we need a FAQ here :-) Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. SO I always say: Look at Robert Browns webpages at Duke The books 'Linux Clustering' by Charles Bookman and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Feb 20 05:10:38 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 20 Feb 2004 11:10:38 +0100 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: <200402192313.20932.daniel.kidger@quadrics.com> References: <200402192313.20932.daniel.kidger@quadrics.com> Message-ID: <200402201110.38850.joachim@ccrl-nece.de> Dan Kidger: > AFAIK all interconnects would allow the swap of a NIC without bringing down > the whole network - but in all cases any parallel job running on that node > would need to be aborted since in general high-speed interconect PCI cards > are not hot-swappable - that node woudl need to be power-cycled. AFAIK, this is the same for SCI, but I would need to check this to be sure. Anyway, the application using the adapter to be swapped would have to be restarted anyway as its resources are gone. Avoiding this would be very hard, if at all possible. > As for the cables and switches, I can't speak for other vendors - but for > example a line card in a Quadrics Switch can be hot-swapped even while > there are running MPI jobs that are sending data through that line card at > the time - the jobs simply pause until the cables are reconnected. I would > expect that other interconnects are the same in this respect? SCI typically uses no external switches, and concerning the exchange of adapters or cables, there are two strategies: the application(s) has/have to wait until transfers are again successful, or the driver recognizes the problem and changes the routing. Of course, this can be combined into a two-phase strategy. I guess this is the way Scali is doing it. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Fri Feb 20 08:41:11 2004 From: lathama at yahoo.com (Andrew Latham) Date: Fri, 20 Feb 2004 05:41:11 -0800 (PST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: <20040220134111.27571.qmail@web60305.mail.yahoo.com> or download the mailing list archive for the last year! thats an ebook all to its self --- John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. > > SO I always say: > Look at Robert Browns webpages at Duke > > The books 'Linux Clustering' by Charles Bookman > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== *----------------------------------------------------------* Andrew Latham AKA: LATHAMA (lay-th-ham-eh) - LATHAMA.COM LATHAMA at LATHAMA.COM - LATHAMA at YAHOO.COM If yahoo.com is down we have bigger problems than my email! *----------------------------------------------------------* _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Fri Feb 20 13:39:49 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Fri, 20 Feb 2004 18:39:49 +0000 (GMT) Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <200402201109.i1KB99h12383@NewBlue.scyld.com> References: <200402201109.i1KB99h12383@NewBlue.scyld.com> Message-ID: > > My question is basically this: is 2-5 seconds too small of a job to > justify a batching system like *PBS or Gridengine? That workload is do-able with the right queuing system. LSF (don't know about gridengine off hand) has a concept of "job chunking" for dealing with short running jobs. The queuing system batches up a number of jobs (eg 10 or 20) and then submits them all on one go to the work host where they run sequentially. This cuts down on the scheduling overhead. We've just had a user push 250,000 short running jobs though our cluster this-afternoon using this approach. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Feb 20 15:12:09 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 20 Feb 2004 15:12:09 -0500 (EST) Subject: [Beowulf] want to implement a Beowulf cluster In-Reply-To: Message-ID: On Fri, 20 Feb 2004, John Hearns wrote: > On Fri, 20 Feb 2004, Salman Guy wrote: > > > hi all, > > I want to learn Beowulf cluster implementation practically and for this > > purpose i need some help from u ppl.....I need reading material and ebooks > > so if anyone of u has done some practical work on Beowulf clusters then plz > > guide me or send me information regarding this, > > > I think we need a FAQ here :-) There are the old FAQ and HOWTO's (still some relevant background information): http://www.canonical.org/~kragen/beowulf-faq.txt http://yara.ecn.purdue.edu/~pplinux/PPHOWTO/pphowto.html#toc1 http://www.tldp.org/HOWTO/Beowulf-HOWTO.html There are other links at ClusterWorld.com (on the right side, scroll down) that may be useful. Now is a good time to announce my effort to update the FAQ (and possibly the HOWTO). Starting next week, I plan on updating the FAQ by using the ClusterWorld.com site as a place to collect questions and answers. Stay tuned. Of course ClusterWorld magazine is designed to provide this type of information as well. > Sorry I'm in a rush to go off on the train to FOSDEM in Brussels. 
> > SO I always say: > Look at Robert Browns webpages at Duke > and book: http://www.phy.duke.edu/brahma/Resources/beowulf_book.php > The books 'Linux Clustering' by Charles Bookman IMO, this is not a good book for HPC clusters. > and 'Beowulf Clustering with Linux' by Thomas Sterling are excellent. New edition: http://www.amazon.com/exec/obidos/tg/detail/-/0262692929/102-0957058-4520116?v=glance Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Sat Feb 21 19:18:55 2004 From: mg_india at sancharnet.in (Sawan Gupta) Date: Sun, 22 Feb 2004 05:48:55 +0530 Subject: [Beowulf] Movie Editing Requirements Message-ID: <000001c3f8d9$79436f00$8bd2003d@myserver> Hello, My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT system with 512 DDRAM and a 128 MB Graphic Card. But when he perform some rendering operations, it takes nearly 10-15 minutes to complete. He wishes to upgrade his system to dual XEON with more RAM to minimize this time delay. I want to know whether this will suit his requirments or a cluster is just what he needs. Please tell me which cluster can suit his requirements i.e. Windows/Linux. I mean which cluster can best suit these requirements. Also are the softwares used by him also available for Linux or not. (If the solution suggested is in Linux) Regards, Sawan Gupta || Mg_India at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Feb 21 20:50:54 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 21 Feb 2004 20:50:54 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. I would guess that none of this work is done by the graphics card, so that his performance is strictly dependent on the P4 and the fairly modest amount of ram he has. I would guess that most of these applications are fairly memory-intensive, and not particularly cache-friendly. I doubt HT would matter in this case, except that IIRC all HT CPUs are the 'c' model, and thus run with 6.4 GB/s of dram bandwidth. I'm sure you already know that 512M probably too low. > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. if this was linux, I'd advise you to use tools like oprofile, vmstat, etc to find out where it's spending its time. since it's only windows, you'll probably have to resort to watching the disk light, and running that nasty little windows accessory that tells you about cpu/memory usage. > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. sure. though he'd almost certainly run faster with a dual-opteron, since such systems deliver noticably more memory bandwidth and lower latency. a dual-xeon can actually be slower than a uni P4c system. it would probably make sense to talk to him about how his machine and apps are configured first. 
for instance, is he actually using HT, and does he notice any performance difference if he turns it off? is his ram dual-bank-PC3200? any sense of how much time is spent on disk IO? > I want to know whether this will suit his requirments or a cluster is > just what he needs. clusters are clearly more scalable, and are widely used in the render/effects industry. comparing a pair of P4c's to a single dual-opteron, though, I have no idea. I think it would depend on his applications, mainly. there's no clear answer to price/performance when it comes to clusters of duals vs unis. unis tend to be too large, and in most cases wind up replicating too many components, especially moving parts, to compete. I believe most clusters, in any industry, are not unis. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. windows is the right choice in exactly one situation: when the exact configuration you need is available off-the-shelf, and you already know how to use it. linux (unix in general) is far more robust, easy-to-manage, flexible, scalable, cheap, etc. all those TCO studies sponsored by msft consist of the following astonishing conclusion: if you have windows-only users and a supply of cheap msce's and are comfortable with the crappy level of support that the ms world provides, then indeed windows is cheaper. > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) only he can decide that. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sat Feb 21 23:30:13 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sat, 21 Feb 2004 22:30:13 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Actually a beowulf cluster can also run windows. There is a port of maya to clusters...There are also many other movie editing software distributions that work very well on clusters..It also doesn't matter what os a beowulf cluster runs. -Brent Brent Clements Linux Technology Specialist Information Technology Rice University On Sat, 21 Feb 2004, Joel Jaeggli wrote: > Given that it sounds like you're on windows, a beowulf cluster is not > appropriate from your application... > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > Hello, > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > But when he perform some rendering operations, it takes nearly 10-15 > > minutes to complete. > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > this time delay. > > > > I want to know whether this will suit his requirments or a cluster is > > just what he needs. > > Please tell me which cluster can suit his requirements i.e. > > Windows/Linux. > > I mean which cluster can best suit these requirements. > > > > Also are the softwares used by him also available for Linux or not. 
(If > > the solution suggested is in Linux) > > > > > > Regards, > > > > Sawan Gupta > > || Mg_India at sancharnet.in || > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Sat Feb 21 23:18:09 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Sat, 21 Feb 2004 20:18:09 -0800 (PST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <000001c3f8d9$79436f00$8bd2003d@myserver> Message-ID: Given that it sounds like you're on windows, a beowulf cluster is not appropriate from your application... On Sun, 22 Feb 2004, Sawan Gupta wrote: > > Hello, > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > system with 512 DDRAM and a 128 MB Graphic Card. > > But when he perform some rendering operations, it takes nearly 10-15 > minutes to complete. > > He wishes to upgrade his system to dual XEON with more RAM to minimize > this time delay. > > I want to know whether this will suit his requirments or a cluster is > just what he needs. > Please tell me which cluster can suit his requirements i.e. > Windows/Linux. > I mean which cluster can best suit these requirements. > > Also are the softwares used by him also available for Linux or not. (If > the solution suggested is in Linux) > > > Regards, > > Sawan Gupta > || Mg_India at sancharnet.in || > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From c00jsh00 at nchc.org.tw Sun Feb 22 04:32:41 2004 From: c00jsh00 at nchc.org.tw (Jyh-Shyong Ho) Date: Sun, 22 Feb 2004 17:32:41 +0800 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <40387739.3891D96C@nchc.org.tw> Hi, We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI Workstation 5.1.3 compiler and 64-bit GOTO library. We ran all the test cases included in Gaussian 03 source code and compared the results against the reference results ran on SGI. All tests cases are successfully completed except test602 and test605 with error at the last stage when l9999 tries to close files. 
There are several files in directory bsd need some modification: machine.c (add one section to return "x86_64" as machine identification) mdutil.c (add one section for x86_64) mdutil.f (add one section for x86_64) bldg03 (modify the file so it can pick up x86_64.make as g03.make) and create a make file x86_64.make (use i386.make as a template) The compiler used is pgf90, but l906 and l609 has to be compiled with pgf77, in order to pass all the test cases. We are running more tests and comparing the performance of 64-bit version abd 32-bit version. Regards Jyh-Shyong Ho, Ph.D. Research Scientist National Center for High-Performance Computing Hsinchu, Taiwan, ROC _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mirk at vsnl.com Sat Feb 21 10:31:35 2004 From: mirk at vsnl.com (Mohd Irfan R Khan) Date: Sat, 21 Feb 2004 21:01:35 +0530 Subject: [Beowulf] comparing MPI HPC interconnects: manageability? In-Reply-To: Message-ID: hi I am one using SCI (Dolphin) cards and I think in dolphin u don't have to stop the whole cluster in case of failure. In this there is a matrix where it always has redundancy if one machine fails and the software provided by it (SCALI) will route the data to other machine and will reroute it back once it finds the line working properly. Regards. -----Original Message-----. From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Dave Stirling Sent: Friday, February 20, 2004 1:50 AM To: beowulf at beowulf.org Subject: [Beowulf] comparing MPI HPC interconnects: manageability? Hi all, While performance (latency, bandwidth) usually comes to the fore in discussions about high performance interconnects for MPI clusters, I'm curious as to what your experiences are from the standpoint of manageability -- NIC's and spines and switches all fail at one time or another, but I'd like input as to how individual products (Myrinet, Quadrics, Infiniband, etc) handle this. In your clusters does the hardware replacement involve simple steps (swap out the NIC, rerun some config utilities) or something more complex (such as bringing down the entire high speed network to reconfigure it so all the nodes can talk to the new hardware); i.e., How painful is it to replace a single failed NIC? I'd imagine that most cluster admins are reluctant to interrupt running jobs in order to re-initialize the equipment after hardware replacement. Any information about how your clusters running high-speed interconnects handle interconnect hardware failure/replacement would be very helpful. Thanks, Dave Stirling Brigham Young University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Sun Feb 22 09:57:05 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Sun, 22 Feb 2004 14:57:05 +0000 (UTC) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > It also doesn't matter what > os a beowulf cluster runs. ..as long as that OS conforms to the definition of free software, that is.. 
Or am I just an old fuddy-duddy, with out-of-date concepts? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Sun Feb 22 11:04:17 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Sun, 22 Feb 2004 17:04:17 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <403358EF.7F0BDE75@epa.gov> Message-ID: On Wed, 18 Feb 2004, Joseph Mack wrote: > How do you get your information out of smartd? > > I've found output in syslog - presumably I can grep for this. I've done this for a while to get temperature information from a server in our small group server room (together with MRTG we have a nice history of temperature to show to the facilities people when the temperature was too high again...). The problem with greping for smartd information in the syslog file is that there is no current information after a log rotation. That's why I changed our cron jobs. Now I use a small setuid-root program which starts "smartctl -a /dev/sdX" and then greps for the temperature. - Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at dragaera.net Sun Feb 22 10:20:20 2004 From: agrajag at dragaera.net (Jag) Date: 22 Feb 2004 10:20:20 -0500 Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: <1077463220.2561.4.camel@loiosh> On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. By definition, a beowulf cluster uses a free/open OS. So, a beowulf cluster can't run windows. However, an HPC (High Performance Computing) cluster doesn't have that requirement. I know its kinda nitpicking to try to distinguish between Beowulf cluster and HPC cluster, but in some ways it is important. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Sun Feb 22 11:30:03 2004 From: bclem at rice.edu (Brent M. Clements) Date: Sun, 22 Feb 2004 10:30:03 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: <1077463220.2561.4.camel@loiosh> References: <1077463220.2561.4.camel@loiosh> Message-ID: Please don't start a flame war guys, I just had my terms mixed up...it was 1 am in the morning when I replied. -Brent On Sun, 22 Feb 2004, Jag wrote: > On Sat, 2004-02-21 at 23:30, Brent M. Clements wrote: > > Actually a beowulf cluster can also run windows. 
There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > By definition, a beowulf cluster uses a free/open OS. So, a beowulf > cluster can't run windows. However, an HPC (High Performance Computing) > cluster doesn't have that requirement. > > I know its kinda nitpicking to try to distinguish between Beowulf > cluster and HPC cluster, but in some ways it is important. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Mon Feb 23 06:37:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 23 Feb 2004 12:37:25 +0100 (CET) Subject: [Beowulf] Flashmobcomputing Message-ID: I hesitate a bit to send things seen on Slashdot to the list, but this is probably relevant: http://www.flashmobcomputing.org/ It might be worth a bit of a debate though. Given that this cluster will be composed of differing CPUs, and connected together by 100Mbps links, will it really have a chance of getting into the Top 500? The bootable CD they are using is a Knoppix variant. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From deadline at linux-mag.com Mon Feb 23 07:27:20 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Mon, 23 Feb 2004 07:27:20 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sat, 21 Feb 2004, Brent M. Clements wrote: > Actually a beowulf cluster can also run windows. There is a port of maya > to clusters...There are also many other movie editing software > distributions that work very well on clusters..It also doesn't matter what > os a beowulf cluster runs. From time to time, I think it is an important to recall the original definition of Beowulf. In the book "How to Build a Beowulf", Sterling, Salmon, Becker, and Savarese define Beowulf as: "A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology running one of several open-source Unix-like operating systems." There is often confusion as to "what is a Beowulf?" because the definition is more of a framework for building clusters and less of a recipe. I suppose one could come up with a definition of an HPC cluster which would read something like: "An HPC cluster is a collection of commodity processors interconnected by widely available networking technology running a widely available OS." Rather broad. I think the keyword in all this is "commodity", which to me means choice and implies low cost. Doug > > -Brent > > Brent Clements > Linux Technology Specialist > Information Technology > Rice University > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > Given that it sounds like you're on windows, a beowulf cluster is not > > appropriate from your application...
> > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > Hello, > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > minutes to complete. > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > this time delay. > > > > > > I want to know whether this will suit his requirments or a cluster is > > > just what he needs. > > > Please tell me which cluster can suit his requirements i.e. > > > Windows/Linux. > > > I mean which cluster can best suit these requirements. > > > > > > Also are the softwares used by him also available for Linux or not. (If > > > the solution suggested is in Linux) > > > > > > > > > Regards, > > > > > > Sawan Gupta > > > || Mg_India at sancharnet.in || > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > > -------------------------------------------------------------------------- > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Feb 23 09:11:00 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 23 Feb 2004 09:11:00 -0500 (EST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: Message-ID: On Sun, 22 Feb 2004, Martin WHEELER wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > It also doesn't matter what > > os a beowulf cluster runs. > > ..as long as that OS conforms to the definition of free software, that > is.. > > Or am I just an old fuddy-duddy, with out-of-date concepts? No, you're absolutely right. It's right in there in the original beowulf documents and description, IIRC. There are some excellent reasons for this, BTW, as you'll discover the first time something doesn't just work for you "out of the box". rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bclem at rice.edu Mon Feb 23 08:36:05 2004 From: bclem at rice.edu (Brent M. 
Clements) Date: Mon, 23 Feb 2004 07:36:05 -0600 (CST) Subject: [Beowulf] Movie Editing Requirements In-Reply-To: References: Message-ID: Again, I go back to my last email concerning this..I didn't want to start people flaming me(which has now happened), I wrote my original response at 1am in the morning and was sloppy with my terms. For that I apologize. This tangent of explanations from now over 50 people can be gotten off of and people can go about there business..Nothing to see here, move along. -Brent On Mon, 23 Feb 2004, Douglas Eadline, Cluster World Magazine wrote: > On Sat, 21 Feb 2004, Brent M. Clements wrote: > > > Actually a beowulf cluster can also run windows. There is a port of maya > > to clusters...There are also many other movie editing software > > distributions that work very well on clusters..It also doesn't matter what > > os a beowulf cluster runs. > > >From time to time, I think it is a important to recall the original > definition of Beowulf. In the book "How to Build Beowulf", Sterling, > Salmon, Becker, Savarese define Beowulf as: > > "A Beowulf is a collection of personal computers (PCs) interconnected by > widely available networking technology running one of several open-source > Unix like operating systems." > > There is often confusion as to "what is a Beowulf?" because the definition > is more of a framework for building clusters and less of a recipe. > > I suppose, one could come up with definition of an HPC cluster which > would read something like" > > "An HPC cluster is collection of commodity processors interconnected by > widely available networking technology running a widely available OS." > > Rather broad. I think the keyword in all this is "commodity", which to me > means choice and implies low cost. > > Doug > > > > > -Brent > > > > Brent Clements > > Linux Technology Specialist > > Information Technology > > Rice University > > > > > > On Sat, 21 Feb 2004, Joel Jaeggli wrote: > > > > > Given that it sounds like you're on windows, a beowulf cluster is not > > > appropriate from your application... > > > > > > > > > On Sun, 22 Feb 2004, Sawan Gupta wrote: > > > > > > > > > > > Hello, > > > > > > > > My client uses Adobe After Effects, 3D Max, Maya etc on his PIV HT > > > > system with 512 DDRAM and a 128 MB Graphic Card. > > > > > > > > But when he perform some rendering operations, it takes nearly 10-15 > > > > minutes to complete. > > > > > > > > He wishes to upgrade his system to dual XEON with more RAM to minimize > > > > this time delay. > > > > > > > > I want to know whether this will suit his requirments or a cluster is > > > > just what he needs. > > > > Please tell me which cluster can suit his requirements i.e. > > > > Windows/Linux. > > > > I mean which cluster can best suit these requirements. > > > > > > > > Also are the softwares used by him also available for Linux or not. 
(If > > > > the solution suggested is in Linux) > > > > > > > > > > > > Regards, > > > > > > > > Sawan Gupta > > > > || Mg_India at sancharnet.in || > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf at beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > -- > > > -------------------------------------------------------------------------- > > > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Cell: 610.390.7765 Redefining High Performance Computing > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Mon Feb 23 10:18:34 2004 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Mon, 23 Feb 2004 16:18:34 +0100 Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: References: Message-ID: <1077549514.31096.0.camel@qeldroma.cttc.org> El dom, 22-02-2004 a las 17:04, Felix Rauch escribi?: > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). > > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > > On Wed, 18 Feb 2004, Joseph Mack wrote: > > How do you get your information out of smartd? > > > > I've found output in syslog - presumably I can grep for this. > > I've done this for a while to get temperature information from a > server in our small group server room (together with MRTG we have a > nice history of temperature to show to the facilities people when the > temperature was too high again...). 
> > The problem with greping for smartd information in the syslog file is > that there is no current information after a log rotation. That's why > I changed our cron jobs. Now I use a small setuid-root program which > starts "smartctl -a /dev/sdX" and then greps for the temperature. > > - Felix > > --- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > On the other hand, it is possible to divert the smartd log to a specific file and check it regularly when it's updated, by adding this parameter to smartd: -l facility Of course some modification of syslog.conf will be needed to instruct syslogd to log to a specific file for the "facility" specified: facility.* /var/log/smartd.log Also, the '-M' option coupled with the 'exec' directive should work; a script could be run to update some flags, for example: -M exec /usr/bin/smartd_alert.sh -- Daniel Fernandez Centre tecnològic de transferència de calor - CTTC www.cttc.upc.edu c/ Colom nº11 UPC Campus Industrials Terrassa, Edifici TR4 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From m-valerio at onu.edu Mon Feb 23 11:24:23 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Mon, 23 Feb 2004 11:24:23 -0500 Subject: [Beowulf] Anyone use MOSIX? Message-ID: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Has anyone on this list used MOSIX before? I'm particularly interested in how it compares to other clustering software such as PVM and MPI. Any information regarding what you're using MOSIX for, recommendations about setting it up, comparisons to other software, etc, would be welcomed. Thanks! _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From kpodesta at redbrick.dcu.ie Mon Feb 23 13:45:44 2004 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 23 Feb 2004 18:45:44 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040223184544.GB30983@carbon.redbrick.dcu.ie> On Mon, Feb 23, 2004 at 12:37:25PM +0100, John Hearns wrote: > http://www.flashmobcomputing.org/ > > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > > The bootable CF they are using is a Knoppix variant. It seems a bit loose or unfair to suggest a project like this 'registers' for the top500 list? It's a once-off, temporary system, dedicated (seemingly) to nothing but qualification to the list. They say in the FAQ that if the system proves itself, it could potentially be used for bigger problems, which is a noble idea - but they obviously don't read the beowulf list often ("it all depends on the application", etc.) :-) Additionally, a flashmob system would have a limited shelf-life, before the owners want to take their computers home. Distributed projects like SETI at home and Folding at home etc. have been running for years... I'm not familiar with the entry rules to the top500, but to be fair to existing, dedicated installations - they would have a certain 'reliability' in terms of their existence.
If you needed to perform a serious calculation to a scale of 36 TFLOPS etc., then you know that there is a system that can do it. They might want to be critical of how sustainable the result from the Flashmob is, if they wanted to 'call' on it's power at any particular time in the future. (whoops, it was raining today, that's 10 TFLOPS down the drain...). Pardon the pun. Kp -- Karl Podesta Dublin City University, Ireland _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Feb 23 14:11:48 2004 From: becker at scyld.com (Donald Becker) Date: Mon, 23 Feb 2004 14:11:48 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Mon, 23 Feb 2004, John Hearns wrote: > I hesitate a bit to send things seen on Slashdot to the list, > but this is probably relevant: > > http://www.flashmobcomputing.org/ >> A Flash Mob computer, unlike an ordinary cluster, is temporary and >> organized on-the-fly for the purpose of working on a single >> problem. Flash Mob I is the first of it's kind. A bit of hype here. Flash Mob is a fun demo, but not a new system architecture. All of the software is on a live CD, which Yggdrasil pioneered back in 1993, and it's far from being the first on-the-fly cluster. One of first public demo of Scyld Beowulf was temporarily converting the email-reading machines at the ALS conference into a cluster. We did that in a few minutes, taking only a few second beyond the amount of time it took to boot the machines from floppy. Today there is the opportunity to use PXE boot, which makes configuration even easier. A key was the innovative approach of making most of the systems specialized compute slaves, with only the environment needed to support the fully-cached running application. (Note that NFS root sounds like a likely alternative, but doesn't scale and has a run-time performance impact.) > It might be worth a bit of a debate though. > Given that this cluster will be composed of differing CPUs, > and conneced together by 100Mbps links will it really have chance > of getting into the Top 500? > The bootable CF they are using is a Knoppix variant. The differing CPUs and full workstation-oriented distribution will likely pose more a problem than the switched Fast Ethernet. Unless they make significant modifications, they will run into the scalability problem that every full-installation system encounters: at every timestep a few of the machines will be paging, running cron, or doing something else that slows the machine. That would be barely noticed in a workstation environment, but is a major problem with most cluster jobs. Still, it sounds like a fun, demystifying demo that introduces people to scalable computing. 
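A quick way to make that timestep-noise point concrete is to time a tight loop of identical work on every rank and look at the spread. The sketch below is a generic MPI-1 C illustration only (it is not from Scyld or from the Flash Mob project, and the step and work counts are arbitrary placeholders); ranks that get hit by paging, cron or other per-node activity show up as occasional slow steps.

----------------------------------------------------------------
/* jitter.c - expose per-node OS noise with identical work per rank. */
#include <mpi.h>
#include <stdio.h>

#define STEPS 1000
#define WORK  200000

int main(int argc, char **argv)
{
    int rank, i, j;
    volatile double x = 0.0;
    double t0, dt, best = 0.0, worst = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < STEPS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);          /* line all ranks up */
        t0 = MPI_Wtime();
        for (j = 0; j < WORK; j++)            /* identical dummy work everywhere */
            x += j * 0.5;
        dt = MPI_Wtime() - t0;                /* this rank's time for the step */

        /* gather the per-step spread across ranks */
        MPI_Reduce(&dt, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&dt, &best,  1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        if (rank == 0 && worst > 2.0 * best)  /* one noisy node stalls the step */
            printf("step %d: best %.6f s, worst %.6f s\n", i, best, worst);
    }
    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------

Compile with mpicc and run one rank per node; on a quiet, stripped-down node set the best and worst times stay close together, while a full workstation install tends to produce the occasional stretched step.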
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster systems Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bernd-schubert at web.de Mon Feb 23 17:05:36 2004 From: bernd-schubert at web.de (Bernd Schubert) Date: Mon, 23 Feb 2004 23:05:36 +0100 Subject: [Beowulf] 64-bit Gaussian 03 on Opteron/SuSE In-Reply-To: <40387739.3891D96C@nchc.org.tw> References: <20040218042037.97418.qmail@web16812.mail.tpe.yahoo.com> <40387739.3891D96C@nchc.org.tw> Message-ID: <200402232305.36711.bernd-schubert@web.de> On Sunday 22 February 2004 10:32, Jyh-Shyong Ho wrote: > Hi, > > We have managed to built a native 64-bit version of Gaussian 03 Rev.B05 > on a dual Opteron box running SLSE8 for AMD64 with 64-bit PGI > Workstation > 5.1.3 compiler and 64-bit GOTO library. > Hello, thanks for this great information! I've forwarded it to the CCL list, since I guess on this list many people are interested in this topic. Cheers, Bernd _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 17:47:27 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 17:47:27 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: > Still, it sounds like a fun, demystifying demo that introduces people to > scalable computing. demystification is always good. IMO, the best part of this is that it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. partly the reason is hetrogeneity and other "practical" downers. but mainly, a super-computer needs a super-network. of course, in the grid nirvana, all computers would have multiple ports of infiniband, and the word would be 5 us across ;) regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Feb 23 15:52:58 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 23 Feb 2004 15:52:58 -0500 (EST) Subject: [Beowulf] Anyone use MOSIX? In-Reply-To: <200402231626.i1NGQQBf052721@postoffice.onu.edu> Message-ID: > Has anyone on this list used MOSIX before? I expect many have given it a try. > I'm particularly interested in how it compares to other clustering software > such as PVM and MPI. apples and oranges, I believe. mosix more or less tries to virtualize a cluster by making multiple machines share things like a single pid space, with forwarding of signals, etc. the idea is that the OS takes care of migrating jobs across nodes, including using proxies for resources that can't be directly moved (pages can, for instance). from the PVM/MPI perspective, the most important resource would be sockets. as far as I know, MPI-on-Mosix would use proxied sockets, and would therefore have performance problems for anything closely-coupled or high-bandwidth. in principle, Mosix could provide some sort of clusterized group-comm mechanism that wouldn't require proxies, but that would be a large effort. 
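A rough feel for what proxied sockets would cost can be had from a small round-trip test. The sketch below is plain POSIX C and deliberately generic (nothing in it is MOSIX-specific; the message size and round count are arbitrary): it forks a child that echoes small messages back over a local stream socket pair and reports the average round-trip time. Run as-is it gives a same-node baseline; if one end were migrated under MOSIX, the extra round-trip time would be the proxying overhead in question.

----------------------------------------------------------------
/* pingpong.c - average round-trip time for small messages over a
 * local stream socket pair (parent sends, forked child echoes). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>

#define ROUNDS 10000
#define MSGLEN 64

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    int sv[2], i;
    char buf[MSGLEN];
    double t0;

    memset(buf, 'x', sizeof(buf));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }
    if (fork() == 0) {                        /* child: echo every message back */
        for (i = 0; i < ROUNDS; i++) {
            read(sv[1], buf, MSGLEN);
            write(sv[1], buf, MSGLEN);
        }
        _exit(0);
    }
    t0 = now();                               /* parent: send and wait for echo */
    for (i = 0; i < ROUNDS; i++) {
        write(sv[0], buf, MSGLEN);
        read(sv[0], buf, MSGLEN);
    }
    printf("average round trip: %.1f us\n", (now() - t0) / ROUNDS * 1e6);
    return 0;
}
----------------------------------------------------------------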
in a way, it's a shame that MPI is such a fat interface, since there's a lot of really good work that could be done in this direction, but is simply too large for a typical thesis project :( regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Mon Feb 23 20:53:18 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 23 Feb 2004 17:53:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > of course, in the grid nirvana, > all computers would have multiple ports of infiniband, > and the word would be 5 us across ;) In grid nirvana, the speed of light would rise with Moore's Law. 5 usec is a long time now, and much longer a year from now. That's not nirvana. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Tue Feb 24 03:22:16 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 24 Feb 2004 09:22:16 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. > An odd fact I always remember is that light travels at a foot per nanosecond. (Useful to know if you are plugging coax delay lines into trigger circuits) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 04:35:48 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 10:35:48 +0100 Subject: [Beowulf] C vs C++ challenge In-Reply-To: References: <1075512676.4915.207.camel@protein.scalableinformatics.com> Message-ID: <20040224093548.GA29776@unthought.net> On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. 
It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Tue Feb 24 04:35:54 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Tue, 24 Feb 2004 09:35:54 +0000 Subject: [Beowulf] Math Coprocessor In-Reply-To: References: Message-ID: <200402240935.54623.daniel.kidger@quadrics.com> > On Fri, 13 Feb 2004, John Hearns wrote: > > But then again I may be the only person to own "Fortran 77: > > A Structured Approach". I don't have that but I do have on my bookshelf "A Fortran Primer" by Elliot Organick, Addison-Wiley (1963) - so go on: does anyone own any even older Fortran texts ? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Tue Feb 24 05:56:45 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Tue, 24 Feb 2004 11:56:45 +0100 Subject: [Beowulf] Newbie Question: Batching Versus Custom RPC In-Reply-To: <1077217418.4982.35.camel@localhost> References: <1077217418.4982.35.camel@localhost> Message-ID: <20040224105645.GC29776@unthought.net> On Thu, Feb 19, 2004 at 02:03:38PM -0500, Ryan Adams wrote: ... > I have a problem that divides nicely (embarrassingly?) into > parallelizable chunks. Each chunk takes approximately 2 to 5 seconds to > complete and requires no communication during that time. Essentially > there is a piece of data, around 500KB that must be processed and a > result returned. I'd like to process as many of these pieces of data as > possible. I am considering building a small heterogeneous cluster to do > this (at home, basically), and am trying to decide exactly how to > architect the task distribution. I had the following problem; lots and lots of compile jobs. They take from a few seconds to a few minutes each. No batch scheduling system that I tried, was up to the task (simply waaay too long latency in the scheduling). ... > I've been considering rolling my own scheduling system using some kind > of RPC, but I've been around software development long enough to know > that it is better to use something off-the-shelf if at all possible. Maybe you would want to take a quick look at ANTS http://unthought.net/antsd/ ANTS was the solution I developed for the problem I had, and from the sound of it, I think your problem may be a good fit for ANTS as well. I've been updating it as of lately, but haven't put new releases on the web site. If you're interested, I can provide you with the new releases (featuring krellm2 applet! ;) - but the basic functionality is unchanged from the old release on the web site. ANTS specifically schedules jobs very quickly - but it lacks the advanced features of "real" batch systems (like accounting, gang scheduling, job restart, etc. etc.). / jakob _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 08:34:16 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 24 Feb 2004 08:34:16 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: On Mon, 23 Feb 2004, Greg Lindahl wrote: > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > of course, in the grid nirvana, > > all computers would have multiple ports of infiniband, > > and the word would be 5 us across ;) > > In grid nirvana, the speed of light would rise with Moore's Law. I'll have to think about that one. Exponential growth of the speed of light. Hmmm. Some sort of inflationary model? Space flattening towards non-relativistic classical? The physics of Nirvana would be veeeery interesting... :-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ajt at rri.sari.ac.uk Tue Feb 24 08:56:31 2004 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Tue, 24 Feb 2004 13:56:31 +0000 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B580F.1020009@rri.sari.ac.uk> Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Dennis Ritchie created 'C', not Bjarne Stroustrup: he created C++. The IEEE 'interview' is a hoax, of course! but Bjarne Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ctibirna at giref.ulaval.ca Tue Feb 24 09:15:20 2004 From: ctibirna at giref.ulaval.ca (Cristian Tibirna) Date: Tue, 24 Feb 2004 09:15:20 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <200402240915.20284.ctibirna@giref.ulaval.ca> On Tuesday, 24 February 2004 08:13, Matt Valerio wrote: > > Like anything on the internet, it should be taken with a grain of salt. > Can anyone vouch for its validity, or is it a hoax to get us to all hate > C++ and stick with C? Of course it's a hoax ;o) http://www.research.att.com/~bs/bs_faq.html#IEEE And in fact the whole FAQ deserves a reading, no matter which language one preaches as being the Holy Grail. -- Cristian Tibirna (418) 656-2131 / 4340 Laval University - Québec, CAN ... http://www.giref.ulaval.ca/~ctibirna Research professional - GIREF ... ctibirna at giref.ulaval.ca Chemical Engineering PhD Student ... tibirna at gch.ulaval.ca _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From m-valerio at onu.edu Tue Feb 24 08:13:12 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 08:13:12 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <20040224093548.GA29776@unthought.net> Message-ID: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Hello, I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 languages.
That being said, I think it would be interesting to see what the creator of both C and C++ has said about the two. I ran across this interview with Bjorn Stroustrup at http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. Like anything on the internet, it should be taken with a grain of salt. Can anyone vouch for its validity, or is it a hoax to get us to all hate C++ and stick with C? -Matt -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of Jakob Oestergaard Sent: Tuesday, February 24, 2004 4:36 AM To: Beowulf Subject: Re: [Beowulf] C vs C++ challenge On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > > I could easily optimize it more (do the work on a larger buffer at a > > once), but I think enough waste heat has been created here. This is a > > simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. > > Enough time wasted on finding different solutions to a simple problem? Surely > not. Let me toss my hat into the ring: ... Hi guys! Guess who's back - yes, it's your friendly neighborhood language evangelist :) I said I'd be gone one week - well, I put instant coffe in the microwave, and *wooosh* went three weeks ahead in time. What a fantastic thread this turned into - awk, perl, more C, java and God knows what. I'm almost surprised I didn't see a Fortran implementation. See, I was trying to follow up on the challenge, then things got complicated (mainly by me not being able to get the performance I wanted out of my code) - so instead of flooding your inboxes, I wrote a little "article" on my findings. It's up at: http://unthought.net/c++/c_vs_c++.html Highlights: *) Benchmarks - real numbers. *) A C++'ification of the fast C implementation (that turns out to be negligibly faster than the C implementation although the same algorithm and the same system calls are used), which is generalized and generally made usable as a template library routine (for convenient re-use in other projects - yes, this requires all that boring non-sexy stuff like freeing up memory etc.) *) Two new C++ implementations - another 15 liner that's "only" twice as slow as the C code, and another longer different-algorithm C++ implementation that is significantly faster than the fastest C implementation (so far). Now, I did not include all the extra implementations presented here. I would like to update the document with those, but I will need a little feedback from various people. First; how do I compile the java implementation? GCC-3.3.2 gives me ---------------------------------------------------------------- [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java wordcount.java: In class `wordcount': wordcount.java: In method `wordcount.main(java.lang.String[])': wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' in type `java.util.regex.Pattern'. words = p.split(s); ^ 1 error ---------------------------------------------------------------- Second; another much faster C implementation was posted - I'd like to test against that one as well. I'm curious as to how it was done, and I'd like to use it as an example in the document if it turns out that it makes sense to write a generic C++ implementation of whatever algorithm is used there. 
Well, if the code is not a government secret ;) So, well, clearly my document isn't completely updated with all the great things from this thread - but at least I think it is a decent reply to the mail where the 'programming pearl' C implementation was presented. I guess this could turn into a nice little reference/FAQ/fact type of document - the oppinions stated there are biased of course, but not completely unreasonable in my own (biased) oppinion - besides, there's real-world numbers for solving a real-world problem, that's a pretty good start I would say :) I'd love to hear what people think - if you have the time to give it a look. Let me know, flame away, give me Fortran code that is faster than my 'ego-booster' implementation at the bottom of the document! ;) Cheers all :) / jakob BTW: Yes, I had a great vacation; http://unthought.net/avoriaz/p1010050.jpg ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m-valerio at onu.edu Tue Feb 24 09:12:49 2004 From: m-valerio at onu.edu (Matt Valerio) Date: Tue, 24 Feb 2004 09:12:49 -0500 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <403B580F.1020009@rri.sari.ac.uk> Message-ID: <200402241414.i1OEErBf096719@postoffice.onu.edu> Wow, I guess I didn't do my homework! Apologizes to everyone for the misinformation! As Tony pointed out, the real interview may be found at http://www.research.att.com/~bs/ieee_interview.html. -Matt -----Original Message----- From: Tony Travis [mailto:ajt at rri.sari.ac.uk] Sent: Tuesday, February 24, 2004 8:57 AM To: Matt Valerio Cc: beowulf at beowulf.org Subject: Re: [Beowulf] C vs C++ challenge Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? Hello, Matt. I think most people know that Brian Kernighan and Denis Richie created 'C', not Bjorn Stroustrup: He created C++. The IEEE 'interview' is a hoax, of course! but Bjorn Stroustrup doesn't think it's funny: http://www.research.att.com/~bs/bs_faq.html#IEEE Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Feb 24 09:33:37 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Tue, 24 Feb 2004 09:33:37 -0500 (EST) Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: On Tue, 24 Feb 2004, Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? I posted that up there when I found it because it is hilarious. I assume that it is a satire (not exactly the same thing as a "hoax":-). However, as is the case with much satire, it contains a lot of little nuggets that (should) make you think... about "good practice" ways of coding in C++ if nothing else. r-still-a-C-guy-at-heart-gb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Feb 24 10:37:12 2004 From: jcownie at etnus.com (James Cownie) Date: Tue, 24 Feb 2004 15:37:12 +0000 Subject: [Beowulf] Adding Latency to a Cluster Environment In-Reply-To: Message from joshh@cs.earlham.edu of "Fri, 13 Feb 2004 10:25:31 EST." <3483.12.161.108.5.1076685931.squirrel@webmail.cs.earlham.edu> Message-ID: <1AvecS-6wh-00@etnus.com> > I am profiling a software package that runs over LAM-MPI on 16 node > cluster s [Details Below]. I would like to measure the effect of > increased latency on the run time of the program. > Look for "dimemas" on Google. It's a simulator from Cepba for parallel architectures which is intended to allow you to adjust exactly this kind of parameter. At one point they had it coupled up with Pallas' Vampir so that it could read Vampir trace files and then simulate the same execution with modified communication properties, or modified CPU properties. -- -- Jim -- James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From david.n.lombard at intel.com Tue Feb 24 10:40:29 2004 From: david.n.lombard at intel.com (Lombard, David N) Date: Tue, 24 Feb 2004 07:40:29 -0800 Subject: [Beowulf] C vs C++ challenge Message-ID: <187D3A7CAB42A54DB61F1D05F0125722025F55AB@orsmsx402.jf.intel.com> From: Matt Valerio; Tuesday, February 24, 2004 6:13 AM > > Wow, I guess I didn't do my homework! Apologizes to everyone for the > misinformation! > > As Tony pointed out, the real interview may be found at > http://www.research.att.com/~bs/ieee_interview.html. For a Stroustrup statement that C proponents (as am I) will also agree with, see http://www.research.att.com/~bs/bs_faq.html#really-say-that FYI, the top of the FAQ has a .wav file with the proper pronunciation of his name... -- David N. Lombard My comments represent my opinions, not those of Intel Corporation. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Feb 24 12:15:01 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 24 Feb 2004 18:15:01 +0100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <200402241815.01355.joachim@ccrl-nece.de> Donald Becker: > On Mon, 23 Feb 2004, John Hearns wrote: > > I hesitate a bit to send things seen on Slashdot to the list, > > but this is probably relevant: > > > > > > http://www.flashmobcomputing.org/ > > A bit of hype here. [...] Exactly. It's a nice idea (although the wrong approach, as Donald elaborated - maybe they will find out), but they shouldn't seriously clame to be the first with this "revolutionary idea" (sic!). In addition to Donald's references to earlier "on-the-fly clusters", here's another one from Germany (December 1998): http://www.heise.de/ix/artikel/E/1999/01/010/ I don't know if they actually submitted results to TOP500 - I could not find a matching entry for 1999. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:15:45 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:15:45 -0800 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <20040224093548.GA29776@unthought.net> Message-ID: <5.2.0.9.2.20040224091430.017cb1d8@mailhost4.jpl.nasa.gov> At 08:13 AM 2/24/2004 -0500, Matt Valerio wrote: >Hello, > >I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 >languages. > >That being said, I think it would be interesting to see what the creator of >both C and C++ has said about the two. I ran across this interview with >Bjorn Stroustrup at >http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > >Like anything on the internet, it should be taken with a grain of salt. Can >anyone vouch for its validity, or is it a hoax to get us to all hate C++ and >stick with C? > >-Matt That's a classic hoax interview (and I think identified as such by RGB), and remarkably funny. Almost as good as Dijkstra's apocryphal comment that more brains have been ruined by BASIC than .... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Feb 24 12:21:18 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 24 Feb 2004 09:21:18 -0800 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: <20040224015318.GB6942@greglaptop.internal.keyresearch.com> Message-ID: <5.2.0.9.2.20040224091607.0350aa38@mailhost4.jpl.nasa.gov> At 08:34 AM 2/24/2004 -0500, Robert G. 
Brown wrote: >On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > >I'll have to think about that one. > >Exponential growth of the speed of light. Hmmm. Some sort of >inflationary model? Space flattening towards non-relativistic >classical? The physics of Nirvana would be veeeery interesting... 5 usec gives you a "grid diameter" of a mile or so... (if you don't worry about pesky things like wires or fibers to carry the signals). You could fit a LOT of processors in a sphere a mile in diameter. Does bring up some interesting questions about optimum interconnection strategies. Even if you put nodes on the surface of that sphere (so you can use free space optical interconnects across the middle of the sphere, you'd have about 7.2 million square meters to fool with. Say you can fit a 100 nodes in a square meter. That's almost a billion nodes. If you need bigger, one could always use fancy stuff like quantum entanglement, about which I don't know much, but which might provide a solution to communicating across large distances very quickly (at least in one frame of reference) James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From orion at cora.nwra.com Tue Feb 24 18:17:31 2004 From: orion at cora.nwra.com (Orion Poplawski) Date: Tue, 24 Feb 2004 16:17:31 -0700 Subject: [Beowulf] G5 cluster for testing Message-ID: <403BDB8B.7060904@cora.nwra.com> Anyone (vendors?) out there have a G5 cluster available for some testing? I've been charged with putting together a small cluster and have been asked to look into G5 systems as well (I guess 64 bit powerPC really....) Thanks -- Orion Poplawski System Administrator 303-415-9701 x222 Colorado Research Associates/NWRA FAX: 303-415-9702 3380 Mitchell Lane, Boulder CO 80301 http://www.co-ra.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwheeler at startext.co.uk Tue Feb 24 17:57:07 2004 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Tue, 24 Feb 2004 22:57:07 +0000 (UTC) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Tue, 24 Feb 2004, Robert G. Brown wrote: > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. Then you'll have to think *very* (exponentially?) fast. Just to keep up with where you were when you started... Shades of the Red Queen. :) Maybe Lewis Carroll already described the physics? -- Martin Wheeler - StarTEXT / AVALONIX - Glastonbury - BA6 9PH - England mwheeler at startext.co.uk http://www.startext.co.uk/mwheeler/ GPG pub key : 01269BEB 6CAD BFFB DB11 653E B1B7 C62B AC93 0ED8 0126 9BEB - Share your knowledge. It's a way of achieving immortality. 
- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Tue Feb 24 09:10:31 2004 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Tue, 24 Feb 2004 08:10:31 -0600 Subject: [Beowulf] C vs C++ challenge In-Reply-To: <200402241315.i1ODFGBf084523@postoffice.onu.edu> References: <200402241315.i1ODFGBf084523@postoffice.onu.edu> Message-ID: <403B5B57.8080403@tamu.edu> Since he's now faculty here, I guess I'll walk down the hall and ask him. gerry Matt Valerio wrote: > Hello, > > I'm an avid C++ programmer myself, and C++ is definitely my choice of the 2 > languages. > > That being said, I think it would be interesting to see what the creator of > both C and C++ has said about the two. I ran across this interview with > Bjorn Stroustrup at > http://www.phy.duke.edu/brahma/Resources/c++_interview/c++_interview.html. > > Like anything on the internet, it should be taken with a grain of salt. Can > anyone vouch for its validity, or is it a hoax to get us to all hate C++ and > stick with C? > > -Matt > > > > > > > -----Original Message----- > From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com] On Behalf Of > Jakob Oestergaard > Sent: Tuesday, February 24, 2004 4:36 AM > To: Beowulf > Subject: Re: [Beowulf] C vs C++ challenge > > On Sun, Feb 01, 2004 at 02:57:37AM -0800, Trent Piepho wrote: > >>>I could easily optimize it more (do the work on a larger buffer at a >>>once), but I think enough waste heat has been created here. This is a >>>simple 2500+ Athlon XP box (nothing fancy) running 2.4.24-pre3. >> >>Enough time wasted on finding different solutions to a simple problem? > > Surely > >>not. Let me toss my hat into the ring: > > ... > > Hi guys! > > Guess who's back - yes, it's your friendly neighborhood language > evangelist :) > > I said I'd be gone one week - well, I put instant coffe in the > microwave, and *wooosh* went three weeks ahead in time. > > What a fantastic thread this turned into - awk, perl, more C, java and > God knows what. I'm almost surprised I didn't see a Fortran > implementation. > > See, I was trying to follow up on the challenge, then things got > complicated (mainly by me not being able to get the performance I wanted > out of my code) - so instead of flooding your inboxes, I wrote a little > "article" on my findings. > > It's up at: > http://unthought.net/c++/c_vs_c++.html > > Highlights: > *) Benchmarks - real numbers. > *) A C++'ification of the fast C implementation (that turns out to be > negligibly faster than the C implementation although the same > algorithm and the same system calls are used), which is generalized > and generally made usable as a template library routine (for > convenient re-use in other projects - yes, this requires all that > boring non-sexy stuff like freeing up memory etc.) > *) Two new C++ implementations - another 15 liner that's "only" twice > as slow as the C code, and another longer different-algorithm C++ > implementation that is significantly faster than the fastest C > implementation (so far). > > Now, I did not include all the extra implementations presented here. I > would like to update the document with those, but I will need a little > feedback from various people. > > First; how do I compile the java implementation? 
GCC-3.3.2 gives me > ---------------------------------------------------------------- > [falcon:joe] $ gcj -O3 -march=pentium2 -Wall -o wc-java wordcount.java > wordcount.java: In class `wordcount': > wordcount.java: In method `wordcount.main(java.lang.String[])': > wordcount.java:18: error: Can't find method `split(Ljava/lang/String;)' > in type `java.util.regex.Pattern'. > words = p.split(s); > ^ > 1 error > ---------------------------------------------------------------- > > Second; another much faster C implementation was posted - I'd like to > test against that one as well. I'm curious as to how it was done, and > I'd like to use it as an example in the document if it turns out that it > makes sense to write a generic C++ implementation of whatever algorithm > is used there. Well, if the code is not a government secret ;) > > So, well, clearly my document isn't completely updated with all the > great things from this thread - but at least I think it is a decent > reply to the mail where the 'programming pearl' C implementation was > presented. > > I guess this could turn into a nice little reference/FAQ/fact type of > document - the oppinions stated there are biased of course, but not > completely unreasonable in my own (biased) oppinion - besides, there's > real-world numbers for solving a real-world problem, that's a pretty > good start I would say :) > > I'd love to hear what people think - if you have the time to give it a > look. > > Let me know, flame away, give me Fortran code that is faster than my > 'ego-booster' implementation at the bottom of the document! ;) > > Cheers all :) > > / jakob > > BTW: Yes, I had a great vacation; > http://unthought.net/avoriaz/p1010050.jpg ;) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Tue Feb 24 09:10:09 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Tue, 24 Feb 2004 14:10:09 +0000 Subject: [Beowulf] Flashmobcomputing In-Reply-To: References: Message-ID: <1077631809.646.83.camel@ashley> On Mon, 2004-02-23 at 22:47, Mark Hahn wrote: > > Still, it sounds like a fun, demystifying demo that introduces people to > > scalable computing. > > demystification is always good. IMO, the best part of this is that > it'll actually demonstrate why a flash mob *CAN'T* build a supercomputer. > partly the reason is hetrogeneity and other "practical" downers. It will be interesting to see, I don't expect they are going to get much time to benchmark but it would be nice to have a plot of achieved performance against CPU count in this kind of configuration. Anybody care to predict how many CPU's you will need before wall clock performance starts dropping? > but mainly, a super-computer needs a super-network. 
That I won't dispute but does a single linpack run require a super-computer? Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raiders at phreaker.net Tue Feb 24 10:39:19 2004 From: raiders at phreaker.net (raiders at phreaker.net) Date: Tue, 24 Feb 2004 23:39:19 +0800 Subject: [Beowulf] Subclusters... Message-ID: <200402242339.19310.raiders@phreaker.net> We are on a project as described below: - IA32 linux cluster for general parallel programming - five head nodes, each head node will have about 15 compute nodes and dedicated storage - groups of cluster-users will be restricted to their own clusters normally (some exclusions may apply) - SGE/PBS, GbE etc are standard choices But the people in power want one single software or admin console (cluster toolkit?) to manage the entire cluster from one adm station (which may or may not be one of the head nodes). I looked around and could not find any suitable solution (ROCKS, oscar, etc). ROCKS, oscar etc can manage only one cluster at a time and cannot handle subclusters. (I might be wrong) I believe that only custom programming can help. Appreciate any expert opinion Thanks, Shawn _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Tue Feb 24 23:38:54 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Tue, 24 Feb 2004 20:38:54 -0800 (PST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> Message-ID: I'd suggest asking your friendly IBM sales guy about ppc970 blades... joelja On Tue, 24 Feb 2004, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) > > Thanks > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 02:10:39 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Tue, 24 Feb 2004 23:10:39 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <403BDB8B.7060904@cora.nwra.com> References: <403BDB8B.7060904@cora.nwra.com> Message-ID: <20040225071039.GA29125@cse.ucdavis.edu> On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > Anyone (vendors?) out there have a G5 cluster available for some For the most part I'm finding that cluster performance is mostly predictable by single node performance, and the scaling of the interconnect. At least as an approximation, I'm going to use to find a good place to start for my next couple cluster designs. I'm current benchmarking: Dual G5 Opteron duals (1.4, 1.8, and 2.2) Opteron quad 1.4 Itanium dual 1.4 GHz Dual P4-3.0 GHz+HT Single P4-3.0 GHz+HT Alas, my single node performance testing on the G5 has been foiled by my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem working. 
Anyone else have MPICH and shared memory working on OSX? Or maybe a dual g5 linux account for an evening of benchmarking? Normally using ch_p4 and localhost wouldn't be to big a deal, but ping localhost on OSX is something like 40 times than linux, mpich with ch_p4 on OSX is around 20 times worse than linux with shared memory. > testing? I've been charged with putting together a small cluster and > have been asked to look into G5 systems as well (I guess 64 bit powerPC > really....) Assuming all the applications and tools work under all environments your considering I'd figure out what interconnect you want to get first. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Feb 25 03:34:38 2004 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 25 Feb 2004 09:34:38 +0100 (CET) Subject: [Beowulf] S.M.A.R.T usage in big clusters In-Reply-To: <1077542843.26492.12.camel@qeldroma.cttc.org> Message-ID: On Mon, 23 Feb 2004, Daniel Fernandez wrote: [...] > On the other hand, is possible to deviate smartd log to a specific file > and check it regularly when it's updated, adding this parameter to > smartd: > > -l facility > > Of course some syslog.conf modifying will be needed to instruct syslogd > to log on a specific file from the "facility" specified. Thanks for the hints, I was not yet aware of the -l and -M flags. Still, I think directly calling "smartctl" from a cron job is the better solution. With just smartd and the flags above, you still won't get any updates if the smartd simply dies and you won't even notice, because grep simply finds the last entry in the log. Besides, you still have the problem of log rotate (except if you let grow your log file forever...). Regards, Felix --- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:18:14 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:18:14 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > A very quick answer from me is to think of the whole thing as one cluster, then install it. In SGE, it is possible to have groups of users defined, and to allow only certain groups/users access to each queue. So (say) you could have a Physics group, a Chemistry group etc. As for access to the public facing nodes, again quickly off the top of my head, you just need authentication which allows logins only from the appropriate group. 
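To make the queue restriction concrete, here is a minimal sketch of the Grid Engine side. It assumes a 5.3-era SGE and invents the user, access-list, and queue names, so treat it as a starting point rather than a recipe:
----------------------------------------------------------------
# put the physics users into an access list (created if it does not exist)
$ qconf -au alice,bob physics_users
# edit the queue so that only members of that access list may use it
$ qconf -mq physics.q
    ...
    user_lists    physics_users
----------------------------------------------------------------
Repeating the same pattern per group gives each of the five groups its own queue without carving the cluster up physically.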
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 07:01:02 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 13:01:02 +0100 (CET) Subject: [Beowulf] FOSDEM talk Message-ID: There is a current thread on SMART usage. There was also a thread about six months ago on lm_sensors, about the output format of sensors, and how one has to parse it. Sorry if this message is a bit of a ramble. At FOSDEM over the weekend I went to a talk by Robert Love on his work on Linux kernel and destop integration, on HAL and DBUS. One slide made me sit up and take notice, as he had an example of a kernel message saying 'overheating'. The message format was something like an SNMP OID, as I remember org.kernel.processor.overheating (or something like that). One could then think of a process listening on the netlink socket, generating (for example) an SNMP trap on receiving events of this category. A better way of doing things than running sensors periodically then parsing the output. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Wed Feb 25 04:55:22 2004 From: john.hearns at clustervision.com (John Hearns) Date: Wed, 25 Feb 2004 10:55:22 +0100 (CET) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: On Tue, 24 Feb 2004 raiders at phreaker.net wrote: > We are on a project as described below: > > - IA32 linux cluster for general parallel programming > - five head nodes, each head node will have about 15 compute nodes and > dedicated storage > - groups of cluster-users will be restricted to their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or admin console (cluster > toolkit?) to manage the entire cluster from one adm station (which may or may > not be one of the head nodes). Thinking about this, the way I would architect things is to stop thinking of subclusters - yet of course give the users their allocation of resources. So, choose your cluster install method of choice. Have one admin/master node and install all 75 nodes. Have 5 public facing machines, and have logins go through a load-balancer or round robin. When a user logs in they get directed to the least loaded machine. Why? If one machine goes down (fault or upgrade) the users still have four machines. They don't "see" this as you have entries in the DNS for e.g. necromancy.hogwarts defence-darkarts.hogwarts potions.hogwarts spells.hogwarts magical-creatures.hogwarts all pointing the same way. It would be better to have 5 separate storage nodes, but the login machines in your scenario will have to do that job also. Just allocate storage per group. The 75 compute nodes are installed within the cluster. Now, at a first pass you want to 'saw things up' into 15 node lumps. This can be done easily - just put a queue or queues on each and allow only certain groups access. But I will contend this is a bad idea. Batch queueing systems have facilities to look after fair shares of resources between groups. Say you have the 5 separate groups scenario. 
Say today Professor Snape isn't doing any potions work. The 15 potions machines will lie idel, while there are plenty of jobs in necromancy just dying to run. Use the fairshare in SGE or LSF. Each group will get their allocated share of CPU. You'll also have redundancy - so that you can take machines out for maintenance/repairs without impacting any one group, ie. the load is shared across 75 machines not 5. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patricka at its.uct.ac.za Wed Feb 25 09:01:49 2004 From: patricka at its.uct.ac.za (Patrick) Date: Wed, 25 Feb 2004 16:01:49 +0200 Subject: [Beowulf] G5 cluster for testing References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <026001c3fba7$e9af5d00$a61b9e89@nawty> Has anyone here actually tried out Xgrid ? Apples grid stuff. It seems to be not so fussy in regards to the type of macs you attach and suchlike ? as well as them being configurable via Zeroconf. P ----- Original Message ----- From: "Bill Broadley" To: "Orion Poplawski" Cc: Sent: Wednesday, February 25, 2004 9:10 AM Subject: Re: [Beowulf] G5 cluster for testing > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:22:15 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:22:15 +0800 (CST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <20040225132215.21497.qmail@web16813.mail.tpe.yahoo.com> I've heard that LAM works better with OSX. Andrew. 
--- Bill Broadley ????> On > Alas, my single node performance testing on the G5 > has been foiled by > my inability to get MPICH, OSX, and ./configure > --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on > OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to > big a deal, but > ping localhost on OSX is something like 40 times > than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux > with shared memory. > > > testing? I've been charged with putting together > a small cluster and > > have been asked to look into G5 systems as well (I > guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under > all environments your > considering I'd figure out what interconnect you > want to get first. > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Feb 25 08:27:54 2004 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Wed, 25 Feb 2004 21:27:54 +0800 (CST) Subject: [Beowulf] Subclusters... In-Reply-To: <200402242339.19310.raiders@phreaker.net> Message-ID: <20040225132754.14108.qmail@web16809.mail.tpe.yahoo.com> GridEngine has the concept of a CELL. It is not well documented, but it works like pointing to a different cell gives you a different configuration, ie. different subcluster. When you setup SGE, it will ask you for the name of the cell, so on the same head node, each time you run the sge install script, use a different cell name. This way you will get 5 different SGE clusters controlled by the same headnode. Better ask on the SGE mailing list since I've never played around with this too much. http://gridengine.sunsource.net/project/gridengine/maillist.html Andrew. --- raiders at phreaker.net ????> We are on a project as described below: > > - IA32 linux cluster for general parallel > programming > - five head nodes, each head node will have about 15 > compute nodes and > dedicated storage > - groups of cluster-users will be restricted to > their own clusters normally > (some exclusions may apply) > - SGE/PBS, GbE etc are standard choices > > But the people in power want one single software or > admin console (cluster > toolkit?) to manage the entire cluster from one adm > station (which may or may > not be one of the head nodes). > > I looked around and could not find any suitable > solution (ROCKS, oscar, etc). > ROCKS, oscar etc can manage only one cluster at a > time and cannot handle > subclusters. (I might be wrong) > > I believe that only custom programming can help. 
> Appreciate any expert > opinion > > Thanks, > Shawn > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ashley at quadrics.com Wed Feb 25 08:03:02 2004 From: ashley at quadrics.com (Ashley Pittman) Date: Wed, 25 Feb 2004 13:03:02 +0000 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <1077714182.656.235.camel@ashley> On Wed, 2004-02-25 at 07:10, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. There is a third issue here which you've missed which is that interconnect performance can depends on the PCI bridge that it's plugged into. It would be more correct to say that performance is predictable by dual-node performance and scaling of the interconnect. Of course this may not make a difference for Ethernet or even gig-e but it does matter at the high end. Ashley, _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pathscale.com Wed Feb 25 14:32:12 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed, 25 Feb 2004 11:32:12 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040225193212.GA14558@greglaptop.internal.keyresearch.com> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. Doesn't the G5 have exactly one chipset implementation available? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Thu Feb 26 05:38:09 2004 From: pesch at attglobal.net (pesch at attglobal.net) Date: Thu, 26 Feb 2004 02:38:09 -0800 Subject: [Beowulf] Flashmobcomputing References: Message-ID: <403DCC91.5A10A77B@attglobal.net> Nothing moves faster than the speed of light - with the exception of bad news (according to the late Douglas Adams); therefore, at the grid nirvana, bad news must get increasingly more bad. Which leads me to the hypothesis that nirvana is that locus at the irs which stores the access codes for the pentium microcode backdoors... "Robert G. 
Brown" wrote: > On Mon, 23 Feb 2004, Greg Lindahl wrote: > > > On Mon, Feb 23, 2004 at 05:47:27PM -0500, Mark Hahn wrote: > > > > > of course, in the grid nirvana, > > > all computers would have multiple ports of infiniband, > > > and the word would be 5 us across ;) > > > > In grid nirvana, the speed of light would rise with Moore's Law. > > I'll have to think about that one. > > Exponential growth of the speed of light. Hmmm. Some sort of > inflationary model? Space flattening towards non-relativistic > classical? The physics of Nirvana would be veeeery interesting... > > :-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Feb 25 22:02:03 2004 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 26 Feb 2004 14:02:03 +1100 Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> References: <403DCC91.5A10A77B@attglobal.net> Message-ID: <200402261402.05015.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 09:38 pm, pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); The only things known to go faster than ordinary light is monarchy, according to the philosopher Ly Tin Weedle. He reasoned like this: you can't have more than one king, and tradition demands that there is no gap between kings, so when a king dies the succession must therefore pass to the heir instantaneously. Presumably, he said, there must be some elementary particles - -- kingons, or possibly queons -- that do this job, but of course succession sometimes fails if, in mid-flight, they strike an anti-particle, or republicon. His ambitious plans to use his discovery to send messages, involving the careful torturing of a small king in order to modulate the signal, were never fully expanded because, at that point, the bar closed. 
- -- (Terry Pratchett, Mort) courtesy of: http://www.co.uk.lspace.org/books/pqf/mort.html - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQFAPWGrO2KABBYQAh8RAsm3AJ4zV3fEk8q/8Jm/zqY4xiBzGvKj4ACfeT+N 3NhDhvgiJyhukmnzBFHUaMQ= =NgG+ -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Feb 25 21:32:59 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed, 25 Feb 2004 18:32:59 -0800 Subject: [Beowulf] Cray buys Octigabay Message-ID: <20040226023258.GA9211@cse.ucdavis.edu> An interesting development: http://www.octigabay.com/ http://www.octigabay.com/newsEvents/cray_release.htm http://www.cray.com/ http://www.cray.com/media/2004/february/octigabay.html -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hpcatcnc at yahoo.com Thu Feb 26 01:56:09 2004 From: hpcatcnc at yahoo.com (prakash borade) Date: Wed, 25 Feb 2004 22:56:09 -0800 (PST) Subject: [Beowulf] predefined nodes for a job Message-ID: <20040226065609.19075.qmail@web21507.mail.yahoo.com> Can anybody tell me how I can allot some fixed, predefined machines from my cluster to a job? I have tried using the option -machinefile mcfile, where mcfile is a file in a local directory containing the required machine names. Also, I don't want to use the machine from which I issue the mpirun command, and MPICH is installed on this machine. Do I have any solution for this?
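One way this is commonly handled with MPICH 1.2.x and the ch_p4 device, sketched from memory with made-up node names (so check the exact option spellings against your mpirun man page): list only the compute nodes you want in the machine file, and add -nolocal so the host you launch from stays out of the job.
----------------------------------------------------------------
$ cat mcfile
node01:2
node02:2
node03:2
$ mpirun -np 6 -machinefile ./mcfile -nolocal ./myprog
----------------------------------------------------------------
Without -nolocal the ch_p4 device starts the first process on the machine mpirun is invoked from, which is exactly what the poster wants to avoid; the host:n form in the machine file is what lets each dual-CPU node take two processes.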
__________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sdutta at deas.harvard.edu Thu Feb 26 07:43:51 2004 From: sdutta at deas.harvard.edu (Suvendra Nath Dutta) Date: Thu, 26 Feb 2004 07:43:51 -0500 (EST) Subject: [Beowulf] G5 cluster for testing In-Reply-To: <20040225071039.GA29125@cse.ucdavis.edu> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: I fought for a while to get an OS X cluster up, precisely to test the G5 performance. I had lots of problems with setting up NFS and setting up MPICH to use shared memory on the dual processors. I was able to take advantage of the firewire networking built into OS X. We were taking the harder route of staying away from all non-open source tools to do NFS (NFSManager) or MPI (Pooch). As was pointed out in another message, we are mostly keen on just testing performance of three applications that we will run on our cluster rather than HPL numbers. Finally we gave up the struggle. We are now working with Apple to benchmark on an existing setup instead of us trying to set everything up ourselves. Unfortunately there isn't a howto on doing this yet. I'll post numbers when we get them. Suvendra. On Tue, 24 Feb 2004, Bill Broadley wrote: > On Tue, Feb 24, 2004 at 04:17:31PM -0700, Orion Poplawski wrote: > > Anyone (vendors?) out there have a G5 cluster available for some > > For the most part I'm finding that cluster performance is mostly > predictable by single node performance, and the scaling of the > interconnect. At least as an approximation, I'm going to use to find > a good place to start for my next couple cluster designs. > > I'm current benchmarking: > Dual G5 > Opteron duals (1.4, 1.8, and 2.2) > Opteron quad 1.4 > Itanium dual 1.4 GHz > Dual P4-3.0 GHz+HT > Single P4-3.0 GHz+HT > > Alas, my single node performance testing on the G5 has been foiled by > my inability to get MPICH, OSX, and ./configure --with-device=ch_shmem > working. > > Anyone else have MPICH and shared memory working on OSX? Or maybe a dual > g5 linux account for an evening of benchmarking? > > Normally using ch_p4 and localhost wouldn't be to big a deal, but > ping localhost on OSX is something like 40 times than linux, mpich with > ch_p4 on OSX is around 20 times worse than linux with shared memory. > > > testing? I've been charged with putting together a small cluster and > > have been asked to look into G5 systems as well (I guess 64 bit powerPC > > really....) > > Assuming all the applications and tools work under all environments your > considering I'd figure out what interconnect you want to get first.
> > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Thu Feb 26 06:55:22 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu, 26 Feb 2004 03:55:22 -0800 Subject: [Beowulf] G5 cluster for testing In-Reply-To: <1077714182.656.235.camel@ashley> References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> <1077714182.656.235.camel@ashley> Message-ID: <20040226115522.GA12286@cse.ucdavis.edu> On Wed, Feb 25, 2004 at 01:03:02PM +0000, Ashley Pittman wrote: > There is a third issue here which you've missed which is that > interconnect performance can depends on the PCI bridge that it's plugged > into. It would be more correct to say that performance is predictable > by dual-node performance and scaling of the interconnect. Of course > this may not make a difference for Ethernet or even gig-e but it does > matter at the high end. Take this chart for instance: http://www.myri.com/myrinet/PCIX/bus_performance.html On any decent size cluster the node performance or interconnect performance is likely to be significantly larger effects on cluster performance then any of the differences on that chart. Or maybe your talking about sticking $1200 Myrinet cards in a 133 MB/sec PCI slot? Don't forget peak bandwidth measurements assume huge (10000-64000 byte packets), latency tolerance, and zero computation. Not exactly the use I'd expect in a typical production cluster. So my suggestion is: #1 Pick your application(s), this is why your buying a cluster right? #2 For compatible nodes pick the node with the best perf or price/perf. #3 For compatible interconnects pick the one with the best scaling or price/scaling for the number of nodes you can afford/fit. #3 If you get a choice of PCI-X bridges, sure consult the URL above and pick the fastest one. -- Bill Broadley Computational Science and Engineering UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Feb 26 11:51:42 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 26 Feb 2004 11:51:42 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <403DCC91.5A10A77B@attglobal.net> Message-ID: On Thu, 26 Feb 2004 pesch at attglobal.net wrote: > Nothing moves faster than the speed of light - with the exception of bad > news (according to the late Douglas Adams); therefore, at the grid > nirvana, bad news must get increasingly more bad. Which leads me to the > hypothesis that nirvana is that locus at the irs which stores the access > codes for the pentium microcode backdoors... This is not exactly correct. Or rather, it might well be true (something mandala-like in the image of that locus:-) but isn't strictly logical or on topic for the list. The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers? 
;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From choyhauyan at yahoo.com Thu Feb 26 00:51:24 2004 From: choyhauyan at yahoo.com (choy hau yan) Date: Wed, 25 Feb 2004 21:51:24 -0800 (PST) Subject: [Beowulf] shared distributed memory ? Message-ID: <20040226055124.33033.qmail@web41313.mail.yahoo.com> I am a user for scyld beowulf cluster. I use mpi for parallel computing.I have some question: > I got 2 processors that in shared memory and then > connect with TCP/IP to another 2 processors in shared > memory. > > I use mpisend/recv for communication, but why can't I > call this shared distributed memory? > The speedup with this architecture is very low.why? > > speedup: > 2 processor: 1.61 > 3 processor: 2.31 > 4 processor: 2.30 > actually with shared memory, the speedup is more high > that distributed becasue almost no cos communication > in shared memory.right? Hope that some one can answer my question. thanks.. > __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From moor007 at bellsouth.net Thu Feb 26 16:51:03 2004 From: moor007 at bellsouth.net (moor007 at bellsouth.net) Date: Thu, 26 Feb 2004 15:51:03 -0600 Subject: [Beowulf] Cluster HW Message-ID: <200402261551.03785.moor007@bellsouth.net> I apologize for having to ask in this forum...but I really do not know where to begin. I just upgraded my interconnects from the Dolphinics (SCI) and want to sell them (rarely used) because only one of the four applications I use would utilize them. Is there a forum/market, besides Ebay, for this type of specialty HW? Tim _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Feb 26 17:34:08 2004 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 27 Feb 2004 09:34:08 +1100 Subject: [Beowulf] G5 cluster for testing In-Reply-To: References: <403BDB8B.7060904@cora.nwra.com> <20040225071039.GA29125@cse.ucdavis.edu> Message-ID: <200402270934.10022.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 26 Feb 2004 11:43 pm, Suvendra Nath Dutta wrote: > We were taking the harder route of staying away from all non-open source > tools to do NFS (NFSManager) or MPI (Pooch). There is also Black Lab Linux from Terrasoft which build clusters on YDL with BProc, MPICH, etc for Macs. No idea whether it supports G5's or how FOSS it is though.. cheers! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD4DBQFAPnRgO2KABBYQAh8RAq6sAJMEJwyT1vn3MV9RM/Fwpy6gs4CZAJ9QAGf2 oyEbIVcHgTfcs+Jk2xb7dg== =92C8 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Feb 26 19:01:51 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 26 Feb 2004 19:01:51 -0500 (EST) Subject: [Beowulf] shared distributed memory ? In-Reply-To: <20040226055124.33033.qmail@web41313.mail.yahoo.com> Message-ID: A few questions: What are your processor speeds? What is your interconnect? What is your application? Having to communicate with another node vs the same node is not the same thing. (ping local host and ping the other node) Obviously your application is sensitive to the interconnect (either bandwidth of latency) Really fast processors and a slow interconnect usually means poor scalability for some applications. Doug On Wed, 25 Feb 2004, choy hau yan wrote: > I am a user for scyld beowulf cluster. I use mpi for > parallel computing.I have some question: > > > I got 2 processors that in shared memory and then > > connect with TCP/IP to another 2 processors in > shared > > memory. > > > > I use mpisend/recv for communication, but why can't > I > > call this shared distributed memory? > > The speedup with this architecture is very low.why? > > > > speedup: > > 2 processor: 1.61 > > 3 processor: 2.31 > > 4 processor: 2.30 > > actually with shared memory, the speedup is more > high > > that distributed becasue almost no cos communication > > in shared memory.right? Hope that some one can > answer my question. thanks.. > > > > > __________________________________ > Do you Yahoo!? > Get better spam protection with Yahoo! Mail. > http://antispam.yahoo.com/tools > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Fri Feb 27 05:12:18 2004 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Fri, 27 Feb 2004 10:12:18 -0000 Subject: [Beowulf] Flashmobcomputing Message-ID: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Hmm, presumably a 'bad' message will need to have the Evil Bit set (http://www.ietf.org/rfc/rfc3514.txt)? Graham -----Original Message----- From: Robert G. Brown [mailto:rgb at phy.duke.edu] [...] The correct LIST conclusion is that for us to build transluminal clusters, we need to insure that all the messages (news) carried are bad. Now, who is going to develop BMPI (Bad Message Passing Interface)? Any volunteers? ;-) [...] 
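(A quick back-of-envelope check on the speedup numbers posted in the "shared distributed memory?" thread, 1.61 on 2 processors and 2.30 on 4: if everything that does not parallelize, serial work plus communication, is lumped into one fraction s, Amdahl's law gives

    speedup(N) = 1 / (s + (1 - s)/N)

The 2-processor figure fixes s: 1/1.61 = s + (1 - s)/2 gives s of roughly 0.24, and that same s predicts 1/(0.24 + 0.76/4), or about 2.3, on 4 processors, which is essentially what was measured. This ignores Doug's point that intra-node and inter-node messages cost very different amounts, but it does suggest about a quarter of the run is effectively serialized, so adding processors beyond 4 will buy very little.)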
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From anantanagb at yahoo.com Fri Feb 27 04:49:38 2004 From: anantanagb at yahoo.com (anantanag bhat) Date: Fri, 27 Feb 2004 01:49:38 -0800 (PST) Subject: [Beowulf] P4_error: net_recv read : probable EOF on socket:1 Message-ID: <20040227094938.32769.qmail@web21322.mail.yahoo.com> Sir, I have installed MPICH on my 8 processor Cluster. Every thing was running fine for first few days. Now if I starts the run in the node4, it is getting stuck. after 2hour. the error in the .out file is as below "P4_error: net_recv read : probable EOF on socket:1" But it is not the same in first 3 nodes. In these runs are going fine. Can anybody please help me to solve this. Thanks in advance __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Feb 27 08:11:43 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 27 Feb 2004 08:11:43 -0500 (EST) Subject: [Beowulf] Flashmobcomputing In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B30437D3BD@ukjhmbx12.ukjh.zeneca.com> Message-ID: On Fri, 27 Feb 2004 graham.mullier at syngenta.com wrote: > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? You know, I just joined the ietf.org list a week or two ago to see if there was any possibility of leveraging their influence on e.g. AV vendors to get them to stop mailing bounce messages back to the "From" address on viruses, given that there hasn't been a virus that hasn't forged its From header to an innocent third party for several years now. Finding myself sucked into an endless discussion with people who want the ietf to issue an RFC to call for digitally signing all mail and using said signatures to drive all spam white/blacklisting (imagine the keyservice THAT would require and the gazillion dollar profits it would generate) I have gradually started to wonder if the ietf has degenerated into a kind of a cruel joke. This RFC, however, lifts my spirits and renews my confidence that the original luminaries that designed in the Internet have not fully stopped glowing in the chaotic darkness that surrounds them. Armed with the complete confidence that my design is based on both sound protocol and Dr. D. Adams' valuable empirical observation about bad news, I will start work on a PVM version that sets the Evil Bit right away. I fully expect to win a Nobel Prize from the proof that communications are transluminal in the resulting cluster. It must be that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- Bad News must somehow propagate backwards in time from the event. I most certainly will acknowledge all of the contributions of all you "little people" when I receive my invitation to Stockholm. I'm so happy. Sniff. rgb > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? 
> > ;-) > [...] > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Feb 27 12:38:09 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 27 Feb 2004 18:38:09 +0100 (CET) Subject: [Beowulf] Flashmobcomputing In-Reply-To: Message-ID: On Fri, 27 Feb 2004, Robert G. Brown wrote: > > Armed with the complete confidence that my design is based on both sound > protocol and Dr. D. Adams' valuable empirical observation about bad > news, I will start work on a PVM version that sets the Evil Bit right > away. I fully expect to win a Nobel Prize from the proof that > communications are transluminal in the resulting cluster. It must be > that the Evil Bit is somehow a time-reversal bit or a tachyonic bit -- > Bad News must somehow propagate backwards in time from the event. > Once this phase of the research has been completed, can we make an application to the NSF for an extension into using SEP fields for systems management? http://www.fact-index.com/s/so/somebody_else_s_problem_field.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at comcast.net Fri Feb 27 16:23:23 2004 From: laytonjb at comcast.net (Jeffrey B. Layton) Date: Fri, 27 Feb 2004 16:23:23 -0500 Subject: [Beowulf] Single Processor vs SMP In-Reply-To: <20040227092124.GA8410@blackTiger> References: <20040227092124.GA8410@blackTiger> Message-ID: <403FB54B.5030401@comcast.net> Paulo, I hoping someone will jump and say that the answer depends upon the code(s) you're running. If possible test your codes on a dual CPU box with one copy running and then two copies running (make sure one copy is on one CPU). Test this on the architectures you are interested in. If you can also test on multiple nodes with some kind of interconnect to judge how the code(s) scale with number of nodes and with interconnect. For example, at work I use a code that we tested on single and dual CPU machines. It was an older PIII/500 box that used the old Intel 440BX chipset (if I remember correctly). We found that running two copies only resulted in a 30% penalty for running duals. We also tested on a cluster with Myrinet and GigE. Myrinet only gave this code about a 2% decrease in wall clock time (we measure speed in wall clock time since that is what is important to us). Then we got quotes for machines and did the price/performance calculation and determined which cluster was the best. I highly recommend doing the same thing for your code(s). Be sure to check out Opterons since they have an interesting memory subsytem that should allow your codes to have little penalty in running on dual machines ("should" is the operative word. You should test your codes to determine if this is true). Good Luck! 
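In shell terms the one-copy versus two-copy test above is about as simple as it gets; a sketch, with ./mycode and the input files as placeholders:
----------------------------------------------------------------
# one copy alone on the dual box
$ time ./mycode < case1.in > alone.out
# two copies at once; the wall clock covers both finishing
$ time sh -c './mycode < case1.in > run1.out & ./mycode < case2.in > run2.out & wait'
----------------------------------------------------------------
If the two-copy wall time is much longer than the one-copy time, the two CPUs are fighting over memory bandwidth and duals will not look as good in practice as they do in the price list.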
Jeff >Hello, > >I'm currently working in a physics department that is in the process of >building a high performance Beowulf cluster and I have some doubts in >terms of what type of hardware to acquire. > >The programming systems that will be used are MPI and HPF. Does anyone >knows any study comparing the performance of single cpu machines vs smp >machines or even between the several cpu's available (intel p4, amd athlon, >powerpc g5, ...)? > >Thanks for any advice > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From paulojjs at bragatel.pt Fri Feb 27 04:21:24 2004 From: paulojjs at bragatel.pt (Paulo Silva) Date: Fri, 27 Feb 2004 09:21:24 +0000 Subject: [Beowulf] Single Processor vs SMP Message-ID: <20040227092124.GA8410@blackTiger> Hello, I'm currently working in a physics department that is in the process of building a high performance Beowulf cluster and I have some doubts in terms of what type of hardware to acquire. The programming systems that will be used are MPI and HPF. Does anyone knows any study comparing the performance of single cpu machines vs smp machines or even between the several cpu's available (intel p4, amd athlon, powerpc g5, ...)? Thanks for any advice -- Paulo Jorge Jesus Silva perl -we 'print "paulojjs".reverse "\ntp.letagarb@"' The best you get is an even break. -- Franklin Adams -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From ddw at dreamscape.com Sat Feb 28 01:19:37 2004 From: ddw at dreamscape.com (Daniel Williams) Date: Sat, 28 Feb 2004 01:19:37 -0500 Subject: [Beowulf] Flashmobcomputing - the evil bit References: <200402271705.i1RH5vh16216@NewBlue.scyld.com> Message-ID: <404032F8.F5DDDA59@dreamscape.com> The problem with this idea is that Linux is too good at dealing with flawed or malicious data, so even if the evil bit is set, it still would not qualify as "bad news", and thus would not travel superluminally. Consequently, I would speculate that the only system that could communicate superluminally is one running some form of Winblows, since *any* data, of *any* kind, with or without the evil bit set is bad news for MS operating systems, and likely to cause a crash. The problem with superluminal cluster computing then becomes obvious - you can't get any actual useful calculation done faster than lightspeed, because the only operating systems that work at that speed can't do any useful work. DDW > Hmm, presumably a 'bad' message will need to have the Evil Bit set > (http://www.ietf.org/rfc/rfc3514.txt)? > > Graham > > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > [...] The correct LIST conclusion is that > for us to build transluminal clusters, we need to insure that all the > messages (news) carried are bad. > > Now, who is going to develop BMPI (Bad Message Passing Interface)? Any > volunteers? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf