From hraa at lncc.br Fri Aug 1 13:58:59 2003 From: hraa at lncc.br (Ricardo) Date: Fri, 1 Aug 2003 14:58:59 -0300 (BRT) Subject: Filesystem Message-ID: Hi all Which one is better to use, ext3 or raiserfs? Someone have performance results comparing Ext3 with raiserfs? Thanks ------------------------------- Ricardo .-. /v\ // \\ > L I N U X < /( )\ ^^-^^ ------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Aug 1 19:05:47 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 1 Aug 2003 16:05:47 -0700 Subject: New technology for trunking gigE switches In-Reply-To: References: <20030730190605.GA2640@sphere.math.ucdavis.edu> Message-ID: <20030801230547.GA2324@greglaptop.internal.keyresearch.com> What good timing: Broadcom just released public info about their new generation of gigE switch chips, which are capable of using inexpensive 4-wire copper 10gig uplinks between boxes. The neat thing about this is that instead of having to buy a bunch of 10gig optics, which are very expensive, it uses a 4-wire 3.125 gbit copper interconnect, same as InfiniBand. You should expect to see this showing up in stackable 24 to 48 port switches, allowing up to 384 gigE ports in a single blob, at around $100/gigE port. The center is an 8-port 10gigE switch, so as you can see, you have the same issue of the ratio of uplink bandwidth to local bandwidth that you had in the fast ethernet stackables with 1gig uplinks. You will note, however, that the Broadcom blurb says you can get much better total fabric bandwidth than just one of those chips. They don't explain how, and so I can't mention it -- but if anyone finds a public explanation, please let me know. I believe that it should be able to hit the quoted 640 gbits of total traffic, i.e. at 384 ports, you can build a switch which almost has perfect bisection. The total switch latency also shouldn't be so bad: say ~35 usec for first bit in to last bit out, which is just over double of what you'll see with a standalone gigE switch. (The total latency seen by an application using TCP/IP will be higher, of course.) The HP Procurve guys had a quote in one of the press releases, but I'm sure that other vendors will ship products based on this too; Broadcom is already a high volume producer of chips used in ethernet switches. -- greg http://www.broadcom.com/docs/promostrataxgs.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Aug 1 19:26:38 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 1 Aug 2003 19:26:38 -0400 (EDT) Subject: Filesystem In-Reply-To: Message-ID: > Which one is better to use, ext3 or raiserfs? there is no clearcut winner. > Someone have performance results comparing Ext3 with raiserfs? yes, there's plenty available. reiserfs people always focus on situations where directories have billions of small files. that's not surprising, since that's their design target: efficient storage of very small files, and efficient handling of ridiculously overfull directories. 
I question the value of worrying about very small files (because disk is so cheap, and clusters mostly have big files); big directories seem like someone's design mistake to me. ext3 is designed as an ultra-stable journaling version of ext2, and succeeds. it's difficult to compare reliability, but ext3 does generally have a better reputation than reiserfs. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Fri Aug 1 22:33:26 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 02 Aug 2003 06:33:26 +0400 Subject: nfs problem In-Reply-To: <200308011902.h71J2Vw28853@NewBlue.Scyld.com> Message-ID: Hello Thanks everybody, it's working. I will need to install MPICH now. Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Fri Aug 1 22:33:50 2003 From: timm at fnal.gov (Steven Timm) Date: Fri, 01 Aug 2003 21:33:50 -0500 Subject: Filesystem In-Reply-To: Message-ID: I have some anecdotal evidence that ext3 starts taking a performance hit in cases where a lot of files get written and then quickly erased. There is also a performance penalty on burst I/O -- e.g., if you have a system doing near-continuous disk writes and reads, it will bump the load factor up. But I don't have any information to suggest that Reiser does it better. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > > Thanks > > ------------------------------- > Ricardo > > .-.
> /v\ > // \\ > L I N U X < > /( )\ > ^^-^^ > ------------------------------- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Aug 1 22:45:01 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 1 Aug 2003 19:45:01 -0700 (PDT) Subject: Filesystem In-Reply-To: Message-ID: hi ricardo filesystem comparasons http://www.linux-sec.net/FileSystem/#FS http://aurora.zemris.fer.hr/filesystems/ i think ext3 is better than reiserfs i think ext3 is not any better than ext2 in terms of somebody hitting pwer/reset w/o proper shutdown - i always allow it to run e2fsck when it does an unclean shutdown ... - yes ext3 will timeout and continue and restore from backups but ... am paranoid about the underlying ext2 getting corrupted by random power off and resets c ya alvin On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 08:48:09 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 12:48:09 GMT Subject: Filesystem In-Reply-To: References: Message-ID: <20030802124809.1437.qmail@houston.wolf.com> Steven Timm writes: > I have some anecdotal evidence that ext3 starts taking performance hit > in cases where there is a lot of files getting written and then > quickly erased. Also there's a performance penalty on burst I/O--e.g. > if you have a system doing near-continuous disk writes and reads it > will bump the load factor up. But I don't have any information to > suggest that Reiser does it better. > It depends what you are going to use the nodes for. For normal compute nodes, I don't think there is enough of a payback to change ext3. For our disk nodes, we use ext3 for system filesystems and XFS for the exported disk space (with NFS patches and tuning of couse) to get some serious performance. We are currently testing different filesystems on one of the disk nodes we just purchased and have seen a dramatic rise in performance with the above. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mukshere at rediffmail.com Sat Aug 2 10:16:43 2003 From: mukshere at rediffmail.com (mukund govind umalkar) Date: 2 Aug 2003 14:16:43 -0000 Subject: Beowulf Research Message-ID: <20030802141643.9462.qmail@webmail7.rediffmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sanjoy at chem.iitkgp.ernet.in Sat Aug 2 13:33:21 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sat, 2 Aug 2003 23:03:21 +0530 (IST) Subject: NIS In-Reply-To: <20030802124809.1437.qmail@houston.wolf.com> Message-ID: Hi, I have a cluster running Rh 7.3 with NIS server. The cluster was running fine. 
But suddenly after rebooting now the clients are having problems in recognizing the NIS domain server name. while booting the clients it says: Binding to the NIS domain: [OK] Listening for an NIS domail server............[FAILED] ypwhich on clients says 'Can't communicate with ypbind' ypbind, ypserv are running fine on the server. I will appreciate if anyone can help.. Thanks. Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bari at onelabs.com Sat Aug 2 14:02:45 2003 From: bari at onelabs.com (Bari Ari) Date: Sat, 02 Aug 2003 13:02:45 -0500 Subject: SH4 & SH5 Clustering Message-ID: <3F2BFCC5.6060803@onelabs.com> It's been a few years since anyone has posted anything here on clusters using the SH-4. http://www.beowulf.org/pipermail/beowulf/1999-November/007339.html Does anyone have results or experiences of building systems using the SH-4? http://www.superh.com/products/sh4.htm http://www.superh.com/products/sh5.htm The SH-5 is finally showing up in silicon at 2.8GFLOPS, 400MHz, under 1W/cpu. The caches are small at 32KB yet have a 3.2GB/s peak internal bus, the SOC's have DDR memory and 32bit/66MHz PCI. They look attractive for low power dense clusters/blade applications that won't be hurt much by their small cache size and the 264MB/s peak PCI interface. A 1-U could contain 24 - 32 of these and require only convection cooling for the cpu's. The DDR memory would be the "hot spots" and require some forced air cooling. Bari _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:42:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:42:14 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. Hmmm, so many possible causes. If you say "suddenly after rebooting" and if it applies to all the clients, I'd check the following: a) The network connection of the server. All things being equal, I'd have to say this is a prime candidate. Don't forget to check the wire(s) itself -- many is the perplexing networking or service problem that turned out to be caused by somebody kicking a wire so that the plug was no longer properly seated. 
Check network connectivity in other ways to -- is the switch port suddenly bad, do I need to power cycle the switch (switches sometimes "wedge" and need a cycle to rebuild their tables), and so forth. On some switches it is possible to block broadcasts -- NIS requires them, so be sure that this didn't get done by mistake. b) When you've eliminated hardware as a possible cause (and have validated perfect network connectivity) then you can look for software problems. A "sudden" problem like this is odd -- perhaps you accidentally updated with a broken RPM? Perhaps somebody trashed a table? Did somebody update iptables or ipchains or change their rules so port access is blocked that way? See if checking out these systems solves it. If not, in your next post include more detail on your network and so forth. Usually this kind of thing is solved by doggedly testing one system at a time until the culprit emerges, starting with the most likely. Don't forget, you have tools like tcpdump that will let you snoop the network packets one at a time if necessary to be sure that they are indeed arriving at the server from the clients. I recall that you can turn on ypserv with -d for debug to get a much more verbose operational mode to help debug as well. HTH, rgb > > I will appreciate if anyone can help.. > Thanks. > Sanjoy > > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:50:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:50:03 -0400 (EDT) Subject: Beowulf Research In-Reply-To: <20030802141643.9462.qmail@webmail7.rediffmail.com> Message-ID: On 2 Aug 2003, mukund govind umalkar wrote: > hello sir, > i am a graduate student, and i am intrested in doing research on > Beowulf clusters, so plzz send me some material and let me know > about the various papers that have presented on Beowulf. > > If possible please some useful URLs for the same There are lots of starting points, and the better sites form for all practical purposes a webring with mutual links interconnecting them so sites you don't find on one you're likely to find on another linked to it. One such starting point is: http://www.phy.duke.edu/brahma (look under e.g. resources and links and papers). Brahma will lead you do the beowulf underground, to the original/main beowulf site, and to many other well-known clustering sites and resources. To find "real" papers on clustering, check out e.g. 
;login and various other computer geek journals and magazines. Linux Magazine has an excellent clustering column by Forrest Hoffman. There are some online webzines devoted to clustering (some linked to brahma). Google is your friend here -- with google you can find out pretty much anything that is online. rgb > > thanx > Mukund > > > > ___________________________________________________ > Download the hottest & happening ringtones here! > OR SMS: Top tone to 7333 > Click here now: > http://sms.rediff.com/cgi-bin/ringtone/ringhome.pl > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 18:11:17 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 22:11:17 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030802221117.3967.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. Check to make sure your NIS server is running and talking (TCPDUMP). If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind restart" and see what error crops up. Also, try nisdomainname and see what crops up there. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Sun Aug 3 01:36:00 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sun, 3 Aug 2003 11:06:00 +0530 (IST) Subject: NIS In-Reply-To: <20030802221117.3967.qmail@houston.wolf.com> Message-ID: On Sat, 2 Aug 2003, Angel Rivera wrote: > Check to make sure your NIS server is running and talking (TCPDUMP). > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > restart" and see what error crops up. yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: Shutting down NIS services: [FAILED] Binding to the NIS domain: [OK] Listening for an NIS domain server................... [FAILED] > Also, try nisdomainname and see what crops up there. nisdomainname gives correct domain name. We have the Sever filesystems NFS mounted on the clients. I can see now that this NFS mounting is not working for the clients. While the clients tries to mount the NFS filesystem, it gives this error: Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable to receive Thanks.. -Sanjoy -------------------------------------------------------------------- Dr. 
Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leventeh at hotmail.com Sun Aug 3 02:14:23 2003 From: leventeh at hotmail.com (Levente Horvath) Date: Sun, 03 Aug 2003 06:14:23 +0000 Subject: MPI & linux compilers Message-ID: To whom it may concern, We have 12 PCs set up for parallel computation. All are running linux (Redhat 7.3) and MPI. We would like to compute eigenvalues and eigenvectors for large matrices. We have managed to do up to 10000x10000 matrix no problem. Our program uses Scalapack and Blacs routines. These routines require two matrix to be declared. On single precision two 10000x10000 matrix occupies 800Mb of memory which is already exceeds the 512Mb local memory of each computer in our cluster. This memory were equally distributed over the 12 computers upon computation. So, we think that in theory we shouldn't have any problem going to large matrices; as our distributed memory is quite large 12*512Mb. Now, if we try to run a larger size then the compiler mpif77 returns a "large matrix" error. We have traced the compiler and found that mpif77 is a script that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we found that there is no problem with the compilation up to a size of 15000x15000, then the compiler crashes. After tracing the compilation procedure, we found that the linker "as" cannot link some of the .o and .s files in our /tmp directory. So, we used C rather than fortran. Statically, we cannot declare more than a 1500x1500 matrix (that put in to a hello world program for MPI). We thought it might be the problem with the static allocation of memory. So, we tried to allocate this space dynamically without any success.... Our questions are: Are we doing something wrong here. Or are the compilers gcc and g77-3 responsible for such an array limit. Or are we missing the ways to allocate memory for large matrices.... This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. Unfortunately, we cannot link mpi libraries against this "ifc" compiler. It just doesn't see them. We have tried to compile ifc with the full path names of libraries using either static and dynamics libraries. In either case we had no success... We would appreciate all of your comments and suggestions. Thank you in advance.... _________________________________________________________________ ninemsn Extra Storage comes with McAfee Virus Scanning - to keep your Hotmail account and PC safe. 
Click here http://join.msn.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sun Aug 3 12:17:19 2003 From: angel at wolf.com (Angel Rivera) Date: Sun, 03 Aug 2003 16:17:19 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030803161719.30576.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > >> Check to make sure your NIS server is running and talking (TCPDUMP). >> If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind >> restart" and see what error crops up. > > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > >> Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive It is not seeing the ypserver. have you tried rpcinfo -p _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Sat Aug 2 17:28:18 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Sat, 2 Aug 2003 16:28:18 -0500 (CDT) Subject: NIS In-Reply-To: References: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. > Thanks. > Sanjoy Sanjoy, You say that ypbind is running fine /on the SERVER/, what about ypbind running on the /CLIENT/? ypbind should not run on the server, it runs on the clients. -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:05:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:05:32 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Sanjoy Bandyopadhyay wrote: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > > > Check to make sure your NIS server is running and talking (TCPDUMP). > > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > > restart" and see what error crops up. 
> > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > > > Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive Yah. How about ping? Can you ping the server? Seriously, this looks like your problem is just a bad network connection, or conceivably a downed portmapper. If you can't ping, obviously your network is down and you need to fix it. If you can ping and ssh back and forth and the like, then make sure that portmap is running on your clients and server (an rpm that updated but installed the new one off?). In fact, do chkconfig --list and look at ALL of your network services to make sure they still make sense. Be careful here -- trojanned portmappers and other broken rpc services are a favorite way for crackers to enter your system. What you are seeing COULD be symptoms of being cracked, as trojanned portmappers not infrequently are broken (for a variety of reasons). You might prefer to back up your data and do a full reinstall of the server and a client, to check the rpm MD5 checksums, and to presume that you may have been cracked (monitoring your net traffic with TCPDUMP looking for bad guys) while you proceed. At least stay aware of the possibility. It's happened to me; it could have happened to you. rgb > > Thanks.. > -Sanjoy > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:16:36 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:16:36 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Brian D. Ropers-Huilman wrote: > Sanjoy, > > You say that ypbind is running fine /on the SERVER/, what about ypbind running > on the /CLIENT/? ypbind should not run on the server, it runs on the clients. Right, but if NFS is also not running with an RPC error, it really suggests either raw networking problems or problems with the RPC subsystem, e.g. portmap. He also originally said that he had it working and then it stopped. 
If that is true it doubly points to networking or RPC. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 3 14:18:17 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 03 Aug 2003 13:18:17 -0500 Subject: MPI & linux compilers In-Reply-To: Message-ID: <5.1.1.6.2.20030803131144.02f00e50@localhost> At 06:14 AM 8/3/2003 +0000, Levente Horvath wrote: >To whom it may concern, > >We have 12 PCs set up for parallel computation. All are running linux >(Redhat 7.3) and MPI. >We would like to compute eigenvalues and eigenvectors for large matrices. > >We have managed to do up to 10000x10000 matrix no problem. Our program >uses Scalapack and Blacs >routines. These routines require two matrix to be declared. On single >precision two 10000x10000 >matrix occupies 800Mb of memory which is already exceeds the 512Mb local >memory of >each computer in our cluster. This memory were equally distributed over >the 12 computers >upon computation. So, we think that in theory we shouldn't have any >problem going >to large matrices; as our distributed memory is quite large 12*512Mb. You need to declare only the local part of the matrix that is distributed across the processes, not the entire matrix. MPI doesn't provide any support for automatically distributing the data, though libraries written using MPI can do this if the data is allocated dynamically by the library. Languages such as HPF can do this for you, but have their own limitations. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xavier at zeth.ciencias.uchile.cl Sun Aug 3 21:59:56 2003 From: xavier at zeth.ciencias.uchile.cl (Xavier Andrade) Date: Sun, 3 Aug 2003 21:59:56 -0400 (CLT) Subject: MPI & linux compilers In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. 
> > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > Running "mpif77 -showme" will show you the line that mpif77 actually calls for compiling, if you want to change the compiler that mpif77 calls set the enviroment variable LAMHF77 (i.e. with `export LAMHF77=ifc` mpif77 will compile using ifc instead of f77). Xavier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Mon Aug 4 01:09:13 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Mon, 4 Aug 2003 10:39:13 +0530 (IST) Subject: NIS In-Reply-To: Message-ID: Hi, I figured out what was wrong.. the nsswitch.conf file was somehow corrupted. nis was not mentioned for passwd,group,shadow files. Now everything is under control. Thanks very much to all of you who helped with their valuable suggestions. -Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From javier.crespo at itp.es Mon Aug 4 02:53:05 2003 From: javier.crespo at itp.es (Javier Crespo) Date: Mon, 04 Aug 2003 08:53:05 +0200 Subject: MPI & linux compilers References: Message-ID: <3F2E02D1.E834011B@itp.es> Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. 
We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. > > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > > We would appreciate all of your comments and suggestions. > Thank you in advance.... If you want to link to mpi but compiling with "ifc" (is it really IBM? - I think it comes from intel), you first at all should have to compile that libraries with the same compiler that you are going to use for the main program, typically using the options "-fc=ifc","--f90=ifc" and "-f90linker=ifc" when configuring MPI and then installing it in you path (in a different place than the MPI libraries compiled with f77). Javier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 4 08:02:50 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 4 Aug 2003 14:02:50 +0200 (CEST) Subject: Cisco switches for lam mpi In-Reply-To: Message-ID: On Tue, 29 Jul 2003, Jack Douglas wrote: > We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst > 4003 Chassis with 48 1000Base-t ports. > > We are running LAM MPI over gigabit, but we seem to be experiencing > bottlenecks within the switch > > Typically, using the cisco, we only see CPU utilisation of around 30-40% [...] I'm not a Cisco expert, but... We once got a Cisco switch from our networking people that we had to return immediately because it delivered such a bad performance. It was a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only handle 12 ports at full speed. Above that, the performance brake down completely. For some benchmark results see, e.g.: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf As a comparison, the quite nice results of a CentreCom 742i: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved anyway since spring 2001 when I did the above tests. Besides, the situation for Gigabit Ethernet could be different. As we described on our workshop paper at CAC03 you can not trust the data sheets of switches anyway: http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 Conclusion: If you need a very high performing switch, you have to evaluate/benchmark it yourself. 
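For a quick first pass at that kind of evaluation, a minimal aggregate-load probe can be as simple as the sketch below. It is purely illustrative -- it is not the benchmark suite referenced above, and the port number, buffer size and transfer volume are arbitrary choices. It streams a fixed amount of data over TCP to a sink assumed to be already listening on the far node (for example "nc -l -p 5001 > /dev/null" with traditional netcat) and reports MB/s. Start one sender per node pair simultaneously and compare the per-pair rates against the single-pair case to get a crude picture of how the switch holds up under aggregate load.

/*
 * Minimal TCP throughput probe -- an illustrative sketch only.
 * Assumes something on the far end is already listening and
 * discarding data, e.g. "nc -l -p 5001 > /dev/null".
 * Usage: ./tcpblast <host> <port> <megabytes>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>

int main(int argc, char **argv)
{
    struct addrinfo hints, *res;
    struct timeval t0, t1;
    char buf[65536];
    long long total = 0, target;
    double secs;
    ssize_t n;
    int fd;

    if (argc != 4) {
        fprintf(stderr, "usage: %s <host> <port> <megabytes>\n", argv[0]);
        return 1;
    }
    target = (long long)atol(argv[3]) * 1024 * 1024;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(argv[1], argv[2], &hints, &res) != 0) {
        fprintf(stderr, "cannot resolve %s\n", argv[1]);
        return 1;
    }
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("socket/connect");
        return 1;
    }

    memset(buf, 0xA5, sizeof(buf));            /* arbitrary payload */
    gettimeofday(&t0, NULL);
    while (total < target) {                   /* stream the requested volume */
        n = write(fd, buf, sizeof(buf));
        if (n <= 0) { perror("write"); break; }
        total += n;
    }
    gettimeofday(&t1, NULL);
    close(fd);
    freeaddrinfo(res);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    if (secs <= 0.0)
        secs = 1e-6;
    printf("%lld bytes in %.2f s = %.1f MB/s\n",
           total, secs, total / secs / (1024.0 * 1024.0));
    return 0;
}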
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Aug 4 15:31:22 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 04 Aug 2003 15:31:22 -0400 Subject: large filesystem & fileserver architecture issues. Message-ID: <1060025481.28642.81.camel@roughneck> Hey all -- here is our situation. We currently have several clusters that are configured with either IBM x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays hanging off of them. Each server + array is good for around 600GB after RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 of multiple arrays ( which seems to work & perform quite nicely ). Each of the servers then exports the filesystem via NFS, and is mounted on the nodes. The clusters range from 24 to 128 nodes. For backups we maintain an offline server + array that we use to rsync the data nightly, then use our amanda server and tape robot to backup. We use an offline sync, as we need a level 0 dump every 2 weeks, and doing a level 0 dump of 600GB just trashes the performance on a live server. As we are a .edu and all of the clusters were purchased by the individual groups, the options we can explore have to be very cost efficient for hardware, and free for software. Now for the problem... A couple of our clusters are using the available filespace quite rapidly, and we are looking to add space. The most cost efficient approach we have found is to buy a IDE RAID box, like those available from RaidZone or PogoLinux. This allows us to use the cheap IDE systems as the offline sync, and use the scsi systems as online servers. And the questions: 1) Is there a better way to backup the systems without the need for an offline sync? 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad does it bite ? 3) Are there any recommended IDE RAID systems? We are not looking for super stellar performance, just a solid system that does it's job as an offline sync for backups. -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Aug 4 22:49:35 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 04 Aug 2003 22:49:35 -0400 Subject: updated run_mpiblast code Message-ID: <1060051774.25281.22.camel@protein.scalableinformatics.com> Hi folks: Updated and documented the run_mpiblast code. Better data from --debug switch. To see the man page, either perldoc run_mpiblast or run run_mpiblast --help Will be working on an RPM and a tarball installer in short order. It can be pulled from http://scalableinformatics.com/sge_mpiblast.html. The documentation (pod generated) can be viewed at http://scalableinformatics.com/run_mpiblast.html . 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 5 08:54:57 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 05 Aug 2003 16:54:57 +0400 Subject: mpich2-0.93 In-Reply-To: <200308041901.h74J1Tw27276@NewBlue.Scyld.com> Message-ID: hello everybody I have download MPICH2-0.93 and I have some difficulty in implementing it. That is, according to some research done I need to amend the file "machines.LINUX" so that the parallel computing can start and to choose which node to form part of the cluster. But the problem is that there is no file which name "machine.LINUX" and the file is suppose to be found in the directory .../mpich2-0.93/util/machines. Well, I use redhat9.0 - hope to hear from you very soon If there is a web site to get the necessary information please let me know. Cheers Roudy. -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Aug 5 11:47:07 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 05 Aug 2003 11:47:07 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <1060098427.30922.6.camel@roughneck> On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > On 4 Aug 2003, Nicholas Henke wrote: > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > and the price is hard to beat. The raid array that serves home > directories to their clusters and workstations is backed up nightly to a > second raid server, similarly to your system. To speed things along we > installed an extra gigabit card in the primary and backup servers and > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > GBs take just over an hour using the dedicated gbit link. Rsync would > probably be faster. Without the shortcircuit gigabit link, it used to run > four or five times longer and seriously impact NFS performance for the > rest of the systems on the LAN. > > Hope this helps. > > Regards, > > Mike Prinkey > Aeolus Research, Inc. Definately does -- can you recommend hardware for the IDE RAID, or list what you guys have used ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Tue Aug 5 15:11:58 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Tue, 5 Aug 2003 09:11:58 -1000 Subject: large filesystem & fileserver architecture issues. 
References: <1060098427.30922.6.camel@roughneck> Message-ID: <009701c35b85$714e7110$7101a8c0@Navatek.local> I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. Although they do offer a NFS box that will turn one of these arrays into a standalone. We have had great success with these units (http://neptune.navships.com/images/harddrivearrays.jpg) . We first acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. We have set it up in a RAID-5 configuration and have not yet had to replace even one of the drives (Knockin on wood). After a year we picked up the 14slot chassis and filled it with 160 maxtor drives and it has performed flawless... I think we paig about $4000 for the 14 slot chassis. you can add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver for $1500 and you got about 2TB of storage for around $7000 Mitchel Kagawa ----- Original Message ----- From: "Nicholas Henke" To: "Michael T. Prinkey" Cc: Sent: Tuesday, August 05, 2003 5:47 AM Subject: Re: large filesystem & fileserver architecture issues. > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > On 4 Aug 2003, Nicholas Henke wrote: > > > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > > and the price is hard to beat. The raid array that serves home > > directories to their clusters and workstations is backed up nightly to a > > second raid server, similarly to your system. To speed things along we > > installed an extra gigabit card in the primary and backup servers and > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > GBs take just over an hour using the dedicated gbit link. Rsync would > > probably be faster. Without the shortcircuit gigabit link, it used to run > > four or five times longer and seriously impact NFS performance for the > > rest of the systems on the LAN. > > > > Hope this helps. > > > > Regards, > > > > Mike Prinkey > > Aeolus Research, Inc. > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Tue Aug 5 18:12:21 2003 From: egan at sense.net (Egan Ford) Date: Tue, 5 Aug 2003 16:12:21 -0600 Subject: Power monitoring Message-ID: <095d01c35b9e$a4ae90d0$0664a8c0@titan> I know this was discussed recently with "kill-a-watt" as a popular choice, however I am looking for the next step up, something more on the circuit level that I can hardwire between my lab and breakers. Support for multiple circuits would be nice too as well as 110/220 support. Add a serial port for remote monitoring and I'm set. However I am looking for a cheap solution, a web cam pointing to a meter is an option. I'll even settle for analogue, I just need kwh. Thanks. 
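If the meter does end up on a serial port, turning instantaneous readings into kWh is mostly a matter of integrating over time. The sketch below only illustrates that bookkeeping: it assumes a meter on /dev/ttyS0 at 9600 8N1 that prints one plain-ASCII watt reading per line, which is an invented protocol -- substitute whatever the real hardware actually emits.

/*
 * Sketch of a kWh accumulator for a serial power meter.
 * ASSUMPTIONS (adjust for the real hardware): meter on /dev/ttyS0
 * at 9600 8N1, printing one instantaneous reading in watts, as a
 * plain ASCII number, per line.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>
#include <time.h>

int main(void)
{
    const char *dev = "/dev/ttyS0";        /* assumed device path */
    struct termios tio;
    char line[128];
    double kwh = 0.0, watts;
    time_t prev, now;
    FILE *fp;
    int fd;

    fd = open(dev, O_RDONLY | O_NOCTTY);
    if (fd < 0) { perror(dev); return 1; }

    /* 9600 8N1, canonical (line-at-a-time) input */
    memset(&tio, 0, sizeof(tio));
    tio.c_cflag = B9600 | CS8 | CLOCAL | CREAD;
    tio.c_lflag = ICANON;
    tcflush(fd, TCIFLUSH);
    tcsetattr(fd, TCSANOW, &tio);

    fp = fdopen(fd, "r");
    if (fp == NULL) { perror("fdopen"); return 1; }

    prev = time(NULL);
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "%lf", &watts) != 1)
            continue;                      /* skip anything unparsable */
        now = time(NULL);
        /* integrate watts over elapsed seconds: W*s -> kWh */
        kwh += watts * (double)(now - prev) / 3600.0 / 1000.0;
        prev = now;
        printf("%ld W=%.1f kWh=%.4f\n", (long)now, watts, kwh);
        fflush(stdout);
    }
    fclose(fp);
    return 0;
}

Redirecting the output to a file gives a running kWh log that can be sampled or graphed remotely over the serial server's network connection.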
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:35:09 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:35:09 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> Message-ID: hi ya On Tue, 5 Aug 2003, Mitchel Kagawa wrote: > I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true > NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. thought acnc.com has good stuff . :-) > Although they do offer a NFS box that will turn one of these arrays into a > standalone. We have had great success with these units > (http://neptune.navships.com/images/harddrivearrays.jpg) . We first > acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. > We have set it up in a RAID-5 configuration and have not yet had to replace > even one of the drives (Knockin on wood). After a year we picked up the > 14slot chassis and filled it with 160 maxtor drives and it has performed > flawless... I think we paig about $4000 for the 14 slot chassis. you can > add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver > for $1500 and you got about 2TB of storage for around $7000 8 drives at 250GB each is 2TB in one 1U chassis ... 250GB disks is about $250 now days.... maybe less on the online webstores backup of 2TB should be done on another 2TB systems .. 3rd 2TB machine if the data cannot be recreated save only the raw data/apps needed to regenerate the output data c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:40:27 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:40:27 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. -hw In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: hi ya On 5 Aug 2003, Nicholas Henke wrote: > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? you have basically 2 choices ... - leave the ide as an ide disks ... ( software raid ) - get a $50 ide controller ( 4 drives on it ) and 4 drives on the mb - convert the ide to look like a scsi drives ( tho not really ) - 3ware 7500-8 series for 8 "scsi" disks on it - or get a real hardware raid card for lots of $$$ - mylex, adaptec - for a list of hardware raid card that is supported by linux http://www.linux-ide.org/chipsets.html http://www.1u-raid5.net sw/hw raid5 howto's c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m0ukb at unb.ca Wed Aug 6 08:50:13 2003 From: m0ukb at unb.ca (White, Adam Murray) Date: Wed, 6 Aug 2003 09:50:13 -0300 Subject: Performance monitoring tool Message-ID: <1060174213.3f30f9859afaf@webmail.unb.ca> Hello, I am interested in acquiring a good real time cluster performance monitoring tool, which at least displays (dynamically while the program is running) each thread's cpu utilization and memory usage (graphically). Not a postmortem display. 
Free as well. Any help would be much appreciated. Regards, A. M. White ###################################################### Adam M. White University of New Brunswick Saint John http://www.unbsj.ca/sase/csas m0ukb at unb.ca ###################################################### _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Aug 6 13:21:02 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 6 Aug 2003 13:21:02 -0400 (EDT) Subject: Performance monitoring tool In-Reply-To: <1060174213.3f30f9859afaf@webmail.unb.ca> Message-ID: On Wed, 6 Aug 2003, White, Adam Murray wrote: > Hello, > > I am interested in acquiring a good real time cluster performance monitoring tool, which at > least displays (dynamically while the program is running) each thread's cpu utilization and > memory usage (graphically). Not a postmortem display. Free as well. > > Any help would be much appreciated. At this time it won't QUITE do what you like, but it is within spitting distance of it. Check out: xmlsysd and wulfstat on brahma (http://www.phy.duke.edu/brahma). xmlsysd is a daemon that runs on a cluster and obtains by a variety of means statistics of interest on the system. Some of these it parses from proc, others by the use of systems calls. It is not promiscuous (it doesn't provide e.g. a complete copy of /proc to clients that connect to it) but rather offers a digested view that can be throttled so that one or more "sets" of interesting statistics can be monitored. This is to keep it lightweight, both on the system it is monitoring and on the network and client -- it is (literally) a parallel application in its own right and it isn't a good idea for a monitor application to significantly compete for any of the resources that might bottleneck a "production" parallel application. Its "prepackaged" return sets include load avg (5,10,15 min), memory (basically the data underlying the "free" command), ethernet network usage for one or more devices, date/time/cpu information, basically the kind of data one finds digested at the top of the "top" command or made available by e.g xosview in kin in graphical windows. It also has a "pid" mode where it can monitor running processes. Here throttling and filtering is a bit trickier, as one generally does NOT want to monitor every process running on a system with a supposedly lightweight tool. I thus implemented pid selection by means of matching task name or user name, a mode that returns all "userspace" tasks that have accumulated more than some cutoff in total time (5 seconds? I can't remember), as well as a to-be-rarely-used promiscuous mode that returns everything it can find including root tasks. xmlsysd's returns are in xml, and hence are easy to parse out with any xml parser for application in anything you like. That's the good news. The other good news is that wulfstat, the provided client, lets you use most of these features in a tty/ncurses window. The bad news it that there is no GUI display with little graphs and the like. This is mixed news, really, not necessarily bad. A tty display lets you use the pgup and pgdn keys and scroll arrows to page quickly through a lot of hosts, seeing instantly the full detail (actual numbers) for each field being monitored -- you might find wulfstat to be adequate. 
If it isn't adequate, though, you'll likely need to write some sort of client application that polls the daemon at some interval (I tend to use 5 seconds as the default, but it can be set up or down as low as 1 second, depending on how many hosts one wishes to monitor, again remembering that it is supposed to be lightweight and that it is a bad idea to run it so fast that the return latency causes the loop to pile up). This should be pretty easy -- you can actually talk to the daemon with telnet, so watching it work and testing the api is not a problem. You've got wulfstat sources to play with (both tools fully GPL). The daemon returns XML, which is easy to parse out. Finally, there are a fair number of tools or libraries that you can pipe this output into to generate graphs, either on the web or some other console. One day I'll actually write such a tool myself, but wulfstat proved so adequate for most of what we use it for that I haven't been able to justify advancing the project to the top of the triage-heap of bloody and neglected projects that fill my life:-). If you do write one, feel free to do so collaboratively and donate it back to the project so we can all share, although of course the GPL wouldn't require this as far as I can see for clients not derived from wulfstat code or that you write for yourself. xmlsysd and wulfstat have been in "production" use locally for some time, but they are still probably beta level code because most people use ganglia with its web-based displays. Personally I think xmlsysd/wulfstat provide a pretty rich set of monitor options (and actually is derived from code I originally wrote and was using somewhat before the ganglia project was begun, so I can't be accused of foolishly duplicating an existing project:-). If you have any problems with them I will cheerfully fix them, and if you have any ideas for additions or improvements that wouldn't drive me mad timewise to implement, I was cheerfully add them. rgb > > Regards, > A. M. White > > ###################################################### > Adam M. White > University of New Brunswick Saint John > http://www.unbsj.ca/sase/csas > m0ukb at unb.ca > ###################################################### > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 11:45:20 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 11:45:20 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060025481.28642.81.camel@roughneck> Message-ID: On 4 Aug 2003, Nicholas Henke wrote: We have a lot of experience with IDE RAID arrays at client sites. The DOE lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) and the price is hard to beat. 
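For anyone who wants to check numbers like these on their own array, a large sequential dd run is the usual quick-and-dirty test. A minimal sketch -- the mount point and sizes below are placeholders, and the test file should be several times bigger than RAM so the page cache doesn't flatter the result:

  # sequential write, timed through the final sync so buffered data is counted
  time sh -c "dd if=/dev/zero of=/raid/ddtest bs=1024k count=4096 && sync"
  # sequential read back
  time dd if=/raid/ddtest of=/dev/null bs=1024k
  rm -f /raid/ddtest
  # MB/sec is roughly 4096 MB divided by the elapsed seconds reported by time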
The raid array that serves home directories to their clusters and workstations is backed up nightly to a second raid server, similarly to your system. To speed things along we installed an extra gigabit card in the primary and backup servers and connected the two directly. The nightly backup (cp -auf via NFS) of 410 GBs take just over an hour using the dedicated gbit link. Rsync would probably be faster. Without the shortcircuit gigabit link, it used to run four or five times longer and seriously impact NFS performance for the rest of the systems on the LAN. Hope this helps. Regards, Mike Prinkey Aeolus Research, Inc. > Hey all -- here is our situation. > > We currently have several clusters that are configured with either IBM > x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays > hanging off of them. Each server + array is good for around 600GB after > RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 > of multiple arrays ( which seems to work & perform quite nicely ). Each > of the servers then exports the filesystem via NFS, and is mounted on > the nodes. The clusters range from 24 to 128 nodes. For backups we > maintain an offline server + array that we use to rsync the data > nightly, then use our amanda server and tape robot to backup. We use an > offline sync, as we need a level 0 dump every 2 weeks, and doing a level > 0 dump of 600GB just trashes the performance on a live server. As we are > a .edu and all of the clusters were purchased by the individual groups, > the options we can explore have to be very cost efficient for hardware, > and free for software. > > Now for the problem... > A couple of our clusters are using the available filespace quite > rapidly, and we are looking to add space. The most cost efficient > approach we have found is to buy a IDE RAID box, like those available > from RaidZone or PogoLinux. This allows us to use the cheap IDE systems > as the offline sync, and use the scsi systems as online servers. > > And the questions: > > 1) Is there a better way to backup the systems without the need for an > offline sync? > > 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad > does it bite ? > > 3) Are there any recommended IDE RAID systems? We are not looking for > super stellar performance, just a solid system that does it's job as an > offline sync for backups. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 12:34:03 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 12:34:03 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: On 5 Aug 2003, Nicholas Henke wrote: > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > I started building these arrays when 20 GBs was a big drive and hardware ide raid controllers were very expensive. So old habits die hard. Most of my experience has been with Software RAID in Linux. We use Promise Ultra66/100/133 controller cards, Maxtor 80 - 200 GB 5400-rpm drives, and Intel-chipset motherboards. I use the Promise cards, again because they were what was available and supported in Linux in the late 90s. 
They are limited to two IDE channels per cards, but I have used 3 cards in addition to the on-board IDE in large arrays before. Some people buy the IDE Raid cards that have 4 or 8 IDE channels and then use Software RAID instead. The conventional wisdom is that you should only put one drive on each IDE channel to maximize performance. I have built arrays with single drive per channel and two drives per channel and find that is not really true for ATA100 and faster controllers. Two of these drives cannot saturate a 100 or 133 MB/s channel. Typically, we put eight drives in an array. I have been using a 4U rack enclosure that has 8 exposed 5.25 bays. This works well because mounting the drives in a 5.25 bay gives a nice air gap for cooling. Stacking 3 or more drives tightly together heats the middle ones up quite a bit. I also usually use 5400-RPM drives to keep the heat production down. I only use Intel chipset motherboards. Normally just single CPU P4. One of the boards with 1 or 2 onboard gigabit controllers would be a nice choice. 1 GB of RAM is more than enough, but do use ECC. Also, if you use the newest kernels, the onboard IDE controllers are fast enough to be used in the array. For an 8-drive array, I will normally use 1 promise addin card and the two on-board channels. Important Miscellany: - Power Supply. Don't skimp. 400W+ from a good vendor - IDE cables <=24" long. I tried to use the 36" IDE cables once and it nearly drove me nuts with drive corruption and random errors. The 24" ones work very well and usually give you enough length to route to 8 drives in an enclosure. Once Serial ATA gets cheaper, this will no longer be an issue. - UPS. In general, you can NEVER allow a power failure to take down the raid server. There is at least a 50% chance of low-level drive corruption on an 8-drive array if it loses power. (Don't ask about the time the cleaning crew unplugged the array from the USP!) We use a smart UPS and UPS monitoring software (upsmon) to unmount the array and raidstop it if the power goes out for more than 30 secs. I am also tempted to not even connect the power switch on the front panel. Reseting a crashed system is OK, but powering it off doesn't give the hard drives a chance to flush their buffers to disk. With 8+ spinning drives, there is a good chance at least one of them will be corrupted. - Bonnie and burn-in. There are many problems that can crop up when you build the array. IRQ issues, etc. It is paramount that you throughly abuse the array with something like bonnie to make sure that everything is working. I typically mkraid which starts the array synching, mke2fs on the raid device, and then mount the filesystem and run bonnie on it all while it is still synching. This is pretty hard on the whole system and if there is a problem, you will notice quickly. Once it is done resyncing, I usually run bonnie overnight to burn it in and verify that performance is reasonable. - Fixing things. If you do have a power failure and the raid doesn't come back up, it is usually do to a hard drive problem. The only way to fix it is to run a low-level utility (Maxtor Powermax) on the drive. Maybe someone know how to do something similary within Linux. If so, I would love to hear about it. Again, our approach is not necessarily exhaustively researched. This is just "what we do." So, take it for what it's worth. Best, Mike Prinkey Aeolus Research, Inc. 
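For reference, the build-and-burn-in sequence described above boils down to roughly the following. This is a sketch, assuming raidtools with an /etc/raidtab already written for /dev/md0 and the classic bonnie; device names, mount point, and sizes are illustrative only:

  mkraid /dev/md0              # starts the initial resync
  cat /proc/mdstat             # watch resync progress
  mke2fs /dev/md0              # make the filesystem while it is still syncing
  mount /dev/md0 /raid
  bonnie -d /raid -s 2047      # hammer it during the resync; problems show up fast
  # once the resync completes, run bonnie again overnight as the burn-in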
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Mon Aug 4 09:50:09 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Mon, 04 Aug 2003 08:50:09 -0500 Subject: Cisco switches for lam mpi In-Reply-To: References: Message-ID: <3F2E6491.1020802@tamu.edu> I should have commented earlier, but I didn't think I had time... My experience with the Cisco 4006 was that as an aggregation switch it was OK for 10/100 or GBE. It did fine for normal "enterprise switching. The 4006's I've used had only older Supervisor Modules and ran CAT-OS, rather than IOS like the 4506 I'm testing now. For higher performance, while CPU utilization stays low, the switch falls off at higher loads. Caveat: I did not test these devices in a cluster environment; the thought never crossed my mind. I'd be using a 6509 if I had to use a Cisco, but I'd probably be shopping for HP ProCurves, Foundry's, Riverstones, or NEC Bluefires, based on what I've seen and done lately. I tested the 4006 in normal enterprise mode, and loaded it for high-perf network modes. If you ever need QoS do NOT use a 4006. Or a 4506. They can't handle it too well. But I digress. I'm gonna try to get a couple of ProCurves in and test 'em against a LAN tester made by Anritsu (MD1230/1231) for small packet capability (RFC-2544). That's been a killer for a lot of switches I've looked at. gerry Felix Rauch wrote: > On Tue, 29 Jul 2003, Jack Douglas wrote: > >>We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst >>4003 Chassis with 48 1000Base-t ports. >> >>We are running LAM MPI over gigabit, but we seem to be experiencing >>bottlenecks within the switch >> >>Typically, using the cisco, we only see CPU utilisation of around 30-40% > > [...] > > I'm not a Cisco expert, but... > > We once got a Cisco switch from our networking people that we had to > return immediately because it delivered such a bad performance. It was > a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only > handle 12 ports at full speed. Above that, the performance brake down > completely. > > For some benchmark results see, e.g.: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf > > As a comparison, the quite nice results of a CentreCom 742i: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf > > Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved > anyway since spring 2001 when I did the above tests. Besides, the > situation for Gigabit Ethernet could be different. > > As we described on our workshop paper at CAC03 you can not trust the > data sheets of switches anyway: > http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 > > Conclusion: If you need a very high performing switch, you have to > evaluate/benchmark it yourself. 
> > - Felix > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Aug 6 08:07:45 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 06 Aug 2003 07:07:45 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> References: <1060098427.30922.6.camel@roughneck> Message-ID: <3F30EF91.6080606@tamu.edu> We just implemented an IDE RAID system for some meteorology data/work. We're pretty happy with the results so far. Our hardwre complement is: SuperMicro X5DAE Motherboard dual Xeon 2.8GHz processors 2 GB Kingston Registered ECC RAM 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters 10 Maxtor 250 GB 7200 RPM disks 1 Maxtor 60 GB drive for system work 1 long multi-drop disk power cable... SuperMicro case (nomenclature escapes me, however, it has 1 disk bays and fits the X5DAE MoBo Cheapest PCI video card I could find (no integrated video on MoBo) Add-on Intel GBE SC fiber adapter Drawbacks: 1. I should have checked for integrated video for simplicity 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with ALL the patches 3. Make sure you order the rack mount parts when you order the case; it only appeared they were included... 4. Questions have been raised about the E-1000 integrated GBE copper NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M switch and GBE will be on fiber like God intended data to be passed (No, I don't trust most terminations for GBE on copper!) It's up and working. Burning in for the last 2 weeks with no problems, it's going to the Texas GigaPoP today where it'll be live on Internet2. HTH, Gerry Nicholas Henke wrote: > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > >>On 4 Aug 2003, Nicholas Henke wrote: >> >>We have a lot of experience with IDE RAID arrays at client sites. The DOE >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) >>and the price is hard to beat. The raid array that serves home >>directories to their clusters and workstations is backed up nightly to a >>second raid server, similarly to your system. To speed things along we >>installed an extra gigabit card in the primary and backup servers and >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 >>GBs take just over an hour using the dedicated gbit link. Rsync would >>probably be faster. Without the shortcircuit gigabit link, it used to run >>four or five times longer and seriously impact NFS performance for the >>rest of the systems on the LAN. >> >>Hope this helps. >> >>Regards, >> >>Mike Prinkey >>Aeolus Research, Inc. > > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? 
> > Nic -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Douglas.L.Farley at nasa.gov Wed Aug 6 08:35:10 2003 From: Douglas.L.Farley at nasa.gov (Doug Farley) Date: Wed, 06 Aug 2003 08:35:10 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> References: <1060098427.30922.6.camel@roughneck> Message-ID: <5.0.2.1.2.20030806081148.00a94be8@pop.larc.nasa.gov> I noticed with acnc's 14 unit raid they used an IDE-SCSI U3 something or another, anyone know what type of hardware they used to convert the drives for this array? Just direct IDE-SCSI adaptors (which I've not seen cheaper than $80) on each drive and then connecting to something like an adaptec Raid card? Does anyone have any experience with doing this (with off the shelf parts) to create a semi-cheep raid (maybe 10 x $250 for 250G disk, + 10 x $80 IDE-SCSI converter + $800 expensive adaptec 2200 esq card )? Those costs are higher (~$420/disk ) than doing 10 disks on a 3ware 7500-12 (~$320/disk) (costs excluding host system), so is whatever gained really worth it? Doug At 09:11 AM 8/5/2003 -1000, you wrote: >I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true >NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. >Although they do offer a NFS box that will turn one of these arrays into a >standalone. We have had great success with these units >(http://neptune.navships.com/images/harddrivearrays.jpg) . We first >acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. >We have set it up in a RAID-5 configuration and have not yet had to replace >even one of the drives (Knockin on wood). After a year we picked up the >14slot chassis and filled it with 160 maxtor drives and it has performed >flawless... I think we paig about $4000 for the 14 slot chassis. you can >add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver >for $1500 and you got about 2TB of storage for around $7000 > >Mitchel Kagawa > >----- Original Message ----- >From: "Nicholas Henke" >To: "Michael T. Prinkey" >Cc: >Sent: Tuesday, August 05, 2003 5:47 AM >Subject: Re: large filesystem & fileserver architecture issues. > > > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > On 4 Aug 2003, Nicholas Henke wrote: > > > > > > We have a lot of experience with IDE RAID arrays at client sites. The >DOE > > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for >them. > > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec >write) > > > and the price is hard to beat. The raid array that serves home > > > directories to their clusters and workstations is backed up nightly to a > > > second raid server, similarly to your system. To speed things along we > > > installed an extra gigabit card in the primary and backup servers and > > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > > GBs take just over an hour using the dedicated gbit link. Rsync would > > > probably be faster. 
Without the shortcircuit gigabit link, it used to >run > > > four or five times longer and seriously impact NFS performance for the > > > rest of the systems on the LAN. > > > > > > Hope this helps. > > > > > > Regards, > > > > > > Mike Prinkey > > > Aeolus Research, Inc. > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf ============================== Doug Farley Data Analysis and Imaging Branch Systems Engineering Competency NASA Langley Research Center < D.L.FARLEY at LaRC.NASA.GOV > < Phone +1 757 864-8141 > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Aug 6 15:09:59 2003 From: ctierney at hpti.com (Craig Tierney) Date: 06 Aug 2003 13:09:59 -0600 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> References: <1060098427.30922.6.camel@roughneck> <3F30EF91.6080606@tamu.edu> Message-ID: <1060196998.8961.17.camel@woody> On Wed, 2003-08-06 at 06:07, Gerry Creager N5JXS wrote: > We just implemented an IDE RAID system for some meteorology data/work. > We're pretty happy with the results so far. Our hardwre complement is: > > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > Hardware choices look good. How did you configure it? Are there 1 or 2 filesystems? Raid 0, 1, 5? Do you have any performance numbers on the setup (perferably large file, dd type tests)? Thanks, Craig > Drawbacks: > 1. I should have checked for integrated video for simplicity > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches > 3. Make sure you order the rack mount parts when you order the case; it > only appeared they were included... > 4. Questions have been raised about the E-1000 integrated GBE copper > NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M > switch and GBE will be on fiber like God intended data to be passed (No, > I don't trust most terminations for GBE on copper!) > > It's up and working. Burning in for the last 2 weeks with no problems, > it's going to the Texas GigaPoP today where it'll be live on Internet2. > > HTH, Gerry > > Nicholas Henke wrote: > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > >>On 4 Aug 2003, Nicholas Henke wrote: > >> > >>We have a lot of experience with IDE RAID arrays at client sites. 
The DOE > >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > >>and the price is hard to beat. The raid array that serves home > >>directories to their clusters and workstations is backed up nightly to a > >>second raid server, similarly to your system. To speed things along we > >>installed an extra gigabit card in the primary and backup servers and > >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 > >>GBs take just over an hour using the dedicated gbit link. Rsync would > >>probably be faster. Without the shortcircuit gigabit link, it used to run > >>four or five times longer and seriously impact NFS performance for the > >>rest of the systems on the LAN. > >> > >>Hope this helps. > >> > >>Regards, > >> > >>Mike Prinkey > >>Aeolus Research, Inc. > > > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Aug 6 16:55:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 6 Aug 2003 16:55:09 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> Message-ID: > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > > Drawbacks: > 1. I should have checked for integrated video for simplicity I did something similar a little while back: a tyan thunder e7500 board, just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 in jbod mode, 8x200G WD JB disks and a ~500W PS. I don't see any reason for adding extra ram or putting in multiple, higher-powered CPUs for a fileserver. > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches I'll be doing more boxes, probably with something like 8x250 SATA disks, with a pair of promise tx4 cards. open-source drivers for these cards recently became available, btw. there was a very interesting talk at OLS about doing raid intelligently over a network... 
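the conventional do-it-yourself flavour of that idea is just Linux software raid stacked on network block devices. a rough sketch, not what the OLS talk describes -- hostnames, port numbers and nbd device names are placeholders, and nbd device naming varies with kernel/nbd-tools versions:

  # on each storage brick: export a local partition over TCP (old-style nbd-server syntax)
  nbd-server 2000 /dev/hda3

  # on the head node: import the bricks and build raid5 across them with mdadm
  nbd-client brick1 2000 /dev/nbd0
  nbd-client brick2 2000 /dev/nbd1
  nbd-client brick3 2000 /dev/nbd2
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/nbd0 /dev/nbd1 /dev/nbd2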
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at yakko.cimav.edu.mx Thu Aug 7 12:05:45 2003 From: luis.licon at yakko.cimav.edu.mx (Luis Fernando Licon Padilla) Date: Thu, 07 Aug 2003 10:05:45 -0600 Subject: test Message-ID: <3F3278D9.5000709@yakko.cimav.edu.mx> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From John.Hearns at micromuse.com Thu Aug 7 09:12:55 2003 From: John.Hearns at micromuse.com (John Hearns) Date: Thu, 07 Aug 2003 14:12:55 +0100 Subject: AMD core maths library Message-ID: <3F325057.4080801@micromuse.com> Sorry if this is old news to everyone. I saw a snippet in Linux Magazine (UK/German type) on the AMD Core Math Library for Opterons. https://wwwsecure.amd.com/gb-uk/Processors/DevelopWithAMD/0,,30_2252_2282,00.html Says it is initially released in FORTAN, with BLAS, LAPACK and FFTs. g77 under Linux and Windows. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Aug 7 09:54:29 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 07 Aug 2003 08:54:29 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <3F325A15.80901@tamu.edu> Mark Hahn wrote: >>SuperMicro X5DAE Motherboard >>dual Xeon 2.8GHz processors >>2 GB Kingston Registered ECC RAM >>2 HighPoint RocketRAID 404 4-channel IDE RAID adapters >>10 Maxtor 250 GB 7200 RPM disks >>1 Maxtor 60 GB drive for system work >>1 long multi-drop disk power cable... >>SuperMicro case (nomenclature escapes me, however, it has 1 disk bays >>and fits the X5DAE MoBo >>Cheapest PCI video card I could find (no integrated video on MoBo) >>Add-on Intel GBE SC fiber adapter >> >>Drawbacks: >>1. I should have checked for integrated video for simplicity > > > I did something similar a little while back: a tyan thunder e7500 board, > just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 > in jbod mode, 8x200G WD JB disks and a ~500W PS. > > I don't see any reason for adding extra ram or putting in multiple, > higher-powered CPUs for a fileserver. This one will A) be on the Unidata weather distribution network for general weather data AND the newer real-time radar feeds; B) be extracting some of that data for graphics; C) be doing NNTP for Unidata (one, exactly, newsgroup) for a research project; D) reside on the I2 Logistical Backbone... It's a busy box. >>2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with >>ALL the patches > > I'll be doing more boxes, probably with something like 8x250 SATA disks, > with a pair of promise tx4 cards. open-source drivers for these cards > recently became available, btw. > > there was a very interesting talk at OLS about doing raid intelligently > over a network... Check out loki.cs.utk.edu (I think: It's certainly a project called 'loki' and run by Micah Beck at utk.edu) about the logistical backbone. I didn't go with Promise cards because of one of my grad students, who's obviously better funded than me... 
He's looked at Promise, HighPoint and at least one other card, and had comparisons, and strongly recommended HighPoint as a Price/Performance leader. The HighPoints were less expensive and currently boast the same performance as the tx4's. Everyone's getting into the SATA game; I didn't go that way because I wanted to get to the 2 TB point and couldn't reasonably do it today with SATA; maybe later. I didn't want to take the time to hack the drivers HighPoint had available, since i'm overloaded these days. -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Wed Aug 6 19:55:07 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed, 6 Aug 2003 19:55:07 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Mark Hahn wrote: > > there was a very interesting talk at OLS about doing raid intelligently > over a network... > I have considered trying this using network block devices, but I haven't had the opportunity to try it. Is this what you are talking about or something different? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 14:15:39 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 14:15:39 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Michael T. Prinkey wrote: > On Wed, 6 Aug 2003, Mark Hahn wrote: > > > > there was a very interesting talk at OLS about doing raid intelligently > > over a network... > > > > I have considered trying this using network block devices, but I haven't > had the opportunity to try it. Is this what you are talking about or > something different? thre are similarities: http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf but it's really a development beyond NBD or DRDB. hmm, I'm not sure that brief pdf is either complete or does the idea justice. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 15:15:01 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 15:15:01 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > I read the abstract last evening and got a taste for it. That is really a > remarkable idea to use the ethernet checksum for data integrity of stored > data. Thanks for the heads-up. for me, the crux of the idea is: - if you want big storage, $/GB drives you to IDE. - IDE is not amazingly fast, reliable or scalable. - building storage bricks out of IDE makes a lot of sense, since they can now be quite dense, low-overhead, etc. - ethernet is a wonderfully hot-pluggable interconnect for this kind of thing. - doing raid over a multicast-capable network is pretty cool. 
- using eth's checksumming is pretty cool. - doing it this way (all open-source, including software raid) means the system is much more transparent - you are not dependent on some closed-source vendor tools to control/monitor/upgrade your storage. Ben's approach (along with Lustre, for instance) seems very sweet for HPC type storage needs. one thing I do ponder, though, is whether it really makes sense to hide raid so firmly under the block layer. it's conceptually tidy, to be sure, and works well in practice. but suppose: - to create a filesystem, you hand some arbitrary collection of block-device extents to the mkfs tool. you also let it know which extents happen to reside on the same disk, bus, host, UPS, geographic location, etc. - you can tell the FS that your default policy should be for reliability - that raid5 across separate disks is OK, for instance. or maybe you can tell it that a particular file should be raid10 instead. or that a file should be raid1 across each geographic site. or that updates to a file should be logged. or that it should transparently compress older files. - the FS might do other HSM-like things, such as incorporating knowlege of what's on your tape/DVD/cdrom's. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Aug 7 14:33:23 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 7 Aug 2003 14:33:23 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > > > > I have considered trying this using network block devices, but I haven't > > had the opportunity to try it. Is this what you are talking about or > > something different? > > thre are similarities: > > http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf > > but it's really a development beyond NBD or DRDB. hmm, I'm not sure > that brief pdf is either complete or does the idea justice. > I read the abstract last evening and got a taste for it. That is really a remarkable idea to use the ethernet checksum for data integrity of stored data. Thanks for the heads-up. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Aug 7 15:55:15 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 7 Aug 2003 12:55:15 -0700 (PDT) Subject: (Scyld) Nodes going down unexpectedly Message-ID: We have a 10-processor cluster and are currently running a weather model on 4 of the processors. When I try to up the number, it works for a while, then the "beostatus" window will show one node's information not changing for a little while before it shows the node status as "down". Each node is dual-processor and I have noticed (but not verified) that this becomes an issue when both processors on a node are in use. After the node status changes to "down", I cannot restart it through the console tools on the root node. However, I know that the node is still alive and on the network because I can ping it successfully. This problem requires me to actually restart the node by hand, which is a bit of an issue since we're on opposite sides of the building. What's going on here and what can I do to mitigate/fix this? 
Tim Whitcomb twhitcomb at apl.washington.edu Applied Physics Lab University of Washington _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 7 17:51:10 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 7 Aug 2003 14:51:10 -0700 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > - IDE is not amazingly fast, reliable or scalable. That's about like saying "commondity servers are not fast, reliable, or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." More facts, less religion. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Thu Aug 7 18:04:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Thu, 7 Aug 2003 17:04:25 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> References: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 7 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > > > - IDE is not amazingly fast, reliable or scalable. > > That's about like saying "commondity servers are not fast, reliable, > or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." > > More facts, less religion. Since when has the value of facts outweighed religion on *THIS* list?! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes. thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes. 
thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 11 22:44:45 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 12 Aug 2003 10:44:45 +0800 (CST) Subject: PBSPro with 1024 nodes :-O (oh!) Message-ID: <20030812024445.80371.qmail@web16812.mail.tpe.yahoo.com> Looks like the problems with OpenPBS in large clusters were all fixed in PBSPro, ASU has a 1024 node cluster (http://www.pbspro.com/press_030811.html). Also, heard from PBS developers that the next release of PBSPro (5.4) will add fault tolerance in the master node, very similar to the shadow master concept in Gridengine. Sounds to me PBSPro is very much better than OpenPBS. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Aug 12 20:59:17 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 12 Aug 2003 20:59:17 -0400 (EDT) Subject: $900,000 RFP for climate simulation machine at UC Irvine (fwd) Message-ID: -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 ---------- Forwarded message ---------- Date: Tue, 12 Aug 2003 16:54:44 -0700 From: Charlie Zender To: Donald Becker Subject: $900,000 RFP for climate simulation machine at UC Irvine Dear Donald, Ooops. 
Forgot the announcement itself. Here it is. Please disseminate! Thanks, Charlie Cut here ====================================================================== Dear High Performance Computing Vendor, The University of California at Irvine is pleased to announce the immediate availability of US$900,000 towards the purchase of an Earth System Modeling Facility (ESMF). Following a competitive bid process open to all interested vendors, the ESMF contract will be awarded to the proposal with the most competitive response to our Request for Proposals (RFP). All necessary details about the ESMF and the RFP process are available from the ESMF homepage: http://www.ess.uci.edu/esmf Bids are due August 22, 2003. Please visit the ESMF homepage for more details and contact Mr. Ralph Kupcha with any questions. All further contact contact with potential vendors will take place on the ESMF Potential Vendor Mail List. You may subscribe to this list by visiting https://maillists.uci.edu/mailman/listinfo/esmfvnd Please pass this Announcement of Opportunity on to any interested colleagues. Sincerely, Ralph Kupcha Senior Buyer, Procurement Services, UCI _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Thu Aug 14 05:04:57 2003 From: jhearns at micromuse.com (John Hearns) Date: Thu, 14 Aug 2003 10:04:57 +0100 Subject: Slashdot thread on supercomputers Message-ID: <3F3B50B9.4090405@micromuse.com> Everyone has probably seen the thread on Slashdot. Here are links to the two relevant stories. http://www.eetimes.com/story/OEG20030811S0018 http://www.eetimes.com/story/OEG20030812S0011 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Thu Aug 14 15:21:47 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Fri, 15 Aug 2003 00:11:47 +0450 Subject: MPICH Message-ID: <1060890107.c2a01a60farschad@myrealbox.com> Hi, I am a new user to this mailing list. And also I am very new to Beowulf clusters. So I will have to many questions, please be patient :^) At the moment, I want to run a sample program using MPI. The program is in F90 and I use PGF90 to compile it. I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is the alternative command for lamboot!! Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jconnor at atmos.colostate.edu Thu Aug 14 18:55:20 2003 From: jconnor at atmos.colostate.edu (Jason Connor) Date: 14 Aug 2003 16:55:20 -0600 Subject: MPICH In-Reply-To: <1060890107.c2a01a60farschad@myrealbox.com> References: <1060890107.c2a01a60farschad@myrealbox.com> Message-ID: <1060901719.6160.11.camel@gentoo.atmos.colostate.edu> Hi Farschad, Here are only some possible answers to your questions. Like all things, there is more than one way to do these things. 
On Thu, 2003-08-14 at 13:21, Farschad Torabi wrote: > Hi, > I am a new user to this mailing list. > And also I am very new to Beowulf clusters. > So I will have to many questions, please be patient :^) > > At the moment, I want to run a sample program using > MPI. The program is in F90 and I use PGF90 to compile it. > > I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? using mpich: /bin/mpirun -np <# of nodes to run on> \ -machinefile /util/machines/machines.LINUX \ the -machinefile doesn't need need to be explicit, as long as you have the file mentioned above filled with the names of your cluster nodes. mpirun --help is always a good reference =) > > Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is > the alternative command for lamboot!! There isn't one. Just have whatever shell your using with mpich (rsh or ssh) setup so that you don't need a password to login to the nodes. > > Thank you in advance > Farschad Torabi > I hope this helps. In case you care, I like lam better. =) Jason Connor Colorado State University Prof. Scott Denning's BioCycle Research Group jconnor at atmos.colostate.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 14 21:52:06 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 15 Aug 2003 11:52:06 +1000 Subject: Scalable PBS Message-ID: <200308151152.09499.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, Just joined the list, so apologies if this is already well known. I noticed a recent message in the archive about OpenPBS and problems with scalability, and I think it's worth noting that there is an alternative (and actively developed) fork of OpenPBS called "Scalable PBS" available from: http://www.supercluster.org/projects/pbs/ Amongst other features it has (quoting the website): Better Scalability - Significantly improved server to MOM communication model, the ability to handle larger clusters, larger jobs, larger messages, etc. - Scales up to 2K nodes vs ~300 nodes for standard OpenPBS. Improved Usability by incorporating more extensive logging, as well as, more human readable logging(ie no more 'error 15038 on command 42'). We're using SPBS here at VPAC on our IBM cluster and it's a lot better than the last OpenPBS release (2.3.16, from 2001). They forked off from 2.3.12 rather than the last OpenPBS because it had a more open license. The folks behind the project have worked very quickly with us to fix bugs we've been finding in it, typically when I found a bug they had fixed it within a day or so, usually overnight from my perspective in Oz. :-) If you are considering using it I'd suggest using the current snapshot release from: http://www.supercluster.org/downloads/spbs/temp/ as that irons out a couple of bugs that might bite. For the less adventurous there is a new release SOpenPBS-2.3.12p4 due out in the near future that will include the fixes from the current snapshot. 
cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu J4wal1ph00ExP8w/5HgVCek= =Nyjb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Thu Aug 14 13:28:21 2003 From: josip at lanl.gov (Josip Loncaric) Date: Thu, 14 Aug 2003 11:28:21 -0600 Subject: Two AMD Opteron clusters for LANL Message-ID: <3F3BC6B5.5040706@lanl.gov> This October, LANL will be getting large AMD Opteron model 244 clusters ("Lightning" consisting of 1408 dual-CPU machines and "Orange" consisting of 256 dual-CPU machines, both built by Linux Networx): http://www.itworld.com/Comp/1437/030814supercomp/ http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~73744,00.html Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kapurs at seas.upenn.edu Fri Aug 15 11:41:39 2003 From: kapurs at seas.upenn.edu (kapurs at seas.upenn.edu) Date: Fri, 15 Aug 2003 11:41:39 -0400 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <1060962099.3f3cff33d8b6c@webmail.seas.upenn.edu> Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Matthew_Wygant at dell.com Fri Aug 15 17:54:05 2003 From: Matthew_Wygant at dell.com (Matthew_Wygant at dell.com) Date: Fri, 15 Aug 2003 16:54:05 -0500 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <6CB36426C6B9D541A8B1D2022FEA7FC10273F510@ausx2kmpc108.aus.amer.dell.com> The 530 appears to include both a SCSI U160 and 2 ATA100 IDE channels. The ATA100 defaults to 'auto' in the BIOS, so I would imagine the node should pick it up. -matt -----Original Message----- From: kapurs at seas.upenn.edu [mailto:kapurs at seas.upenn.edu] Sent: Friday, August 15, 2003 10:42 AM To: beowulf at beowulf.org Subject: Hard Drive Upgrade(Internal or External) Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. 
thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zender at uci.edu Fri Aug 15 18:14:58 2003 From: zender at uci.edu (Charlie Zender) Date: Fri, 15 Aug 2003 15:14:58 -0700 Subject: Bid deadline extended for UC Irvine climate computer Message-ID: Hi Donald, Response from members on the beowulf list has been so positive that we are extending our bid deadline in order to give your list members who want to bid a fair chance to prepare competitive bids. Would you please allow posting of this notice of extension so that those vendors who thought they may not have enough time to submit bids become aware of the two week extension? I promise not to bother you again :) One thought: We are not the only Institution buying medium size "super-computers" that Beowulf vendors might like to know about. It might be a good idea for the whole Beowulf community to create a separate list for RFPs. Such a list would help buyers and Beowulf vendors find eachother. Thanks! Charlie -- Charlie Zender, zender at uci dot edu, (949) 824-2987, Department of Earth System Science, University of California, Irvine CA 92697-3100 -------------------------------------------------------------------- Dear HPC Vendors, We are extending by two weeks the deadline for submission of bids in response to the $900,000 Earth System Modeling Facility RFP: http://www.ess.uci.edu/esmf The new bid deadline is Friday, September 5. All other deadlines and the expected timeline are also shifted by two weeks, and these changes are reflected on the recently updated web page and conference summary. Consequently, the deadline to send bid-related questions to Ralph Kupcha is Friday, August 29. We hope that this extension provides some additional breathing room to improve any parts of your bid that you might have rushed to finish. At the same time, we are now ready to accept any completed proposals and look forward to reading your ideas on how best to meet our coupled climate modeling needs. Sincerely, Ralph Kupcha _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 00:03:57 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 16 Aug 2003 12:03:57 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308151152.09499.csamuel@vpac.org> Message-ID: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> How big is your cluster? Did you use Gridengine before -- how does SPBS compare to SGE? Andrew. --- Chris Samuel ????> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > Just joined the list, so apologies if this is > already well known. 
> > I noticed a recent message in the archive about > OpenPBS and problems with > scalability, and I think it's worth noting that > there is an alternative (and > actively developed) fork of OpenPBS called "Scalable > PBS" available from: > > http://www.supercluster.org/projects/pbs/ > > Amongst other features it has (quoting the website): > > Better Scalability > - Significantly improved server to MOM > communication model, the ability to > handle larger clusters, larger jobs, larger > messages, etc. > - Scales up to 2K nodes vs ~300 nodes for > standard OpenPBS. > > Improved Usability by incorporating more extensive > logging, as well as, more > human readable logging(ie no more 'error 15038 on > command 42'). > > We're using SPBS here at VPAC on our IBM cluster and > it's a lot better than > the last OpenPBS release (2.3.16, from 2001). They > forked off from 2.3.12 > rather than the last OpenPBS because it had a more > open license. > > The folks behind the project have worked very > quickly with us to fix bugs > we've been finding in it, typically when I found a > bug they had fixed it > within a day or so, usually overnight from my > perspective in Oz. :-) > > If you are considering using it I'd suggest using > the current snapshot release > from: > > http://www.supercluster.org/downloads/spbs/temp/ > > as that irons out a couple of bugs that might bite. > > For the less adventurous there is a new release > SOpenPBS-2.3.12p4 due out in > the near future that will include the fixes from the > current snapshot. > > cheers, > Chris > - -- > Chris Samuel -- VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing > Bldg 91, 110 Victoria Street, Carlton South, > VIC 3053, Australia - http://www.vpac.org/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu > J4wal1ph00ExP8w/5HgVCek= > =Nyjb > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 16 00:51:40 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs In-Reply-To: <200308151905.h7FJ5uw13967@NewBlue.Scyld.com> Message-ID: Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! 
http://www.servihoo.com The Portal of Mauritius
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From farschad at myrealbox.com Sat Aug 16 08:47:55 2003
From: farschad at myrealbox.com (Farschad Torabi)
Date: Sat, 16 Aug 2003 17:37:55 +0450
Subject: Beowulf digest, Vol 1 #1412 - 5 msgs
Message-ID: <1061039275.c61518e0farschad@myrealbox.com>

Dear Jason Connor and Roudy,

I think that my question covers Roudy's questions too ;^)

First of all Roudy, the new version of MPICH is available on the net, i.e. mpich-1.2.5; you can download it.

As Jason Connor advised me, I ran the following command:

    /bin/mpirun -np 1 -machinefile machs -arch machines.arc a.out

The contents of machs are:

    node1
    node1

and the contents of machines.arc (architecture file):

    node1.parallel.net
    node1.parallel.net
    node1.parallel.net

(Roudy, I think that you have to use your file like this! The names of the machines are written in this file; in your case, let's say -arch mpd.hosts.)

The program runs well on -np 1 machine, but when I wanted to define two processes on a single machine (i.e. -np 2) it gives me the message:

    "Could not find enough architecture for machines LINUX"

The question is: can we define more than ONE process on a SINGLE machine??

Thanks

-----Original Message-----
From: "RoUdY"
To: beowulf at scyld.com, beowulf at beowulf.org
Date: Sat, 16 Aug 2003 08:51:40 +0400
Subject: Re: Beowulf digest, Vol 1 #1412 - 5 msgs

Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy
--------------------------------------------------
Get your free email address from Servihoo.com!
http://www.servihoo.com The Portal of Mauritius
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rodmur at maybe.org Sat Aug 16 10:29:17 2003
From: rodmur at maybe.org (Dale Harris)
Date: Sat, 16 Aug 2003 07:29:17 -0700
Subject: Scalable PBS
In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com>
References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com>
Message-ID: <20030816142917.GA24928@maybe.org>

On Sat, Aug 16, 2003 at 12:03:57PM +0800, Andrew Wang elucidated:
> How big is your cluster?
>
> Did you use Gridengine before -- how does SPBS compare
> to SGE?
>
> Andrew.
>

At a quick glance, it already wins points with me because it uses GNU autoconf instead of aimk to build.

--
Dale Harris
rodmur at maybe.org /.-)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rodmur at maybe.org Sat Aug 16 12:07:13 2003
From: rodmur at maybe.org (Dale Harris)
Date: Sat, 16 Aug 2003 09:07:13 -0700
Subject: Scalable PBS
In-Reply-To: <20030816142917.GA24928@maybe.org>
References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> <20030816142917.GA24928@maybe.org>
Message-ID: <20030816160713.GB24928@maybe.org>

On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale Harris elucidated:
>
> At a quick glance, it already wins points with me because it uses GNU
> autoconf instead of aimk to build.
>

However, the fact that it requires tcl/tk does not. Whatever happened to the concept of making a simple tool that just does its job well? I don't see why I need a GUI for a job scheduler. Let the emacs people make some frontend for it.

Dale

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andrewxwang at yahoo.com.tw Sat Aug 16 23:12:21 2003
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Sun, 17 Aug 2003 11:12:21 +0800 (CST)
Subject: Scalable PBS
In-Reply-To: <20030816160713.GB24928@maybe.org>
Message-ID: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com>

For SGE, I simply download the binary package and then do the full install. I don't have to build the source, so it doesn't matter if it uses aimk or autoconf.

I looked at SPBS a while ago; I think if you don't need to build the GUI, then you don't need tcl/tk, and you just need to use the command line for managing the cluster.

Andrew.

--- Dale Harris wrote:
> On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale
> However, the fact that it requires tcl/tk does not.
> Whatever happened to
> the concept of making a simple tool that just does
> its job well. I
> don't see why I need a GUI for a job scheduler. Let
> the emacs people
> make some frontend for it.
> > Dale > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dmcollins79 at hotmail.com Sun Aug 17 07:14:24 2003 From: dmcollins79 at hotmail.com (Timothy M Collins) Date: Sun, 17 Aug 2003 12:14:24 +0100 Subject: Request for parallel applications to test on beowulf cluster. Message-ID: Hi, I have built a beowulf (Redhat8 with PVM&LAM) Looking for parallel applications for different size and complexity to test fault tolerance. If anybody has one or knows where I can find one/some, please let me know. Kind regards Collins _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:52:56 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:52:56 +1000 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <200308181152.57812.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 16 Aug 2003 02:03 pm, Andrew Wang wrote: > How big is your cluster? http://www.vpac.org/content/services_and_support/facility/linux_cluster.php (If it looks a little sparse, that's because someone's in the process of updating it) > Did you use Gridengine before -- how does SPBS compare > to SGE? Nope, it's always been running OpenPBS prior to migrating to SPBS. - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDF4O2KABBYQAh8RAncrAJoDWbSivr52PpPy/jyNkqdVFqLLCwCfVK8S 604i8kwR1wNA+7J5oWMPxBg= =Znzi -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:55:24 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:55:24 +1000 Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816142917.GA24928@maybe.org> <20030816160713.GB24928@maybe.org> Message-ID: <200308181155.25278.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 02:07 am, Dale Harris wrote: > However, the fact that it requires tcl/tk does not. Whatever happen to > the concept of making a simple tool that just does it's job well. I > don't see why I need a GUI for a job scheduler. Let the emacs people > make some frontend for it. 1) I don't believe it requires tk/tcl 2) The tk/tcl isn't for a GUI, it's for one of the example schedulers. 
3) That was inherited from OpenPBS 4) There is a GUI (plain old X) for monitoring PBS, xpbsmon, but I'd ignore it if I were you.. cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDIMO2KABBYQAh8RAnYiAJ9TBbBiGNRSJTP122dhqr8fXtQF9ACfatF7 XL5HFH/3hMPqm1K0FuCJlc8= =+U9N -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:57:25 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:57:25 +1000 Subject: Scalable PBS In-Reply-To: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> References: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <200308181157.26287.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 01:12 pm, Andrew Wang wrote: > I looked at SPBS a while ago, I think if you don't > need to build the GUI, then you don't need tcl/tk, and > you just need to use the command line for managing the > cluster. The tk/tcl is for one of the example schedulers (there are 3, one written in C, one in tk/tcl and one in BASL). Viz: --set-sched=TYPE sets the scheduler type. If TYPE is "c" the scheduler will be written in C "tcl" the server will use a Tcl based scheduler "basl" will use the rule based scheduler "no" then their will be no scheduling done (the "c" scheduler is the default) - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDKFO2KABBYQAh8RAtIZAJwN0D0dts5DyU3tSN4eLsucYn6DsQCgiB7q wVSIraBXrPWoODE2LbglW14= =4Etb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 05:26:17 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 11:26:17 +0200 Subject: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Hi Beowulfers, Problem: I want to distribute large files over a cluster. To raise performance I decided to copy the file to the local HD of any node in the cluster. Did someone find a multicast solution for that or maybe something with snowball principle? Till now I've take a look at msync (multicast rsync). Does someone have experiences with JETfs ? My idea was to write some scripts which copy files via rsync with snowball, but there are some heavy problems. e.g. What happens if one node (in the middle) is down. How does the next snowball generation know when to start copying (the last ones have finished copying)? Any ideas ? Thanks in advance Ren? 
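[A minimal sketch of the snowball idea in shell, assuming passwordless ssh, rsync on every node, the same path everywhere, and hypothetical host names node01..node08. The set of nodes holding the file roughly doubles each generation; a node whose copy fails is only reported and skipped, which sidesteps rather than solves the failure question raised above:]

    #!/bin/sh
    # Snowball copy: every node that already has the file pushes it to one
    # node that does not, so the number of holders roughly doubles per round.
    FILE=/scratch/bigfile                                   # hypothetical path
    HAVE="localhost"                                        # nodes holding the file
    TODO="node01 node02 node03 node04 node05 node06 node07 node08"
    while [ -n "$TODO" ]; do
        NEXT=""
        for src in $HAVE; do
            set -- $TODO
            [ $# -eq 0 ] && break
            dst=$1; shift; TODO="$*"
            # the current holder pushes the file to the next empty node
            if ssh "$src" rsync -a "$FILE" "$dst:$FILE"; then
                NEXT="$NEXT $dst"
            else
                echo "copy $src -> $dst failed; skipping $dst" >&2
            fi
        done
        HAVE="$HAVE$NEXT"
    done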
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 18 09:12:52 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 18 Aug 2003 15:12:52 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of > any node in the cluster. Quick solution: Dolly [1] ;-) Longer description: I once wrote a tool called "Dolly" to clone whole hard-disk drives, partitions, or large files to many nodes in a cluster. It does so by sending the files concurrently around the cluster in a "TCP chain". In a switched network, this solution is often faster then IP multicast becauce Dolly can use the proven TCP congestion control and error correction, whereas high-speed reliable multicast is something difficult. > Till now I've take a look at msync (multicast rsync). Another tool is "udpcast". > What happens if one node (in the middle) is down. Dolly, can't handle that (it's a working prototype), but Atsushi Manabe extended Dolly into Dolly++, which supposedly can handle node failures (see link in [1]). We use Dolly regularly to clone our small 16-node cluster and the local support group uses Dolly to clone the larger 128-node cluster. Because that cluster has two Fast Ethernet networks, we can clone whole disks with about 20 MByte/s to all nodes in the cluster. If you want to clone files instead of partitions, just specify your file name in the config file instead of the device file. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mike at etek.chalmers.se Mon Aug 18 08:56:33 2003 From: mike at etek.chalmers.se (Mikael Fredriksson) Date: Mon, 18 Aug 2003 14:56:33 +0200 Subject: mulitcast copy or snowball copy References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <3F40CD01.89E6AB62@etek.chalmers.se> Rene Storm wrote: > > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > Any ideas ? Jepp, there is a distribution system for large files mainly for the Internet, but it can probbably be of use for you. It's a fast way to distribute a large file from one host to several others, at the same time. 
Check out: http://bitconjurer.org/BitTorrent/index.html MF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 10:51:27 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 10:51:27 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. > To raise performance I decided to copy the file to the local HD of any > node in the cluster. > > Did someone find a multicast solution for that or maybe something with > snowball principle? There are several multicast file distribution protocols, but they all share the same potential flaw: they use multicast. That means that they will work great in a few specific installations, generally small clusters on a single Ethernet switch. But as you grow, multicast becomes more of a problem. Here is a strong indicator for using multicast A shared media or repeater-based network (e.g. traditional Ethernet) Here are a few of the contra-indications for using multicast Larger clusters Non-Ethernet networks "Smart" Ethernet switches which try to filter packets Random communication traffic while copying Heavy non-multicast traffic while copying Multiple multicast streams NICs with mediocre, broken or slow to configure multicast filters Drivers not tuned for rapid multicast filter changes Or, in summary, "using the cluster for something besides a multicast demo. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. > My idea was to write some scripts which copy files via rsync with snowball, If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. A geometrically cascading copy can work very well. It very effectively uses current networks (switched Ethernet, Myrinet, SCI, Quadrics, Infiniband), and can make use of the sendfile() system call. 
For a system such as Scyld, use a zero-base geometric cascade: move the work off of the master as the first step. The master generates the work list and immediately shifts the process creation work off to the first compute node. The master then only monitors for completion. You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erik at aarg.net Mon Aug 18 10:14:25 2003 From: erik at aarg.net (Erik Arneson) Date: Mon, 18 Aug 2003 07:14:25 -0700 Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <20030818141424.GA16386@aarg.net> On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > Did someone find a multicast solution for that or maybe something with snowball principle? I am really new to the Beowulf thing, so I am not sure if this solution is a good one or not. But have you taken a look at the various network filesystems? OpenAFS has a configurable client-side cache, and if the files are needed only for reading this ends up being a very quick and easy way to distribute changes throughout a number of nodes. (However, I have noticed that network filesystems are not often mentioned in conjunction with Beowulf clusters, and I would really love to learn why. Performance? Latency? Complexity?) -- ;; Erik Arneson SD, Ashland Lodge No. 23 ;; ;; GPG Key ID: 2048R/8B4CBC9C CoTH, Siskiyou Chapter No. 21 ;; ;; ;; -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 481 bytes Desc: not available URL: From farschad at myrealbox.com Mon Aug 18 12:17:14 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Mon, 18 Aug 2003 21:07:14 +0450 Subject: MPICH again Message-ID: <1061224634.8d1769c0farschad@myrealbox.com> Hi All, I still have some problems running MPICH on my machine :^( I've installed MPICH and PGF90 on my PC and I am able to compile parallel codes using MPI with mpif90 command. But the problem arise when I want to run the executable file on a Bowulf cluster. As Jason Connor told me, I use the following command /bin/mpirun -machinefile machs -np 2 a.out But it prompts me that there are not enough architecture on LINUX. In this case it is like when I run the executable file (i.e. a.out) manually without using mpirun. what do you think about this?? 
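[For reference, the two machine-file formats being discussed look roughly as follows. This is only a sketch: option names differ between MPICH releases, and node1.parallel.net / node2.parallel.net are simply the host names from the example above.]

    # MPICH-1 (ch_p4): the machines file is one host name per line.  To place
    # two processes on the same box, list the host twice (some builds also
    # accept a host:count form).
    $ cat machs
    node1.parallel.net
    node1.parallel.net
    $ mpirun -np 2 -machinefile machs ./a.out    # -arch is usually unnecessary here

    # MPICH2 (mpd-based): mpd.hosts is also just one host name per line.
    $ cat mpd.hosts
    node1.parallel.net
    node2.parallel.net
    $ mpdboot -n 2 -f mpd.hosts    # start the daemon ring; check your version's options
    $ mpiexec -n 2 ./a.out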
Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 11:34:16 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 17:34:16 +0200 Subject: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Hi Donald, > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB each. (Overall 30 GB) And the cluster is 128++ nodes. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. Rene: Thats right, but what if I ignore dropped packets and accept the corrupt files ? I would be able to rsync them later on. First Multicast to create files, Second step is to compare with rsync. I've tried this and it isn't really slow, if you're doing the rsync via snowball. If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. Rene: That's the problem with all the things you do, first they are for your own and then everybody wants them ;o) > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. Rene: Good idea You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. Rene: But how do I get this status back to my "master", e.g command from master: node16 copy to node17? I don't want do de-centralize my job, like fire and forget. 
Cya, Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 12:50:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 12:50:57 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB > each. (Overall 30 GB) > And the cluster is 128++ nodes. Those are important parameters. What network type are you using? If Ethernet, what switches and topology? (My guess is that you are using "smart" switches, likely connected with a chassis backplane.) > > Here is an example: ...the longer > the multicast filter list ... the more packets dropped. > Rene: Thats right, but what if I ignore dropped packets and accept the > corrupt files ? I would be able to rsync them later on. This is costly. "Open loop" multicast protocols work by having the receiver track the missing blocks, and requesting (or interpolating) them later. Here you are discarding that information and doing much extra work on both the sending and receiving side by later locating the missing blocks. An alternative is closed-loop multicast, with positive acknowledgment before proceeding more than one window. > First Multicast to create files, Second step is to compare with rsync. > I've tried this and it isn't really slow, if you're doing the > rsync via snowball. This is verifying/filling with a neighbor instead of the original sender. Except here you don't know when you are both missing the same blocks. > If you are doing this for yourself, the solution is easy. ... > Rene: That's the problem with all the things you do, first they are for > your own and then everybody wants them ;o) If your end goal is to publish papers, do the hack. If your end goal is make works useful for other, you have to start with a wider view. >> [Do] *not* implement a program that copies a file >> to every available node. Instead use a system where you first get a >> list of available ("up") nodes, and then copy the files to that node >> list. When the copy completes continue to use that node list rather >> then letting jobs use newly-generated "up" lists. > > Rene: Good idea This approach applies to a wide range of cluster tasks. A similar idea is that you don't care as much about which nodes are currently up as you care about which nodes have remained up since you last checked. [[ Ideally you could ask "which nodes will be up when this program completes", but there are all sorts of temporal and halting issues there. ]] >> You can implement low-overhead fault checking by counting down job >> issues and job completion. As the first machine falls idle, check that >> the final machine to assign work is still running. As the next-to-last >> job completes, check that the one machine still working is up. > > Rene: But how do I get this status back to my "master", e.g command from > master: node16 copy to node17? We have a positive completion indication as part of the Job/Process Management subsystem. If you consider the problem, the final acknowledgment must flow from the last worker to the process that is checking for job completion. You might as well put that process on the cluster master. 
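[A minimal sketch of that bookkeeping, with hypothetical node names and a flat fan-out for brevity (the same counting works for a cascaded copy): the dispatching script on the master keeps one child process per node and collects each child's exit status, so nothing is fire-and-forget.]

    #!/bin/sh
    # Master-side dispatch: one background child per node, then wait on each
    # child and report per-node success or failure.
    FILE=/scratch/bigfile                  # hypothetical path
    NODES="node01 node02 node03 node04"    # hypothetical "up" list, fetched beforehand
    pids=""
    for n in $NODES; do
        rsync -a "$FILE" "$n:$FILE" &      # the copy could equally be ssh + anything else
        pids="$pids $!:$n"
    done
    fail=0
    for p in $pids; do
        pid=${p%%:*}; node=${p#*:}
        if wait "$pid"; then
            echo "$node: ok"
        else
            echo "$node: FAILED" >&2
            fail=`expr $fail + 1`
        fi
    done
    echo "$fail node(s) failed"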
The natural Unix-style implementation is having the controlling machine hold the parent of the process tree implementing the work, even if the work is divided elsewhere. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 13:31:17 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 13:31:17 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <20030818141424.GA16386@aarg.net> Message-ID: On Mon, 18 Aug 2003, Erik Arneson wrote: > On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > > Hi Beowulfers, > > > > Problem: > > I want to distribute large files over a cluster. > > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > > > Did someone find a multicast solution for that or maybe something with snowball principle? > > I am really new to the Beowulf thing, so I am not sure if this solution is a > good one or not. But have you taken a look at the various network > filesystems? OpenAFS has a configurable client-side cache, and if the files > are needed only for reading this ends up being a very quick and easy way to > distribute changes throughout a number of nodes. This is a good example of why Grid/wide-area tools should not be confused with local cluster approaches. The time scale, performance and complexity issues are much different. AFS uses TCP/IP to transfer whole files from a server. With multiple servers the configuration is static or slow changing. > (However, I have noticed that network filesystems are not often mentioned in > conjunction with Beowulf clusters, and I would really love to learn why. > Performance? Latency? Complexity?) It's because file systems are critically important to many applications. There is no universal cluster file system, and thus no single solution. The best approach is not tie the cluster management, membership, or process control to the file system in any way. Instead the file system should be selection based on the application's need for consistency, performance and reliability. For instance, NFS is great for small, read-only input files. But using NFS for large files, or when any files will be written or updated, results in both performance and consistency problems. When working from a large read-only database, explicitly pre-staging (copying) the database to the compute nodes is usually better than relying on an underlying FS. It's easier, more predictable and more explicit than per-directory tuning FS cache parameters. As as example of why predictability is very important, imagine what happens to an adaptive algorithm when a cached parameter file expires, or a daemon does a bunch of work. That machine suddenly is slower, and that part of the problem now looks "harder". So the work is reshuffled, only to be shuffled back during the next time step. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lange at informatik.Uni-Koeln.DE Mon Aug 18 14:21:59 2003 From: lange at informatik.Uni-Koeln.DE (Thomas Lange) Date: Mon, 18 Aug 2003 20:21:59 +0200 Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: <16193.6471.238244.224191@informatik.uni-koeln.de> Hi, I would try rgang, a nice tools which uses a tree structure for copying files or executing commands on a large list of nodes. It's written in python but there's also a compiled binary. It's very flexible and fast. Search for rgang in google to find the download page. To allow scaling to kiloclusters, the new rgang can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses rsh/ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies. Product Name: rgang Product Version: 2.5 ("rgang" cvs rev. 1.103) Date (mm/dd/yyyy): 06/23/2003 ORIGIN ====== Author Ron Rechenmacher Fermi National Accelerator Laboratory - Mail Station 234 P.O Box 500 Batavia, IL 60510 Internet: rgang-support at fnal.gov -- regards Thomas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchskin at comcast.net Mon Aug 18 13:40:59 2003 From: mitchskin at comcast.net (Mitchell Skinner) Date: 18 Aug 2003 10:40:59 -0700 Subject: AW: mulitcast copy or snowball copy In-Reply-To: References: Message-ID: <1061228458.5291.32.camel@zeitgeist> On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > This is costly. "Open loop" multicast protocols work by having the > receiver track the missing blocks, and requesting (or interpolating) > them later. Here you are discarding that information and doing much > extra work on both the sending and receiving side by later locating the > missing blocks. Some possible google terms include: reliable multicast, forward error correction There's an ietf working group on reliable multicast that wasn't making a whole lot of progress the last time I checked. At that time, I recall there being some acknowledgment-based implementations as well as one forward error correction-based implementation using reed-solomon codes, from an academic in Italy whose name I forgot. It's been a little while, but when I looked at the code for that FEC-based reliable multicast program (rmdp?) I think it could only handle pretty small files. My understanding is that FEC-based approaches should scale better in terms of the number of receiving nodes, but the algorithms can be very time/space intensive. There's a patented algorithm from Digital Fountain that's supposed to be pretty efficient (google tornado codes, michael luby, digital fountain) but I'm not aware that they have a cluster-oriented product. My impression of them was that they were pretty WAN-oriented. If I was less lazy I'd give some links instead of google terms, but hopefully that's some food for thought. 
Mitch _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 16:00:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 16:00:04 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <1061228458.5291.32.camel@zeitgeist> Message-ID: On 18 Aug 2003, Mitchell Skinner wrote: > On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > > This is costly. "Open loop" multicast protocols work by having the > > receiver track the missing blocks, and requesting (or interpolating) > > them later. Here you are discarding that information and doing much > > extra work on both the sending and receiving side by later locating the > > missing blocks. .. > There's an ietf working group on reliable multicast that wasn't making a > whole lot of progress the last time I checked. It's a hard problem, and when they agree on a protocol it likely won't apply to clusters. The packet loss characteristic and cost trade-off is much different on a WAN than with a local Ethernet switch on a cluster. On a WAN every packet is costly to transport, so it's worth having both end stations doing extensive computations. On a cluster we might talk about doing more computation to avoid communication, but that's only for a few applications. In reality we prefer to do minimal work. Thus we prefer OS-bypass for application communication, and kernel-only for file system I/O. Notice the attention given to zero copy, TCP offload, TOE/TSO and sendfile(). Multicast and packet FEC add exactly what people are trying to avoid, extra copying, complexity and work. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 17:27:56 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 23:27:56 +0200 Subject: AW: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Ok, A geometrically cascading structure gives me some more disadvantages. If you are using an additional high performance network, eg myrinet or infiniband you won't have problems with the switch bandwidth. If you are using low cost Ethernet/Gigabit network topology with 2 or more hups between the nodes (like FFN), the last "generation" of the snowball could be a heavy bottleneck. It seems, there are too many variables for too many kinds of clusters. A big cluster farm often got a "idle" network, but only one, while a MPI cluster could have a network for the message passing and one for commands and copying. You could use this service-network to copy our files without using full bandwidth of this network. But this would cost something cluster users don't have: time. 
Rene

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at keyresearch.com Mon Aug 18 17:37:27 2003
From: lindahl at keyresearch.com (Greg Lindahl)
Date: Mon, 18 Aug 2003 14:37:27 -0700
Subject: big memory opteron
In-Reply-To: <1061228458.5291.32.camel@zeitgeist>
References: <1061228458.5291.32.camel@zeitgeist>
Message-ID: <20030818213727.GB2131@greglaptop.internal.keyresearch.com>

I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB DIMMs installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes.

Any clues?

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From farooqkamal_76 at yahoo.com Mon Aug 18 18:38:21 2003
From: farooqkamal_76 at yahoo.com (Farooq Kamal)
Date: Mon, 18 Aug 2003 15:38:21 -0700 (PDT)
Subject: Newbie
Message-ID: <20030818223821.11770.qmail@web21209.mail.yahoo.com>

Hi Everyone,

It's my first email to this group. What I want to know is whether Beowulf is transparent to the applications running on it. That is, suppose I run an Apache server on the master node: will the cluster manage the load balancing and process migration itself, or must every application that is intended to run on Beowulf be written from scratch to do so?

And finally, if Beowulf can't do this, is there any other cluster implementation that has the above qualities?

Regards
Farooq Kamal
SZABIST - Karachi
Pakistan

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From csamuel at vpac.org Mon Aug 18 22:41:49 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Tue, 19 Aug 2003 12:41:49 +1000
Subject: Newbie
In-Reply-To: <20030818223821.11770.qmail@web21209.mail.yahoo.com>
References: <20030818223821.11770.qmail@web21209.mail.yahoo.com>
Message-ID: <200308191241.50419.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 19 Aug 2003 08:38 am, Farooq Kamal wrote:

> And finally, if Beowulf can't do this, is there any other
> cluster implementation that has the above qualities?

I think what you're looking for is OpenMOSIX.

http://www.openmosix.org/

There's an introduction to it at the Intel website at:

http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial%3a%3axeon_openmosix&cntType=IDS_EDITORIAL&catCode=BMB

Excuse the large URL!

good luck!
Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QY5tO2KABBYQAh8RAqKrAJ9SY5wfCvvL35hLPubrEa8/xFuYsgCdFHYi 4wDadQBbfYpz06hX3YRkwRI= =QIb3 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 22:43:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 22:43:57 -0400 (EDT) Subject: AW: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > A geometrically cascading structure gives me some more disadvantages. > If you are using an additional high performance network, eg myrinet or > infiniband you won't have problems with the switch bandwidth. > > If you are using low cost Ethernet/Gigabit network topology with 2 or > more hups between the nodes (like FFN), the last "generation" of the > snowball could be a heavy bottleneck. No one uses Ethernet repeaters on a cluster. 32 port Fast Ethernet switches are under $5/port. Even for Gigabit Ethernet, 8 port switches can be found for $20/port. An unusual topology might be better utilized by mapping the copy topology to the physical, but that's not the usual case. The typical case is an essentially flat topology, or one close enough that treating it as flat avoids complexity. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 18 23:21:16 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 19 Aug 2003 11:21:16 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308181152.57812.csamuel@vpac.org> Message-ID: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> --- Chris Samuel ???? >http://www.vpac.org/content/services_and_support/facility/linux_cluster.php Interesting ;-> May be you can take a look at the PBS addons like mpiexec, maui scheduler. > Nope, it's always been running OpenPBS prior to > migrating to SPBS. SGE is sponsored by Sun, and is opensource, I am currently using it. http://gridengine.sunsource.net/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 23:47:40 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 13:47:40 +1000 Subject: Scalable PBS In-Reply-To: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> References: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> Message-ID: <200308191347.42057.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 01:21 pm, Andrew Wang wrote: > May be you can take a look at the PBS addons like > mpiexec, maui scheduler. Already there. :-) We've got some users using mpiexec (though it does mean that you can nolonger restart a mom and have an MPI job keep going like you could with MPICH's mpirun) and we swapped to the MAUI scheduler yesterday (not without problems). > > Nope, it's always been running OpenPBS prior to > > migrating to SPBS. > > SGE is sponsored by Sun, and is opensource, I am > currently using it. > > http://gridengine.sunsource.net/ What's your impression of it ? Does it integrate with commercial molecular modelling packages like MSI ? cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QZ3cO2KABBYQAh8RAnBbAJ9MbVoDWNp0pjp6CHANpDZe9K2i0QCfSbE9 jlJDiWkEkM2a1uY+qCETprU= =9w8a -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bradshaw at mcs.anl.gov Tue Aug 19 00:42:52 2003 From: bradshaw at mcs.anl.gov (Rick Bradshaw) Date: Mon, 18 Aug 2003 23:42:52 -0500 Subject: big memory opteron In-Reply-To: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 18 Aug 2003 14:37:27 -0700") References: <1061228458.5291.32.camel@zeitgeist> <20030818213727.GB2131@greglaptop.internal.keyresearch.com> Message-ID: <87k79agsvn.fsf@skywalker-lin.mcs.anl.gov> Greg, This seems to be a huge bug that has been in the Bios for over a year now. I have only seen this on the AGP motherboards though. Unfortunetly they still perform much better than the none AGP boards that do recognise all the memory. Rick Greg Lindahl writes: > I'm attempting to put together a big memory 2-cpu Opteron box, without > success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of > memory. Now that's a pretty strange number, since if I was out of chip > selects, it should see exactly 4 GBytes. > > Any clues? 
> > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Tue Aug 19 08:42:21 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 19 Aug 2003 14:42:21 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Mon, 18 Aug 2003, Donald Becker wrote: > On Mon, 18 Aug 2003, Rene Storm wrote: [...] > > Rene: That's the problem with all the things you do, first they are for > > your own and then everybody wants them ;o) > > If your end goal is to publish papers, do the hack. If you want to write a paper you might also want to consider reading the following papers as related work: @article{ CCPE2002, author = "Felix Rauch and Christian Kurmann and Thomas M. Stricker", title = "{Optimizing the Distribution of Large Data Sets in Theory and Practice}", journal = "Concurrency and Computation: Practice and Experience", year = 2002, volume = 14, number = 3, pages = "165--181", month = apr } % frisbee-usenix03.pdf % Cloning tool, with multicast data distribution, compression techniques etc. @inproceedings{ Frisbee-Usenix2003, author = "Mike Hibler and Leigh Stoller and Jay Lepreau and Robert Ricci and Chad Barb", title = "{Fast, Scalable Disk Imaging with Frisbee}", booktitle = "Proceedings of the USENIX Annual Technical Conference 2003", year = 2003, month = jun, organization = "The USENIX Association" } Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Tue Aug 19 10:15:33 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Tue, 19 Aug 2003 09:15:33 -0500 Subject: big memory opteron Message-ID: <99F2150714F93F448942F9A9F112634C07BE62CD@txexmtae.amd.com> Greg, You might want to try upgrading to the lates BIOS, what type of board do you have? -Lehi -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: Monday, August 18, 2003 4:37 PM To: beowulf at beowulf.org Subject: big memory opteron I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Tue Aug 19 12:53:59 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Tue, 19 Aug 2003 17:53:59 +0100 (BST) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: We've tried both multicast and snowball for data distribution on our cluster. We have a 60Gig dataset which we have to distribute to 1000 nodes. We started off using snowball copies. They work, but care is needed in your choice of tools for the file-transfers. rsync works, but can have problems with large (> 2Gig) files if you use rsh as the transport mechanism. (this is an rsh bug on some redhat versions rather than an rsync bug). rsync over ssh gets around that problem, but of course has the added encryption overhead. You should also avoid the incremental update mode of rsync (which is the default). We've found that it will silently corrupt your files if you rsync across different architectures (eg alpha-->ia32). It also has problems with large files. The only usable multicast code we've found that actually works is udpcast. http://udpcast.linux.lu/ There are plenty of other multicast codes to choose from out on the web, and most of them fall over horribly as soon as you cross more than one switch or have more than 10-20 hosts. We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used it to sucessfully distribute our 60gig dataset over large numbers of nodes simultaneously. In practice, on gigabit, we find that disk write speed is the limiting factor rather than the network. Lawrence Livermore use udpcast to install OS images on the MCR cluster, and I believe they side-step the disk performance issue by writing data to a ramdisk as an intermediate step. Obviously this only makes sense if your dataset < size of memory. Our current file distribution strategy is to use a combination of rsync and updcast. We do a dummy rsync to find out what files need updating, tar them up, pipe the tarball through udpcast and then untar the files and the client. The main performance killer we've found for udpcast is cheap switches. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Tue Aug 19 13:48:24 2003 From: csmith at lnxi.com (Curtis Smith) Date: Tue, 19 Aug 2003 11:48:24 -0600 Subject: AW: mulitcast copy or snowball copy References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: <072a01c3667a$16624c90$a423a8c0@blueberry> You might want to look into the Clusterworx product from Linux Networx. It has been used to boot and image clusters over 1100 nodes in size using multicast, and supports image sizes over 4GB. Multiple images can be served by a single server using ethernet. 
Each channel can use 100% of the network bandwidth (12.5MB per second on Fast Ethernet) or can be throttled to a specific rate. We typically use a transmission rate of 10MB per second on Fast Ethernet (30 seconds for a 300MB image), allowing DHCP traffic to get through. The multicast server can also be throttled to ensure that its doesn't overdrive the switch or hub (if you are using cheap ones) which in many cases can account for up to 95% of packet loss. If your switch is fast and is IGMP enabled, you will generally experience little to no packet loss. The technology is based on UDP and multicast and works with LinuxBios and Etherboot, and was used to image the MCR cluster many times prior to its deployment at LLNL. MCR could go from powered-off bare metal to running in about 7 minutes (most of which was disk formatting). Curtis Smith Principal Software Engineer Linux Networx (www.lnxi.com) ----- Original Message ----- From: "Guy Coates" To: Sent: Tuesday, August 19, 2003 10:53 AM Subject: Re:AW: mulitcast copy or snowball copy > > We've tried both multicast and snowball for data distribution on our > cluster. We have a 60Gig dataset which we have to distribute to 1000 > nodes. > > We started off using snowball copies. They work, but care is needed in > your choice of tools for the file-transfers. rsync works, but can have > problems with large (> 2Gig) files if you use rsh as the transport > mechanism. (this is an rsh bug on some redhat versions rather than an > rsync bug). > > rsync over ssh gets around that problem, but of course has the added > encryption overhead. > > You should also avoid the incremental update mode of rsync (which is the > default). We've found that it will silently corrupt your files if you > rsync across different architectures (eg alpha-->ia32). It also has > problems with large files. > > > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. > > In practice, on gigabit, we find that disk write speed is the limiting > factor rather than the network. Lawrence Livermore use udpcast to install > OS images on the MCR cluster, and I believe they side-step the disk > performance issue by writing data to a ramdisk as an intermediate step. > Obviously this only makes sense if your dataset < size of memory. > > Our current file distribution strategy is to use a combination of rsync > and updcast. We do a dummy rsync to find out what files need updating, tar > them up, pipe the tarball through udpcast and then untar the files and the > client. > > The main performance killer we've found for udpcast is cheap switches. 
> > Cheers, > > Guy Coates > > -- > Guy Coates, Informatics System Group > The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK > Tel: +44 (0)1223 834244 ex 7199 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john152 at libero.it Wed Aug 20 07:26:43 2003 From: john152 at libero.it (john152 at libero.it) Date: Wed, 20 Aug 2003 13:26:43 +0200 Subject: Detection performance? Message-ID: Hi all, does anyone know about the performance of Mii-diag using ioctl calls? Using Mii-diag, what could be the average delay between the link-status change ( phisically ) and the detection of this event. I'm using a 3Com 905 Tornado PC card; is there a different delay for each PC card in changing the status register? How long could this delay be in your experience? I'd like to have a delay minor than 1 ms between the time in which i phisically disconnect the cable and the time in which I have the detection (in example with a printf on video, ...) In your experience, is it reasonable? Normally do I have to wait for a greater delay? Thanks in advance for your kind answer and observations. Giovanni di Giacomo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Aug 20 08:30:58 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 20 Aug 2003 14:30:58 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Tue, 19 Aug 2003, Guy Coates wrote: > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. That's interesting, since I tried udpcast once (just a few tests) on our Cabletron SmartSwitchRouter with Gigabit Ethernet without disk accesses and I got about 350 Mbps, while Dolly ran with approx. 500 Mbps on Machines with 1 GHz processors. I even used Dolly once (already many years ago, with 400 MHz machines) to clone two 24-node clusters at the same time, they were connected to two different switches and had a router in between. The throughput for the nodes was about 6.9 MByte/s over Fast Ethernet for every of the nodes. > The main performance killer we've found for udpcast is cheap switches. True. I tried it once with a cheap and simple ATI 24-port Fast Ethernet switch. Udpcast run with only about 1 MByte/s since the switch decided to multicast everything with only 10 Mbps (one machine that wasn't a member of the multicast group was connected with only 10 Mbps). Dolly on the other hand worked perfect with full wire speed on that switch. 
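For a disk-free throughput check of the kind Felix describes (network only, no writes), udpcast's pipe mode can be pointed at /dev/zero and /dev/null; the sizes and option names below are illustrative, and stdin/stdout pipe behaviour is an assumption about the udpcast release in use:

    # Sender: push 1 GB of zeroes through the multicast channel.
    dd if=/dev/zero bs=1M count=1000 | udp-sender --nokbd

    # Each receiver: time the transfer and discard the payload, so the
    # network and switch, not the disks, are what get measured.
    time udp-receiver > /dev/null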
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Aug 20 09:39:16 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 20 Aug 2003 15:39:16 +0200 (CEST) Subject: Detection performance? In-Reply-To: Message-ID: On Wed, 20 Aug 2003, wrote: > Using Mii-diag, what could be the average delay > between the link-status change ( phisically ) > and the detection of this event. Depends on the card capabilities and the driver. Most drivers poll for change, some use an interrupt. > I'm using a 3Com 905 Tornado PC card; is there > a different delay for each PC card in changing the > status register? I don't understand the question... > I'd like to have a delay minor than 1 ms > between the time in which i phisically > disconnect the cable and the time in which > I have the detection (in example with a printf on video, ...) The 3c59x driver polls every 60 seconds for media status when using autonegotiation (default). People from the HA and bonding projects have modified this to allow very fine polling of the media registers, however this has a big disadvantage: the CPU spends a lot of time waiting for completion of in/out operations - the finer the poll, the more CPU lost. The time taken to talk to th MII does not depend on the CPU speed, but on th PCI speed, so the faster the CPU, the more instruction cycles are lost to I/O. > In your experience, is it reasonable? No, because the network card should transfer data, not be a watchdog. There is one other solution, but there is no code for it yet. At least the Tornado cards allow generating an interrupt whenever the media changes. This would alleviate the need to continually poll the media registers and would give an indication very soon after the event happened. This was on my to-do list for a long time, but it was never done and probably won't be done soon. > Normally do I have to wait for a greater delay? If by "normally" you mean "the 359x driver distributed with the kernel" or "the 3c59x driver from Scyld", then yes. > Thanks in advance for your kind answer and observations. This isn't really beowulf related. Please use vortex at scyld.com for discussing the 3c59x driver. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Wed Aug 20 10:50:30 2003 From: jhearns at micromuse.com (John Hearns) Date: Wed, 20 Aug 2003 15:50:30 +0100 Subject: Detection performance? In-Reply-To: References: Message-ID: <3F438AB6.6090507@micromuse.com> That's an interesting question. Can you tell us what your application is, and why it needs fast response? First thought I had would be to SNMP trap the port status on the switch, rather than the card. 
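The user-space ceiling Bogdan describes in the link-detection thread can be seen directly by polling the PHY from a script and timestamping transitions: the detection delay is bounded by the poll interval, and every poll costs an ioctl (PCI reads), so a sub-millisecond target is not realistic without driver support. The tool name and its output format here are assumptions (mii-tool from net-tools, or Donald Becker's mii-diag; both need root):

    # Poll the link roughly every 100 ms and log state changes.
    prev=""
    while true; do
        cur=$(mii-tool eth0 2>/dev/null)      # e.g. "eth0: ... link ok"
        if [ "$cur" != "$prev" ]; then
            echo "$(date +%H:%M:%S.%N)  $cur"
            prev="$cur"
        fi
        sleep 0.1                             # GNU sleep accepts fractions
    done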
But I must admit I have no idea of the latency there, but would I would expect it to be much more than 1ms. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Aug 20 12:33:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 20 Aug 2003 12:33:49 -0400 Subject: clubmask 0.5 released Message-ID: <1061397229.16487.45.camel@roughneck> Name : Clubmask Version : 0.5 Release : 1 Group : Cluster Resource Management and Scheduling Vendor : Liniac Project, University of Pennsylvania License : GPL-2 URL : http://clubmask.sourceforge.net What is Clubmask ---------------- Clubmask is a resource manager designed to allow Bproc based clusters enjoy the full scheduling power and configuration of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment. Clubmask also provides builtin support for a supermon2ganglia translator that allows a standard Ganlgia web backend to contact supermon and get XML data that will disply through the Ganglia web interface. Clubmask is currently running on around 10 clusters, varying in size from 8 to 128 nodes, and has been tested up to 5000 jobs. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Aug 22 00:39:04 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 22 Aug 2003 12:39:04 +0800 (CST) Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> Message-ID: <20030822043904.18171.qmail@web16811.mail.tpe.yahoo.com> Using the 32-bit x86 glinux binary package, it works on my machine. SGE gets the load information and the system/hardware information correctly: > qhost HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS ------------------------------------------------------------------------------- global - - - - - - - opteron1 glinux 2 0.00 997.0M 47.8M 1.0G 4.3M Andrew. --- Mikhail Kuzminsky ???? > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - > about binary SGE distribution in 32-bit mode, > or about compilation from the source for x86-64 > mode, > under SuSE or RedHat distribution etc. > > Yours > Mikhail Kuzminsky > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Sat Aug 23 03:48:58 2003 From: jhearns at micromuse.com (John Hearns) Date: 23 Aug 2003 08:48:58 +0100 Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> References: <200308201609.UAA08558@nocserv.free.net> Message-ID: <1061624938.1182.57.camel@harwood> On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information >From - Return-Path: <> Received: from localhost by clarice with LMTP for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from mta.micromuse.com (mta.micromuse.com [194.131.185.92]) by mailstore.micromuse.co.uk (Switch-2.2.6/Switch-2.2.4) with ESMTP id h7N7pXZ27346 for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from marstons.services.quay.plus.net (marstons.services.quay.plus.net [212.159.14.223]) by mta.micromuse.com (Switch-2.2.6/Switch-2.2.6) with SMTP id h7N7pWY27479 for ; Sat, 23 Aug 2003 08:51:32 +0100 Message-Id: <200308230751.h7N7pWY27479 at mta.micromuse.com> Received: (qmail 19110 invoked for bounce); 23 Aug 2003 07:51:26 -0000 Date: 23 Aug 2003 07:51:26 -0000 From: MAILER-DAEMON at marstons.services.quay.plus.net To: jhearns at micromuse.com Subject: failure notice X-Perlmx-Spam: Gauge=XXXIIIIIIIII, Probability=39%, Report="FAILURE_NOTICE_1, MAILER_DAEMON, NO_MX_FOR_FROM, NO_REAL_NAME, SPAM_PHRASE_00_01" X-Evolution-Source: imap://jhearns at mta.micromuse.com/ Mime-Version: 1.0 Hi. This is the qmail-send program at marstons.services.quay.plus.net. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. : Sorry, I couldn't find any host named bewoulf.org. (#5.1.2) --- Below this line is a copy of the message. Return-Path: Received: (qmail 19106 invoked by uid 10001); 23 Aug 2003 07:51:26 -0000 Received: from dockyard.plus.com (HELO .) (212.159.87.168) by marstons.services.quay.plus.net with SMTP; 23 Aug 2003 07:51:26 -0000 Subject: Re: SGE on AMD Opteron ? From: John Hearns To: bewoulf at bewoulf.org In-Reply-To: <200308201609.UAA08558 at nocserv.free.net> References: <200308201609.UAA08558 at nocserv.free.net> Content-Type: text/plain Organization: Micromuse Message-Id: <1061624843.1183.52.camel at harwood> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 23 Aug 2003 08:47:23 +0100 Content-Transfer-Encoding: 7bit On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - I'm working with this. More news when I get it. Also, and I know that all I have to do is Google and do some reading, but does andone on the list have experience with lm_sensors on Opteron? Specifically HDAMA motherboards. A quick Google just turned up a post by Mikhail in June on this very subject... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From saville at comcast.net Sat Aug 23 17:38:49 2003 From: saville at comcast.net (Gregg Germain) Date: Sat, 23 Aug 2003 17:38:49 -0400 Subject: Help! 
Endless RARP requests Message-ID: <3F47DEE9.F00C5213@comcast.net> Hi, I have installed the Scyle basic edition I got from Linux Central (RH 6.2). I've done the installation and I selected the range of IP addresses they suggest for the slave nodes. ifconfig shows that eth1 is operating. I connect a slave node to the Master node by connecting the Slave's eth0 card to the Master's eth1 card. I created a slave boot floppy, and boot the slave. It boots ok but starts sending RARP requests that never get satisfied. It sits there forever making more requests (well eventually it reboots itself and tries again but then there's endless RARP requests). Can any one give me a hint? Do I have to go through a hub to connect that first slave to the master? I know I'll have to have a hub for the second slave, but I thought I could make a direct connection for the first one. Any help would be greatly appreciated. thanks Gregg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From adm35 at georgetown.edu Sun Aug 24 14:36:02 2003 From: adm35 at georgetown.edu (adm35 at georgetown.edu) Date: Sun, 24 Aug 2003 14:36:02 -0400 Subject: Help! Endless RARP requests Message-ID: <1967012801.1280119670@georgetown.edu> You'll either need a hub, switch or crossover cable. Arnie Miles Systems Administrator: Advanced Research Computing Adjunct Faculty: Computer Science 202.687.9379 168 Reiss Science Building http://www.georgetown.edu/users/adm35 http://www.guppi.arc.georgetown.edu ----- Original Message ----- From: Gregg Germain Date: Saturday, August 23, 2003 5:38 pm Subject: Help! Endless RARP requests > Hi, > > I have installed the Scyle basic edition I got from Linux Central (RH > 6.2). > > I've done the installation and I selected the range of IP addresses > they suggest for the slave nodes. ifconfig shows that eth1 is > operating. > I connect a slave node to the Master node by connecting the Slave's > eth0 card to the Master's eth1 card. > > I created a slave boot floppy, and boot the slave. It boots ok but > starts sending RARP requests that never get satisfied. It sits there > forever making more requests (well eventually it reboots itself and > tries again but then there's endless RARP requests). > > Can any one give me a hint? > > Do I have to go through a hub to connect that first slave to the > master? I know I'll have to have a hub for the second slave, but I > thought I could make a direct connection for the first one. > > Any help would be greatly appreciated. 
> > thanks > > Gregg > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Aug 25 03:29:29 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 25 Aug 2003 09:29:29 +0200 Subject: PCI-X/133 NICs on PCI-X/100 In-Reply-To: <200308221815.WAA27091@nocserv.free.net> References: <200308221815.WAA27091@nocserv.free.net> Message-ID: <200308250929.29082.joachim@ccrl-nece.de> Mikhail Kuzminsky: > Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards > work w/PCI-X/100 slots on Opteron-based mobos (most of > them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro > mobos) - i.e. how high is the probability that they are > incompatible ? Very low. But why don't you ask the vendor directly? Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 08:31:34 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 08:31:34 -0400 Subject: help with e1000 upgrade Message-ID: <3F4A01A6.4090608@nsc1.net> G'day, I am upgrading the larger of my clusters to 1G ethernet. All nodes are TYAN motherboards (including the head node), and have on-board 1G. I've been using the default e1000 driver on my head node for the past year. It's version 4.1.7. However, when I try to boot my slave nodes, they appear to "hang" after initializing the NIC. I tried upgrading to the newer 5.1.3 driver. The head node is up and working. I made a boot floppy and tried booting, but once again, it hung right after the line displaying the e1000 and its IRQ. In the slave node BIOS, I have turned off the eepro100 and turned on the e1000. Any help would be appreciated. Cheers, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 11:33:26 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 11:33:26 -0400 Subject: help with e1000 upgrade In-Reply-To: <3F4A163E.B3301A38@accessgate.net> References: <3F4A01A6.4090608@nsc1.net> <3F4A163E.B3301A38@accessgate.net> Message-ID: <3F4A2C46.6040007@nsc1.net> Doug Shubert wrote: >Hello Wade, > >Wade Hampton wrote: > > > >>G'day, >> >>I am upgrading the larger of my clusters to 1G ethernet. All nodes are >> >> > >Are the on-board NIC's Intel ? > Intel >>I tried upgrading to the newer 5.1.3 driver. The head node >>is up and working. I made a boot floppy and tried booting, >>but once again, it hung right after the line displaying the >>e1000 and its IRQ. >> >> >> > >Are you using Cat5e or Cat6 cabling? > >We have found that Cat6 works more reliably >on auto sense 10/100/1000 NIC's and switches. > So far, CAT5E (3-6 foot cables). >>In the slave node BIOS, I have turned off the eepro100 >>and turned on the e1000. 
>> >> >We are using the E1000 driver in kernel 2.4.21 and it works flawlessly. > I've been using it in the Scyld 2.4.17 kernel from my head node for nearly a year without any issues. The master has the same motherboard and chips, only more disks, etc. The issue seems to be with booting my slave nodes from the Scyld boot disc. Thanks, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 07:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 15:16:42 +0400 Subject: mpich2-0.93 In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: hello Please help me, i really need help Because i can run mpd on the localhost but not in a ring of PC's When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " I get the answer Permission to node1 denied Permission to node 2 denied.................. Hope to hear from u very soon -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Tue Aug 26 10:33:05 2003 From: angel at wolf.com (Angel Rivera) Date: Tue, 26 Aug 2003 14:33:05 GMT Subject: Change Management Control Message-ID: <20030826143305.24318.qmail@houston.wolf.com> I am looking for information/sites and a formal best practice change control for clusters. Can someone point me in the right direction?
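On the Linpack question a couple of messages up: the usual route on a Beowulf is HPL (High Performance Linpack) from netlib.org, built against MPI and a tuned BLAS, with the result reported in GFLOPS. A hedged outline; the architecture name, template file, BLAS choice and process count are all placeholders:

    # Unpack HPL and start from the closest Make.<arch> template.
    tar xzf hpl.tgz && cd hpl
    cp setup/Make.Linux_PII_CBLAS Make.Linux_P4    # edit MPI and BLAS paths inside
    make arch=Linux_P4

    # Tune bin/Linux_P4/HPL.dat (problem size N, block size NB, P x Q grid),
    # then run one process per CPU across the cluster.
    cd bin/Linux_P4 && mpirun -np 8 ./xhpl

Problem size N should be chosen so the matrix nearly fills the cluster's total memory without swapping; that single parameter dominates the reported number.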
thx -ar _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nordquist at geosci.uchicago.edu Tue Aug 26 16:50:34 2003 From: nordquist at geosci.uchicago.edu (Russell Nordquist) Date: Tue, 26 Aug 2003 15:50:34 -0500 (CDT) Subject: mpich2-0.93 In-Reply-To: Message-ID: It sounds like you haven't setup password-less communication between your nodes. Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to setup password-less rsh (usaully .rhosts) or ssh. You can tell which one mpirun (or change it) is using by the value of RSHCOMMAND (at least in the 1.2 version) in mpirun. russell On Tue, 26 Aug 2003 at 15:16, RoUdY wrote: > hello > Please help me, i really need help > Because i can run mpd on the localhost but not in a ring > of PC's > When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " > I get the answer > Permission to node1 denied > Permission to node 2 denied.................. > > Hope to hear from u very soon > -------------------------------------------------- > Get your free email address from Servihoo.com! > http://www.servihoo.com > The Portal of Mauritius > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > - - - - - - - - - - - - Russell Nordquist UNIX Systems Administrator Geophysical Sciences Computing http://geosci.uchicago.edu/computing NSIT, University of Chicago - - - - - - - - - - - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Aug 26 20:34:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 27 Aug 2003 10:34:38 +1000 Subject: mpich2-0.93 In-Reply-To: References: Message-ID: <200308271034.39916.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 27 Aug 2003 06:50 am, Russell Nordquist wrote: > Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to > setup password-less rsh (usaully .rhosts) or ssh. You can tell which one > mpirun (or change it) is using by the value of RSHCOMMAND (at least in > the 1.2 version) in mpirun. Standard security blah - rsh is evil, ssh is your friend. :-) It is possible to not install rsh, rlogin and rcp and replace them with symbolic links to ssh, slogin and scp. This should work for most cases, but of course, test, test and test again. We were fortunate, although we had installed the r-series clients on our cluster the daemons weren't enabled in inetd, so we knew we couldn't break anything by removing them (as they'd never have worked in the first place). So far not found anything that has a problem because of this - although don't nuke users .rhosts files as some other programs, like PBS, call ruserok() to validate connections! 
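A minimal sketch of the password-less ssh setup Russell and Chris are pointing at, assuming ssh is what mpirun ends up calling (the RSHCOMMAND value Russell mentions); the key type, paths and node names are illustrative:

    # On the head node, as the user who will run MPI jobs:
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa          # empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

    # If home directories are NFS-mounted on the nodes, that is already enough;
    # otherwise copy ~/.ssh out to each node (node1, node2, ... are placeholders):
    for n in node1 node2 node3; do scp -r ~/.ssh $n: ; done

    # The test: this must print the remote hostname without prompting.
    ssh node1 hostname

Host-key prompts on first contact will still stall a batch job, so either pre-populate ~/.ssh/known_hosts or accept each node's key once by hand.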
cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/S/yeO2KABBYQAh8RAr/QAKCNHOz5hxIejvGOW34KZsRW74u0NwCeOONj C49BRL6ceXRIHHNhl1mqHss= =BM9q -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Wed Aug 27 17:32:38 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed, 27 Aug 2003 17:32:38 -0400 (EDT) Subject: A question about Bewoulf software: Message-ID: Hi, These days, our lab are planning to built up a Beowulf cluster, which uses Intel Xeon Processors or Pentium 4, and Intel Pro Gigabit (10/100/100) ethernet card. We wonder if we choose commerical software, such as scyld, which version will support Xeon Processor or Pentium 4 respectively? And which version will support Intel Pro Gigabit Ethernet card? If we try buliding by ourself, which version of software we should choose? Thanks a lot for your kind suggestion. I am looking forward to hearing from you. Thanks again. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Aug 27 19:41:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 27 Aug 2003 19:41:17 -0400 (EDT) Subject: A question about Bewoulf software: In-Reply-To: Message-ID: On Wed, 27 Aug 2003, Ao Jiang wrote: > These days, our lab are planning to built up > a Beowulf cluster, which uses Intel Xeon Processors > or Pentium 4, and Intel Pro Gigabit (10/100/100) > ethernet card. > We wonder if we choose commerical software, such as > scyld, which version will support Xeon Processor or > Pentium 4 respectively? Most Linux distributions will "support" the Pentium 4 and Xeon. The question is if the kernel is compiled to take advantage of the newer processor features. The Scyld distribution now has about a dozen different kernels to match the processor types and UP/SMP on the master and compute nodes. Typically only two to four of the kernels are installed, based which checkboxes are slected during installation. We always install a safe, featureless i386 uniprocessor BTW, you might think that the processor family is the most important optimization, but there is an even bigger difference between uniprocessor and SMP kernels. > If we try buliding by ourself, which version of software > we should choose? You pretty much have two choices: be library version compatible with a consumer/workstation distribution (Red Hat, SuSE, Debian), or use a meta-distribution such as GenToo or Debian and compile everything yourself. > And which version will support Intel Pro Gigabit Ethernet card? Every few weeks Intel comes out with a new card version with a new PCI ID. The e1000 driver is one of the five or so drivers that we are constantly updating to support just-introduced chips. 
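A few commands that make those choices concrete on a given node: which processor family a kernel could be tuned for, whether the running kernel is an SMP build, and which e1000 variant is actually fitted. Tool availability (ethtool in particular) varies by distribution, so treat this as a sketch:

    # Processor family and features:
    grep -E 'model name|flags' /proc/cpuinfo | sort -u

    # Logical CPU count, and whether the running kernel is an SMP build
    # (Red Hat / Scyld style kernel names usually carry an "smp" suffix):
    grep -c ^processor /proc/cpuinfo
    uname -r

    # The NIC's PCI vendor:device ID, and the driver/version that claimed it:
    lspci -n | grep -i 8086
    ethtool -i eth0        # if ethtool is installed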
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Wed Aug 27 20:39:21 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 03:39:21 +0300 Subject: gigabit switches for 32-64 nodes Message-ID: <200308280339.21458.exa@kablonet.com.tr> hi there, are there any high performance gigabit ethernet switches for a beowulf cluster consisting of 32 to 64 nodes? what do you recommend for the interconnect of such a system? regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From widyono at cis.upenn.edu Tue Aug 26 17:33:06 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 26 Aug 2003 17:33:06 -0400 Subject: perl-bproc bindings Message-ID: <20030826213306.GA2497@central.cis.upenn.edu> Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, something more recent than spring of 2001? Failing that, anyone have a chance to flesh out the missing information in Dan's work (e.g. C constant() function which doesn't seem to exist, error handling, etc.)? I have it "just working" for "just users", and barely at that. Error handling consists of returning -128 minus negated error code if there's an error. I've already Googled and checked these archives (perl bproc binding). Everything points back to Dan's work. Thanks, Dan W. -- -- Daniel Widyono http://www.cis.upenn.edu/~widyono -- Liniac Project, CIS Dept., SEAS, University of Pennsylvania -- Mail: CIS Dept, 302 Levine 3330 Walnut St Philadelphia, PA 19104 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 01:49:42 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 01:49:42 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <20030826213306.GA2497@central.cis.upenn.edu> Message-ID: On Tue, 26 Aug 2003, Daniel Widyono wrote: > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > something more recent than spring of 2001? 
There are updated bindings, and a small example, at ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Thu Aug 28 05:08:13 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Wed, 27 Aug 2003 23:08:13 -1000 Subject: gigabit switches for 32-64 nodes References: <200308280339.21458.exa@kablonet.com.tr> Message-ID: <000601c36d43$ea127360$0a02a8c0@kitsu2> We use a Foundry Fastiron II Plus with 64 non-blocking copper gigabit ports. A little on the pricy side but it works very well. ~Mitchel ----- Original Message ----- From: "Eray Ozkural" To: Sent: Wednesday, August 27, 2003 2:39 PM Subject: gigabit switches for 32-64 nodes > hi there, > > are there any high performance gigabit ethernet switches for a beowulf cluster > consisting of 32 to 64 nodes? what do you recommend for the interconnect of > such a system? > > regards, > > -- > Eray Ozkural (exa) > Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org > www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza > GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 08:57:15 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 28 Aug 2003 14:57:15 +0200 (CEST) Subject: 32bit slots and riser cards Message-ID: Dear beowulfers, In planning for some new cluster nodes, I hit a small problem. I want: - a modern mainboard for dual-Xeon (preferred) or dual-Athlon - 1U or 2U rackmounted case - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for installing in the case The problem is that all mainboards that I looked at position the 32bit PCI slot(s) near the edge of the mainboard and I cannot see how the riser card can be installed into them so that the card still fits in the case; the Myrinet card does not fit (keyed differently) into the 64bit PCI slots or 64bit risers. Is there some solution to this problem or do I have to go back to midi-tower cases ? 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Aug 28 10:24:19 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 28 Aug 2003 10:24:19 -0400 Subject: perl-bproc bindings In-Reply-To: References: Message-ID: <1062080659.7565.0.camel@roughneck> On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > something more recent than spring of 2001? > > There are updated bindings, and a small example, at > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz Any chance you guys have updated python bindings as well? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:04:52 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:04:52 -0400 Subject: Intel acquiring Pallas Message-ID: <3F4E1A14.8030900@bellsouth.net> Good morning! I though I would post this for those who haven't seen it yet: http://www.theregister.co.uk/content/4/32522.html "Intel has signed on to acquire German software maker Pallas, hoping the company's performance tools can give it an edge in the compute cluster arena." Enjoy! Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 11:30:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 11:30:30 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E1A14.8030900@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Good morning! > > I though I would post this for those who haven't seen it yet: > > http://www.theregister.co.uk/content/4/32522.html > > "Intel has signed on to acquire German software maker Pallas, > hoping the company's performance tools can give it an edge in > the compute cluster arena." Interesting. I'm trying to understand where and how this will help them -- more often than not it is a Bad Thing when hardware mfrs start dabbling in something higher than firmware or compilers -- Apple (and Next in its day) stands at one end of that path. It's especially curious given that Intel is already overwhelmingly dominant in the compute cluster arena (with only AMD a meaningful cluster competitor, and with apple and the PPC perhas a distant third). Not to mention the fact that if they REALLY wanted to get an edge in the compute cluster arena, they'd acquire somebody like Dolphin or Myricom. Monitoring is lovely and even important for application tuning, but it is an application layer on TOP of both systems software and the network. Or perhaps they are buying them so they can instrument their compilers? rgb > > Enjoy! 
> > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:50:41 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:50:41 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E24D1.9010301@bellsouth.net> Robert G. Brown wrote: >On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > >>Good morning! >> >> I though I would post this for those who haven't seen it yet: >> >>http://www.theregister.co.uk/content/4/32522.html >> >>"Intel has signed on to acquire German software maker Pallas, >>hoping the company's performance tools can give it an edge in >>the compute cluster arena." >> >> > >Interesting. I'm trying to understand where and how this will help them >-- more often than not it is a Bad Thing when hardware mfrs start >dabbling in something higher than firmware or compilers -- Apple (and >Next in its day) stands at one end of that path. > >It's especially curious given that Intel is already overwhelmingly >dominant in the compute cluster arena (with only AMD a meaningful >cluster competitor, and with apple and the PPC perhas a distant third). >Not to mention the fact that if they REALLY wanted to get an edge in the >compute cluster arena, they'd acquire somebody like Dolphin or Myricom. > >Monitoring is lovely and even important for application tuning, but it >is an application layer on TOP of both systems software and the network. >Or perhaps they are buying them so they can instrument their compilers? > > rgb > Bob, Very interesting observation. I wonder if Intel doesn't have something else up their sleeve? Could they be trying to get back into Supercomputer game (not likely, but didn't they get some DoD money recently?). Could they be helping with networking stuff (Intel has been discussing the next generation networking stuff lately). Maybe some sort of TCP Offload Engine? Maybe something with their new bus ( PCI Express?) They have also created CSA (Communication Streaming Architecture) in their new chipset to bypass the PCI bottleneck. Of course they could also be after the Pallas parallel debuggers to integrate into their compilers (like you mentioned) or perhaps to help with debugging threaded code in the hyperthreaded chips. Not that you mention it, this is a somewhat interesting development. I wonder what they're up to? >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 12:05:52 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu, 28 Aug 2003 12:05:52 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? My guess is something like this, given what pallas does, but if this is the case, they may be preparing to attempt a task that has brought strong programmers to their knees repeatedly in the past -- create a true parallel compiler. A compiler where the thread library transparently hides a network-based cluster, complete with migration and load balancing. So the same code, written on top of a threading library, could compile and run transparently on a single processor or a multiprocessor or a distributed cluster. Or something. Hell, they're one of the few entities that can afford to tackle such a blue-sky project, and just perhaps it is time for the project to be tackled. At least they can attack it from both ends at once -- writing the compiler at the same time they hack the hardware around. But they're going to have create a hardware-level virtual interface for a variety of IPC mechanism's for this to work, I think, in order to instrument it locally and globally with no particular penalty either way. Or, of course, buy SCI and start putting the chipset on their motherboards as a standard feature on a custom bus. Myricom wouldn't like that (or Dolphin if they went the other way), but it would make a hell of a clustering motherboard. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Aug 28 12:22:01 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 28 Aug 2003 11:22:01 -0500 (CDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Bob, > > Very interesting observation. I wonder if Intel doesn't have something > else up their sleeve? Could they be trying to get back into Supercomputer > game (not likely, but didn't they get some DoD money recently?). Could > they be helping with networking stuff (Intel has been discussing the next > generation networking stuff lately). Maybe some sort of TCP Offload > Engine? Maybe something with their new bus ( PCI Express?) They have also > created CSA (Communication Streaming Architecture) in their new chipset > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? > Intel's already dropped Infiniband, and they have also recently gotten very quiet about using PCI Express as a node interconnect. In fact, this use of PCI Express has recently been switched to one of their "non-Goals" for the technology. 
I'd guess that Intel does not care about this market. This is fine by me. I'd rather have the Myricom's and Dolphin's that live or die by their products to ensure the products are getting the attention they deserve. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 12:49:25 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 12:49:25 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062089290.4363.22.camel@localhost.localdomain> Hmmm... Not to throw water on hopes for parallelizing compilers and Intel supported parallel debuggers, but my guess is that Intel's move is much less revolutionary (but perhaps still important). Pallas's main HPC product is Vampir/Vampirtrace. These are performance analysis tools. As such they would only be peripherally useful for compiler design (perhaps to measure the effects of certain changes). Even for this purpose, Vampir/Vampirtrace doesn't provide the amount of detail that Intel's own V-Tune product does. For debugging, Pallas resells Etnus Totalview. For compiler options Pallas has the Intel compilers as well as PGI. As far as I can tell, Pallas doesn't do any significant independent development for these systems. So, what does the Pallas performance analysis product do that is important? Vampir/Vampirtrace allows the collection and display of data from a large number of programs running in parallel. Doing this well is not trivial. Time differences between machines must be taken into account. The tools must be able to handle a potentially huge amount of trace data (running a profiler on a 1000 process system is a much different animal from instrumenting a single process job). And, finally once all this data is collected it has to be presented in some way which can actually be of use. VA/VT is among the best available tools for this purpose. So, why would Intel want to acquire Pallas? First, they have a good product which can be sold at a high price. Combined with some Intel marketing they "should" be able to make money on the product. Second, Vampirtrace has the capability of using processor performance counters. By pushing the capabilities of VA/VT to work on Intel processors it promotes "lock-in" to Intel processors. In this way a developer using the Intel compilers, V-Tune for single process analysis, and Vampir for parallel profiling, wouldn't be likely to move to an AMD or Power platform. Is this a good thing? For the most part probably so. Intel should be able to help improve the Vampir software. Making it work even better on Intel processors doesn't really hurt things if you're using another system and might make things really nice for those of us on Intel hardware. Hopefully it's development on other systems won't languish. But, on the basis of this acquisition, I wouldn't hold my breath for parallel compilers or a full fledged Intel return to the HPC market. Vann Presearch, Inc. On Thu, 2003-08-28 at 12:05, Robert G. Brown wrote: > On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > > to bypass the PCI bottleneck. 
Of course they could also be after the Pallas > > parallel debuggers to integrate into their compilers (like you mentioned) > > or perhaps to help with debugging threaded code in the hyperthreaded chips. > > Not that you mention it, this is a somewhat interesting development. > > I wonder what they're up to? > > My guess is something like this, given what pallas does, but if this is > the case, they may be preparing to attempt a task that has brought > strong programmers to their knees repeatedly in the past -- create a > true parallel compiler. A compiler where the thread library > transparently hides a network-based cluster, complete with migration and > load balancing. So the same code, written on top of a threading > library, could compile and run transparently on a single processor or a > multiprocessor or a distributed cluster. Or something. > > Hell, they're one of the few entities that can afford to tackle such a > blue-sky project, and just perhaps it is time for the project to be > tackled. At least they can attack it from both ends at once -- writing > the compiler at the same time they hack the hardware around. But > they're going to have create a hardware-level virtual interface for a > variety of IPC mechanism's for this to work, I think, in order to > instrument it locally and globally with no particular penalty either > way. Or, of course, buy SCI and start putting the chipset on their > motherboards as a standard feature on a custom bus. Myricom wouldn't > like that (or Dolphin if they went the other way), but it would make a > hell of a clustering motherboard. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Aug 28 13:53:32 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 28 Aug 2003 10:53:32 -0700 Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> References: Message-ID: <5.2.0.9.2.20030828104843.03073620@mailhost4.jpl.nasa.gov> At 11:50 AM 8/28/2003 -0400, Jeffrey B. Layton wrote: >Robert G. Brown wrote: > >> >>Interesting. I'm trying to understand where and how this will help them >>-- more often than not it is a Bad Thing when hardware mfrs start >>dabbling in something higher than firmware or compilers -- Apple (and >>Next in its day) stands at one end of that path. >> >>It's especially curious given that Intel is already overwhelmingly >>dominant in the compute cluster arena (with only AMD a meaningful >>cluster competitor, and with apple and the PPC perhas a distant third). >>Not to mention the fact that if they REALLY wanted to get an edge in the >>compute cluster arena, they'd acquire somebody like Dolphin or Myricom. >> >> rgb > >Bob, > > Very interesting observation. I wonder if Intel doesn't have something >else up their sleeve? Could they be trying to get back into Supercomputer >game (not likely, but didn't they get some DoD money recently?). 
Could >they be helping with networking stuff (Intel has been discussing the next >generation networking stuff lately). Maybe some sort of TCP Offload >Engine? Maybe something with their new bus ( PCI Express?) They have also >created CSA (Communication Streaming Architecture) in their new chipset >to bypass the PCI bottleneck. Of course they could also be after the Pallas >parallel debuggers to integrate into their compilers (like you mentioned) >or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. >I wonder what they're up to? > >Jeff Intel is making a big push into wireless and RF technology. A recent article ( I don't recall where exactly,but one of the trade rags..) mentioned that the mass market (consumer) don't seem to need much more processor crunch (at least until Windows XXXP comes out, then you'll need all that power just to apply the patches), but that they saw a big market opportunity in integrated wireless networking. Simultaneously, the generalized tanking of the telecom industry has meant that they can hire very skilled RF engineers for reasonable wages without having to compete against speculative piles of options, etc. (I suspect that there are some skilled RF engineers who are now older and wiser and less speculative, too!) We're talking about RF chip designers, as well as PWB layout, circuit designers, and antenna folks. It wouldn't surprise me that Intel is looking at other areas than traditional CPU and processor support kinds of roles. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at cert.ucr.edu Thu Aug 28 14:19:28 2003 From: glen at cert.ucr.edu (Glen Kaukola) Date: Thu, 28 Aug 2003 11:19:28 -0700 Subject: thrashing Message-ID: <3F4E47B0.3000805@cert.ucr.edu> Hi there, So for our newest simulations, we're working with a different domain, where each of our grid cells are much smaller, and so we're expecting the runs to take about 4 times longer. But actually they're taking around 40 times longer. I'm thinking this may have something to do with not having enough memory. The problem with this theory is that I'm not really sure how to tell if my machines are thrashing. On a desktop machine I can tell no problem, as the disk starts going crazy and the system pretty much grinds to a halt. But on a machine up in my server room on which I don't have any gui and where it's too loud to hear any disk activity, I'm really not sure how to tell whether it's thrashing or not. I mean, I can look at top, and free, and sar and everything doesn't look much different than when the other simulations were running, except for maybe 'sar -W', which is a little bit higher. Anyway, if someone could help me out with a way to determine without a doubt if my machines are thrashing or not, then I'd greatly appriciate it. 
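A small watch along the lines of the vmstat advice in the replies that follow, printing any interval in which pages are actually moving to or from swap. The awk locates the si/so columns from vmstat's own header, since their position differs between procps versions; the first sample is an average since boot, so ignore it:

    vmstat 1 | awk '
        $1 == "r" { for (i = 1; i <= NF; i++) { if ($i == "si") si = i; if ($i == "so") so = i }; next }
        si && ($si + $so) > 0 { print "swapping:", $0 }'

Sustained non-zero si/so while the job runs is the thrashing signature; occasional blips are normal.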
Thanks for your time, Glen Kaukola _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 14:35:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 14:35:49 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <1062080659.7565.0.camel@roughneck> Message-ID: On 28 Aug 2003, Nicholas Henke wrote: > On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > > something more recent than spring of 2001? > > > > There are updated bindings, and a small example, at > > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz > > Any chance you guys have updated python bindings as well? 0.9-8 is the current version -- which are you using? The last bugfix was logged in October of 2003. The next planned refresh has added bindings for the Beostat statistics library, Beomap job mapping and BBQ job scheduling systems. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 14:39:02 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 14:39:02 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <3F4E4C46.1010300@bellsouth.net> Use vmstat. Try something like, vmstat 1 10 (1 second delay, 10 repeats). Look in the columns labeled, swap si so that will give you the information you want. Good Luck! Jeff > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do > with not having enough memory. The problem with this theory is that > I'm not really sure how to tell if my machines are thrashing. On a > desktop machine I can tell no problem, as the disk starts going crazy > and the system pretty much grinds to a halt. But on a machine up in > my server room on which I don't have any gui and where it's too loud > to hear any disk activity, I'm really not sure how to tell whether > it's thrashing or not. I mean, I can look at top, and free, and sar > and everything doesn't look much different than when the other > simulations were running, except for maybe 'sar -W', which is a little > bit higher. Anyway, if someone could help me out with a way to > determine without a doubt if my machines are thrashing or not, then > I'd greatly appriciate it. 
> > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Aug 28 15:08:53 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 28 Aug 2003 15:08:53 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <1062097733.8882.120.camel@protein.scalableinformatics.com> Hi Glen: Several methods. 1) vmstat vmstat 1 and look at the so/si columns, not to mention the r/b/w. 2) swapon -s to see the swap usage 3) top has an ok summary of the vm info 4) cat /proc/meminfo can give a crude picture of the memory system. On Thu, 2003-08-28 at 14:19, Glen Kaukola wrote: > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do with > not having enough memory. The problem with this theory is that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in my server > room on which I don't have any gui and where it's too loud to hear any > disk activity, I'm really not sure how to tell whether it's thrashing or > not. I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine without a > doubt if my machines are thrashing or not, then I'd greatly appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 15:43:08 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 12:43:08 -0700 Subject: 32bit slots and riser cards In-Reply-To: References: Message-ID: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > or 64bit risers. Is there some solution to this problem or do I have to go > back to midi-tower cases ? Doesn't that mean that the Myrinet card is 5 volts, and you only have 3.3 volt PCI slots? It's such an old Myrinet card that I don't remember the details of when PCI got a 3.3 volt option. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:24:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:24:47 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E44D8.7090200@wildopensource.com> Message-ID: On Thu, 28 Aug 2003, Stephen Gaudet wrote: > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > cost of about $400.00, plus or minus a few dollars per system. > Therefore, due to this fixed cost, MOST people looking at a cluster > won't touch Itanium2. Steve, Are you suggesting RH has put together a package that is NOT GPL in any way that would significantly affect the 64 bit market? The kernel, the compiler, and damn near every package is GPL, much of it from Gnu itself. Am I crazy here? So I'm having a hard time seeing why one would HAVE to pay them $400/system for anything except perhaps proprietary non-GPL "advanced server" packages that almost certainly wouldn't be important to HPC cluster builders (and which they would have had to damn near develop in a sealed room to avoid incorporating GPL stuff in it anywhere). > Some white box resellers are looking at taking RH Advanced Server and > stripping it down and offering on their ia64 clusters. However, if > their not working with code lawyers, and paying very close attention to > copy right laws, they could end up with law suits down the road. If Red Hat isn't careful and not working very carefully with code lawyers, I think the reverse is a lot more likely, as Richard Stallman is known to take the Gnu Public License (free as in air at the source level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't "own" a hell of a lot of code in what they sell; the bulk of what they HAVE written is GPL derived and hence GPL by inheritance alone. The Open Source community would stomp anything out of line with hobnailed boots and club it until it stopped twitching... So although many a business may cheerfully pay $400/seat for advanced server because it is a cost and support model they are comfortable with, I don't see what there is to stop anyone from taking an advanced server copy (which necessarily either comes with src rpm's or makes them publically available somewhere), doing an rpmbuild on all the src rpm's (as if anyone would care that you went through an independent rebuild vs just used the distribution rpm's) and putting it on 1000 systems, or giving the sources to a friend, or even reselling a repackaging of the whole thing (as long as they don't call them Red Hat and as long as they omit any really proprietary non-GPL work). I even thought there were some people on the list who were using at least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm wrong...:-( rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:23:37 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:23:37 -0700 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <20030828202337.GA1964@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 11:19:28AM -0700, Glen Kaukola wrote: > I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, A clear sign of thrashing is that the program should be getting a lot less than 100% of the cpu, because it's waiting for blocks from the disk. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:31:20 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:31:20 -0700 Subject: Change Management Control In-Reply-To: <20030826143305.24318.qmail@houston.wolf.com> References: <20030826143305.24318.qmail@houston.wolf.com> Message-ID: <20030828203120.GB1964@greglaptop.internal.keyresearch.com> > I am looking for information/sites and a formal best practice change > control for clusters. Can someone point me in the right direction? thx -ar Most clusters are a lot more informal, and don't have any kind of change control. I suspect your best bet would be to look at people involved in LISA: Large Installation Systems Administration. These guys are mostly commercial, and we (the HPC cluster community) don't talk to them much, even though we should. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:42:19 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:42:19 -0400 (EDT) Subject: thrashing In-Reply-To: <1062097733.8882.120.camel@protein.scalableinformatics.com> Message-ID: On 28 Aug 2003, Joseph Landman wrote: > Hi Glen: > > Several methods. > > 1) vmstat > > vmstat 1 > > and look at the so/si columns, not to mention the r/b/w. > > 2) swapon -s > > to see the swap usage > > 3) top > > has an ok summary of the vm info > > 4) cat /proc/meminfo > > can give a crude picture of the memory system. and if you want to watch pretty much all of this information in parallel (on all the systems at once) xmlsysd provides output fields with the information available in both vmstat and free (cat /proc/meminfo), so you can actually watch for swapping or paging or leaks on lots of systems at once in wulfstat. It easily handles updates with a 5 second granularity and can often manage 1 second (depending on your network and number of nodes and so forth). It's on the brahma website or linked under my own. 
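If you just want a quick look right now, before installing anything permanent, a throwaway loop over the node list does a cruder version of the same thing (assuming passwordless ssh and a plain text file of hostnames -- both placeholders here):

  # print the current vmstat sample from every node; nonzero si/so means it is swapping
  for n in $(cat ~/nodes.txt); do
      printf '%-12s ' "$n"
      ssh "$n" 'vmstat 1 2 | tail -1'
  done

Crude, but it answers the "is anything swapping right now" question in a few seconds.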
I don't really provide a direct monitor of disk activity (partly out of irritation at custom-parsing the multidelimited "disk_io" field in /proc/stat), but if you were really interested in it I could probably bite the bullet and add a "disk" display that would work for up to four disks in a few hours of work. I'd guess that ganglia could also manage this sort of monitoring as well, but I don't use it (as I wrote my package before they started theirs by a year or three) so I don't know for sure. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:49:06 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:49:06 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > > > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > > or 64bit risers. Is there some solution to this problem or do I have to go > > back to midi-tower cases ? > > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots? It's such an old Myrinet card that I don't > remember the details of when PCI got a 3.3 volt option. I think that this is right, Greg -- the keying is related to voltage. If your actual PCI slots are keyed correctly, they should be able to manage either voltage (IIRC), but you may have to replace the risers. We've had trouble getting risers that didn't key correctly or work correctly for one kind of card or the other (or one motherboard or another) in the past. It sounds like this might be your problem if you're referring to replacing the cases and not the motherboard itself. Look around and see if you find better/different risers -- there are a fair number of different kinds of risers out there, at least for 2U. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 17:06:40 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 14:06:40 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The cost of the os, either of a blessed one, or a roll your own one hasn't been a significant factor in our reluctance to use Itanium II. The lack of commodity mainboards. The steep price of the cpu's. and lack of a clear view into intels product lifecycle for itaniumII. have been issues. Itanium II 1.3ghz 3mb cpu's have only recently arrived at ~$1400ea. opteron 244s are less than half that and that's before we put the rest of the system around it. we have some off-the-shelf compaq itanium boxes to evaluate but at around $8000 ea that sort of a non-starter. joelja On Thu, 28 Aug 2003, Robert G. 
Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > > cost of about $400.00, plus or minus a few dollars per system. > > Therefore, due to this fixed cost, MOST people looking at a cluster > > won't touch Itanium2. > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > > Some white box resellers are looking at taking RH Advanced Server and > > stripping it down and offering on their ia64 clusters. However, if > > their not working with code lawyers, and paying very close attention to > > copy right laws, they could end up with law suits down the road. > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Aug 28 17:16:38 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 28 Aug 2003 14:16:38 -0700 (PDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Bogdan Costescu wrote: > In planning for some new cluster nodes, I hit a small problem. I want: > - a modern mainboard for dual-Xeon (preferred) or dual-Athlon > - 1U or 2U rackmounted case > - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for > installing in the case the 2U chassis should be trivial to solve for either 32bit or 64bit pci slots for 1U chassis... you need to pick "right motherboard" that works with the chassis ... and pci cards ( you cannot do a mix and match with any motherboard ) if you want performance out of your pci card, you will have to use 64bit pci slots or 32bit pci slot - but the riser card should be one piece instead of the whacky non-conforming wires between the "2 sections of the pci riser" 32 and 64 bit pci riser cards (not cheap but lot better than most others) http://www.adexelec.com c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Thu Aug 28 17:22:13 2003 From: johnb at quadrics.com (John Brookes) Date: Thu, 28 Aug 2003 22:22:13 +0100 Subject: thrashing Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E2C1@stegosaurus.bristol.quadrics.com> The test I would probably suggest to someone whose machine I had no access to is 'swapoff -a'. It's not big and it's not clever, but largely removes the need for value judgements: if it bombs in an OOM style, you were most probably thrashing. Just a thought. Cheers, John Brookes Quadrics > -----Original Message----- > From: Glen Kaukola [mailto:glen at mail.cert.ucr.edu] > Sent: 28 August 2003 19:19 > To: beowulf at beowulf.org > Subject: thrashing > > > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something > to do with > not having enough memory. The problem with this theory is > that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in > my server > room on which I don't have any gui and where it's too loud to > hear any > disk activity, I'm really not sure how to tell whether it's > thrashing or > not. 
I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine > without a > doubt if my machines are thrashing or not, then I'd greatly > appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Thu Aug 28 13:56:53 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 20:56:53 +0300 Subject: Filesystem In-Reply-To: References: Message-ID: <200308282056.54106.exa@kablonet.com.tr> On Saturday 02 August 2003 05:45, Alvin Oga wrote: > i think ext3 is better than reiserfs > > i think ext3 is not any better than ext2 in terms > of somebody hitting pwer/reset w/o proper shutdown > - i always allow it to run e2fsck when it does > an unclean shutdown ... > > - yes ext3 will timeout and continue and restore from > backups but ... am paranoid about the underlying ext2 > getting corrupted by random power off and resets > I basically think ext3 and ext2 are a joke and we use XFS on the nodes with no performance problem. Excellent reliability! Regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Thu Aug 28 16:55:51 2003 From: sp at scali.com (Steffen Persvold) Date: Thu, 28 Aug 2003 22:55:51 +0200 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E6C57.9030406@scali.com> Robert G. Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. 
> > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > Robert, AFAIK, there is no "proprietary non-GPL" work in RedHat's Enterprise Linux line. I think the price is so high because of the support level you're buing. All the source for RHEL, either 32bit or 64bit is available on their ftp sites for download. And as long as they do that I don't think they're violating GPL, but I might be wrong (as I'm not a lawyers, but I'm sure RH has plenty of them). And actually, according to their web site, the cheapest (most suitable cluster) release for ITP2; RHEL WS (workstation) is $792, AS (advanced server) is $1992 for standard edition and $2988 for premium edition. Regards, -- Steffen Persvold ,,, mailto: sp at scali.com Senior Software Engineer (o-o) http://www.scali.com -----------------------------oOO-(_)-OOo----------------------------- Scali AS, PObox 150, Oppsal, N-0619 Oslo, Norway, Tel: +4792484511 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 18:11:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 15:11:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E6C57.9030406@scali.com> Message-ID: Stephen... anyone who wants can grab the entire srpms dir for AS and build it. The only way they'll end up with a lawsuit is if they represent the result as official suppoprt redhat linux AS... If you like you can pick it up from the RH mirrors including mine. > > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > >>Some white box resellers are looking at taking RH Advanced Server and > >>stripping it down and offering on their ia64 clusters. However, if > >>their not working with code lawyers, and paying very close attention to > >>copy right laws, they could end up with law suits down the road. 
> > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:25:42 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:25:42 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Alvin Oga wrote: > the 2U chassis should be trivial to solve for either 32bit or 64bit pci > slots Well, maybe trivial for you who do this for a living :-) > for 1U chassis... you need to pick "right motherboard" that works with the > chassis ... and pci cards > ( you cannot do a mix and match with any motherboard ) Sure, but I was looking for example at the Intel offerings which pair dual Xeon mainboards with 1U/2U cases that are certified to work together. > if you want performance out of your pci card, I know that this 32bit/33MHz card looks slow by today's standards, but I think that it can still provide lower latency than e1000 or tg3 -driven cards, so I'd like to continue to use them. > 32 and 64 bit pci riser cards (not cheap but lot better than most others) > http://www.adexelec.com Many thanks for this address. I did try to use google before writing to the list, but I came up with all sorts of shops and nothing with good descriptions, which is what I needed most. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:44:24 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:44:24 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots ? Bingo. This is exactly what most of the people who wrote to me off-list probably missed, although I did mention that it doesn't fit because of a different keying - I should have probably mentioned this explicitly. All the 32bit slots that I've seen on these mainboards allow inserting such cards, which makes me believe that they support both 5V and 3.3V cards; but the 64bit slots are 3.3V only. I don't have much experience with rackmounted systems, which is probably evident, so I didn't know what to expect from a riser. Thanks to Alvin Oga's mention of the Adexelec site, I was able to find out that the risers exist in many different variations. For example, I was wondering if such a riser exists that would allow mounting of the card from the edge toward the middle of the mainboard, while the most common way is the other way around - I still need to find out if the case allows fixing of the card the other way around, but this is an easier problem to solve.
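Once a card does physically seat, I assume a quick lspci check is enough to confirm that it at least enumerates and what the slot reports (the device string and bus address below are only examples):

  lspci -v | grep -i -A 2 myri    # does the LANai/Myricom interface show up at all?
  lspci -vv -s 02:01.0            # the Status: line includes the 66MHz capability bit

Electrical keying is of course decided long before software gets to look at anything, but this would at least tell me whether a card that fits is actually alive on the bus.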
One other thing that turned me off was that in a system composed of only Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots on the mainboard allow inserting of the Myrinet card (but didn't try to see if it works), while the riser cards that came with the case do not, allowing only 3.3V ones - so the riser imposes additional limitations... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Thu Aug 28 19:46:29 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Thu, 28 Aug 2003 16:46:29 -0700 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: Joel- Have you actually built RH AS from scratch using their SRPMS? Or do you know anyone that has? I'm very interested in doing this but I heard there were some pretty significant obstacles along the lines of package dependencies. Glen On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > Stephen... anyone who wants can grab the entire srpms dir for AS and > build > it. The only way they'll end up with a lawsuit is if they represent > the > result as official suppoprt redhat linux AS... > > If you like you can pick it up from the RH mirrors including mine. > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: >>> >>>> Some white box resellers are looking at taking RH Advanced Server >>>> and >>>> stripping it down and offering on their ia64 clusters. However, if >>>> their not working with code lawyers, and paying very close >>>> attention to >>>> copy right laws, they could end up with law suits down the road. >>> > > -- > ----------------------------------------------------------------------- > --- > Joel Jaeggli Unix Consulting > joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 19:57:53 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 19:57:53 -0400 Subject: Intel acquiring Pallas (Redhat AS Rebuild) In-Reply-To: References: Message-ID: <1062115073.7007.2.camel@localhost.localdomain> Haven't tried it but... http://www2.uibk.ac.at/zid/software/unix/linux/rhel-rebuild.htm http://www.uibk.ac.at/zid/software/unix/linux/rhel-rebuild-l.html Vann On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. 
> > Glen > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 28 20:54:43 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 29 Aug 2003 10:54:43 +1000 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <200308291054.45059.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 29 Aug 2003 09:46 am, Glen Otero wrote: > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? - From the Rocks Cluster Distribution website: http://www.rocksclusters.org/Rocks/ [...] Rocks 2.3.2 IA64 is based on Red Hat Advanced Workstation 2.1 recompiled from Red Hat's publicly available source RPMs. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/TqRTO2KABBYQAh8RAnd4AJkBCFmq3tyb97EgHvg5x9mrsqkGGQCghGqG 9cF9eAKLTHD6lQS4kZGtg0A= =WVIz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Thu Aug 28 20:59:57 2003 From: timm at fnal.gov (Steven Timm) Date: Thu, 28 Aug 2003 19:59:57 -0500 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The ROCKS distribution at www.rocksclusters.org claims to have done so for the IA64 architecture.. I have not tested it myself. Your mileage may vary. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfalano at hotmail.com Thu Aug 28 21:29:34 2003 From: nfalano at hotmail.com (Norman Alano) Date: Fri, 29 Aug 2003 09:29:34 +0800 Subject: mpich Message-ID: greetings ! i already installed mpich... but the problem is whenever i run an application for instant the examples in the mpich the graphics wont show.... how can i configure so that i can run the application with graphic? cheers norman _________________________________________________________________ The new MSN 8: advanced junk mail protection and 2 months FREE* http://join.msn.com/?page=features/junkmail _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 00:04:27 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 21:04:27 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: I've built almost all of it with the exception of gtk and kde related stuff which was outside the scope of my interest, on a redhat 7.2 box... I wouldn't try it on a 9 host. joelja On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sgaudet at wildopensource.com Fri Aug 29 09:52:31 2003 From: sgaudet at wildopensource.com (Stephen Gaudet) Date: Fri, 29 Aug 2003 09:52:31 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4F5A9F.5000809@wildopensource.com> Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at fing.edu.uy Fri Aug 29 04:06:38 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Fri, 29 Aug 2003 11:06:38 +0300 Subject: European Commission Patentability rules Message-ID: <3F4F098E.7080202@fing.edu.uy> Dear all: I have not seen comments on the list regarding to this subject. I know that this might be considered political and off-topic but I believe that most of our (beowulf) software technology is Open/Free and that the results of further regulations might affect our work. Sorry for the noise for those of you who already knew this. Regards Ariel On September 1st the European Commission is going to vote a revised version of the European Patentability rules. The proposed revision contains a set of serious challenges to Open Source development since regulation regarding software patents will be broadly extended and might forbid independent development of innovative (Open Source and not) software-based solutions. The European Open Source community is very concerned about the upcoming new regulation and has organized a demo protest for August 27, asking Open Source supporting sites to change their home pages to let everyone know what is going on at the European Parliament. For further information please see http://swpat.ffii.org and http://petition.eurolinux.org. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Aug 29 10:17:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 29 Aug 2003 10:17:09 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Bogdan Costescu wrote: > One other thing that turned me off was that in a system composed of only > Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots > on the mainboard allow inserting of the Myrinet card (but didn't try to > see if it works), while the riser cards that came with the case do not, > allowing only 3.3V ones - so the riser imposes additional limitations... 
One last possible solution you can consider if you're using 2U cases and don't mind ugly is that MANY of the cards you might want to add nowadays are half-height cards on full height backplates. Usually the backplate is held on by two little screws. The half height cards will snap into a regular PCI slot normally (vertically) and still permit the case to close with no riser at all. The two negatives are there there are no "half height riser backplates" that I know of, so the back of each chassis will be open to the air, which may or may not screw around with cooling airflow in negative ways, and the fact that you can't "screw the cards down". Both of these can be solved (or ignored) with a teeny bit of effort, although you'll probably prefer to just get a riser that meets your needs -- there are risers with a key that fits in the AGP slot, risers with 32 bit keys, risers with 64 bit keys.. shop around. Be aware that some of the risers you can buy don't work properly (why I can't say, given that they appear to be little more than bus extenders with keys to grab power and timing/address lines). At a guess this won't help you with an old myrinet card as it is probably full height, but if you get desperate and it's not, you could likely make this work. rgb > > -- > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Aug 29 11:10:43 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 17:10:43 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Robert G. Brown wrote: > is that MANY of the cards you might want to add nowadays are half-height > cards on full height backplates. Nice try :-) It's a full-height card. And buying a taller case for each node with these Myrinet cards to allow vertical mounting would make me start looking for an 100U rack :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Fri Aug 29 10:48:55 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Fri, 29 Aug 2003 09:48:55 -0500 Subject: Intel acquiring Pallas Message-ID: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> >6. 
Given that the update packages are covered by the GPL, *nothing* > prevents a receiver of said packages to make them available for > download on the Internet. > > Red Hat can, however, potentially decide not to provide you with > future updates if you do this. This is a bit unclear in the SLA. Correct me if I'm wrong, I thought part of the GPL was that you have to give the source code to anyone that asks for it, is it not? Per section 2b: 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. http://www.gnu.org/copyleft/gpl.html?cid=6 They still keep patches on their web site, do they not? Lehi Gracia
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a customer service level agreement to limit the spread of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only recourse will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Aug 29 11:17:04 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 29 Aug 2003 11:17:04 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062170224.9421.4.camel@roughneck> On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > The links to the rhel-rebuild howto and mailing list are enought to get this done -- I just did 2.1 ES ( why bother with spending more for AS ? ). We purchased one copy of ES, and I used that to do the rebuild. Of course, it is not completely automatic, but there are only a handfull of packages that do not build without a bit of tweaking. As far as pkg dependencies go, it is _much_ easier to build on a similar system. Now for the $10K question -- are there any reasons that I ( or someone else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It of course still has the RH branding all over it, but it could be distributed being called 'Nics Fun RH clone', or something similar. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From robert at yu.org Fri Aug 29 12:09:46 2003 From: robert at yu.org (Robert K. Yu) Date: Fri, 29 Aug 2003 09:09:46 -0700 (PDT) Subject: beowulf to a good home Message-ID: <20030829160946.6897.qmail@web40904.mail.yahoo.com> Hi, I have the following: 16 machines 450 MHz dual Celeron each (i.e.
32 CPU) 128M memory each 100BaseT switch 6G drive each I would like to donate these machines and see them put to good use. Pick up from the San Francisco south bay area, or you pay for shipment. Thanks. -Robert ===== Robert K. Yu mailto:robert at yu.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Aug 29 12:19:31 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 29 Aug 2003 12:19:31 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> Message-ID: On Fri, 29 Aug 2003 lehi.gracia at amd.com wrote: > > Red Hat can do *nothing* to prevent > > further distribution. IOW, nothing prevents you from buying one > > license and make the updates available to the rest of the world. > > > > Red Hat can, however, potentially decide not to provide you with > > future updates if you do this. This is a bit unclear in the SLA. > > Correct me if I'm wrong, I though part of the GPL was that you have to > give the source code to anyone that asks for it, is it not? Per section > 2b. No, section 2b states that you must propagate the license, not make the source code available to any third party. Section 3 covers distribution and redistribution. You don't have to make the source code available to an arbitrary third party, just those with the offers in 3b or 3c. For distributions Red Hat ship with the source code, they have no further obligations. > >6. Given that the update packages are covered by the GPL, *nothing* > > prevents a receiver of said packages to make them available for > > download on the Internet. For most individual packages, correct. And the following discussion covers individual packages, not the distribution as a whole. If the package contains a trademarked logo embedded with GPL code they
- Should grant the right to use a package unmodified, including the logo (The GPL doesn't explicitly cover the case of logos, but a reasonable reading is that if Red Hat itself packages up the logo you have the right of unmodified distribution.)
- May require you to remove the logo with any modification
The entire distribution is another issue. It may be protected by copyright on the collection. They may restrict distribution of packages consisting of Red Hat branding and logos, which means some level of content reassembly is necessary to distribute. Red Hat may also insist that you not misrepresent a copy as a Red Hat product. This is an area where it's difficult to generalize. They may require removing packages/elements consisting of just logos or Red Hat documentation. And third parties can use the trademark name where it's descriptive, but not misleading. Consider the difference between "Chevrolet Service Station" and "Service Station for Chevrolets" [[ Native English speakers immediately understand the difference, and think of this rule as just part of the language. But you will not find this legally-inculcated distinction as a part of the grammar.
]] -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 13:16:06 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 10:16:06 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for G5 Message-ID: <20030829171606.62071.qmail@web11408.mail.yahoo.com> Free download: http://www-3.ibm.com/software/awdtools/ccompilers/ http://www-3.ibm.com/software/awdtools/fortran/ Rayson __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 14:52:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 29 Aug 2003 11:52:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> Message-ID: On Fri, 29 Aug 2003, Glen Otero wrote: > > You can redistribute it as long as it doesn't have RH all over it and > you don't use the RH name while endorsing/promoting it. I suppose you > could say it's RH compliant and built from RH srpms. The loop hole that > RH is taking advantage of is the fact that they are compliant with the > GPL as long as they release the sources. They comply with the GPL by > releasing the sources in srpm format, and so technically do not have to > make the isos freely available. By making it slightly difficult to > build your own distro, and not offering support to those who do, RH is > coaxing people to take the path of least resistance (wrt effort) and > buy licenses. I wouldn't really consider it a loophole, it's compatible with the spirit of the gpl. it's not as convenient as some people might like... but the sources are all there and they build and work. > Glen > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. 
> Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Fri Aug 29 14:12:37 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Fri, 29 Aug 2003 11:12:37 -0700 Subject: Intel acquiring Pallas In-Reply-To: <1062170224.9421.4.camel@roughneck> Message-ID: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> On Friday, August 29, 2003, at 08:17 AM, Nicholas Henke wrote: > On Thu, 2003-08-28 at 19:46, Glen Otero wrote: >> Joel- >> >> Have you actually built RH AS from scratch using their SRPMS? Or do >> you know anyone that has? I'm very interested in doing this but I >> heard >> there were some pretty significant obstacles along the lines of >> package >> dependencies. >> > > The links to the rhel-rebuild howto and mailing list are enought to get > this done -- I just did 2.1 ES ( why bother with spending more for AS ? > ). We purchased one copy of ES, and I used that to do the rebuild. Of > course, it is not completely automatic, but there are only a handfull > of > packages that do not build without a bit of tweaking. > > As far as pkg dependencies go, it is _much_ easier to build on a > similar > system. > > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. You can redistribute it as long as it doesn't have RH all over it and you don't use the RH name while endorsing/promoting it. I suppose you could say it's RH compliant and built from RH srpms. The loop hole that RH is taking advantage of is the fact that they are compliant with the GPL as long as they release the sources. They comply with the GPL by releasing the sources in srpm format, and so technically do not have to make the isos freely available. By making it slightly difficult to build your own distro, and not offering support to those who do, RH is coaxing people to take the path of least resistance (wrt effort) and buy licenses. Glen > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. 
Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 19:01:03 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 16:01:03 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for Apple G5 In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F5@txexmtae.amd.com> Message-ID: <20030829230103.91270.qmail@web11407.mail.yahoo.com> (Sorry, didn't made it clear in my last email...) The compilers are for MacOSX. Rayson > Which one do we use for Linux, will the AIX one work? > > > Free download: > > > > http://www-3.ibm.com/software/awdtools/ccompilers/ > > http://www-3.ibm.com/software/awdtools/fortran/ > > __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 30 12:48:36 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 30 Aug 2003 20:48:36 +0400 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: hi If i don't find these to file should i create it? i know that .rhosts is hidden but when I do ls -a i cannot find it even if i use the command locate therefore if i create it what permission should i give them thanks roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Aug 30 13:53:56 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 30 Aug 2003 12:53:56 -0500 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: ; from rouds@servihoo.com on Sat, Aug 30, 2003 at 08:48:36PM +0400 References: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: <20030830125356.C3206@mikee.ath.cx> On Sat, 30 Aug 2003, RoUdY wrote: > hi > If i don't find these to file should i create it? > i know that .rhosts is hidden but when I do ls -a > i cannot find it even if i use the command locate > therefore if i create it what permission should i give > them > thanks > roudy the file ~/.rhosts should have permissions of 600 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 31 19:45:52 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 1 Sep 2003 09:45:52 +1000 Subject: Trademark caveats about building RHAS from SRPMS (was Re: Intel acquiring Pallas) In-Reply-To: <1062170224.9421.4.camel@roughneck> References: <1062170224.9421.4.camel@roughneck> Message-ID: <200309010945.53871.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 30 Aug 2003 01:17 am, Nicholas Henke wrote: > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? 
It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. Redhat have a set of rules of what you can and cannot do. Basically whilst they comply with the GPL they do restrict what you can do with their trademarks (i.e. things like Redhat and the ShadowMan logo). Two of the major things are: http://www.redhat.com/about/corporate/trademark/guidelines/page6.html C. You may not state that your product "contains Red Hat Linux X.X." This would amount to impermissible use of Red Hat's trademarks. [...] D. You must modify the files identified as REDHAT-LOGOS and ANACONDA-IMAGES so as to remove all use of images containing the "Red Hat" trademark or Red Hat's Shadow Man logo. Note that mere deletion of these files may corrupt the software. So if you want to build and redistribute from their SRPMS you will need to do extra work to make them happy. Note that RMS thinks that this use of trademark in relation to the GPL is legitimate, in an interview quoted on the "Open For Business" website he says (in regards to Mandrake): http://www.ofb.biz/modules.php?name=News&file=article&sid=260 [quote] TRB: Another interesting current issue is the concept of what might be seen as "hybrid licensing." For example, MandrakeSoft's Multi-Network Firewall is based on entirely Free Software, however the Mandrake branding itself is placed under a more restrictive license (you can't redistribute it for a fee). This give the user or consultant two choices -- use the software under the more restrictive licensing or remove the Mandrake artwork. What are your thoughts on this type or approach? RMS: I think it is legitimate. Freedom to redistribute and change software is a human right that must be protected, but the commercial use of a logo is a very different matter. Provided that removing the logo from the software is easy to do in practice, the requirement to pay for use of the logo does not stain the free status of the software itself. [/quote] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/UoiwO2KABBYQAh8RAji6AJ4smNhqZ/my4k8i787Uaqs+n4rfsACcC4yS BLtsLZDIzG8Hm0KEACBOZyo= =A0dE -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
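To make the rebuild-and-debrand recipe discussed in this thread concrete, here is a rough sketch of the loop several posters describe. It is only an illustration: the mount point and the failure log are made-up examples, and on the rpm 4.0.x shipped with the 2.1 products the equivalent of 'rpmbuild --rebuild' is 'rpm --rebuild'.

  # rebuild every source RPM from a purchased copy (paths are examples)
  for srpm in /mnt/cdrom/SRPMS/*.src.rpm; do
      rpmbuild --rebuild "$srpm" || echo "$srpm" >> ~/rebuild-failures.log
  done
  # built binaries normally land under /usr/src/redhat/RPMS/<arch>/
  # before redistributing, swap the trademarked artwork packages
  # (REDHAT-LOGOS and ANACONDA-IMAGES, per the guidelines above) for your own
  rpm -qa | grep -i -e redhat-logos -e anaconda-images

As noted earlier in the thread, a handful of packages will not build cleanly the first time, so expect to revisit whatever ends up in the failure log by hand.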
From alvin at Mail.Linux-Consulting.com Fri Aug 1 22:45:01 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 1 Aug 2003 19:45:01 -0700 (PDT) Subject: Filesystem In-Reply-To: Message-ID: hi ricardo filesystem comparasons http://www.linux-sec.net/FileSystem/#FS http://aurora.zemris.fer.hr/filesystems/ i think ext3 is better than reiserfs i think ext3 is not any better than ext2 in terms of somebody hitting pwer/reset w/o proper shutdown - i always allow it to run e2fsck when it does an unclean shutdown ... - yes ext3 will timeout and continue and restore from backups but ... am paranoid about the underlying ext2 getting corrupted by random power off and resets c ya alvin On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 08:48:09 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 12:48:09 GMT Subject: Filesystem In-Reply-To: References: Message-ID: <20030802124809.1437.qmail@houston.wolf.com> Steven Timm writes: > I have some anecdotal evidence that ext3 starts taking performance hit > in cases where there is a lot of files getting written and then > quickly erased. Also there's a performance penalty on burst I/O--e.g. > if you have a system doing near-continuous disk writes and reads it > will bump the load factor up. But I don't have any information to > suggest that Reiser does it better. > It depends what you are going to use the nodes for. For normal compute nodes, I don't think there is enough of a payback to change ext3. For our disk nodes, we use ext3 for system filesystems and XFS for the exported disk space (with NFS patches and tuning of couse) to get some serious performance. We are currently testing different filesystems on one of the disk nodes we just purchased and have seen a dramatic rise in performance with the above. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mukshere at rediffmail.com Sat Aug 2 10:16:43 2003 From: mukshere at rediffmail.com (mukund govind umalkar) Date: 2 Aug 2003 14:16:43 -0000 Subject: Beowulf Research Message-ID: <20030802141643.9462.qmail@webmail7.rediffmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sanjoy at chem.iitkgp.ernet.in Sat Aug 2 13:33:21 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sat, 2 Aug 2003 23:03:21 +0530 (IST) Subject: NIS In-Reply-To: <20030802124809.1437.qmail@houston.wolf.com> Message-ID: Hi, I have a cluster running Rh 7.3 with NIS server. The cluster was running fine.
But suddenly after rebooting now the clients are having problems in recognizing the NIS domain server name. while booting the clients it says: Binding to the NIS domain: [OK] Listening for an NIS domail server............[FAILED] ypwhich on clients says 'Can't communicate with ypbind' ypbind, ypserv are running fine on the server. I will appreciate if anyone can help.. Thanks. Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bari at onelabs.com Sat Aug 2 14:02:45 2003 From: bari at onelabs.com (Bari Ari) Date: Sat, 02 Aug 2003 13:02:45 -0500 Subject: SH4 & SH5 Clustering Message-ID: <3F2BFCC5.6060803@onelabs.com> It's been a few years since anyone has posted anything here on clusters using the SH-4. http://www.beowulf.org/pipermail/beowulf/1999-November/007339.html Does anyone have results or experiences of building systems using the SH-4? http://www.superh.com/products/sh4.htm http://www.superh.com/products/sh5.htm The SH-5 is finally showing up in silicon at 2.8GFLOPS, 400MHz, under 1W/cpu. The caches are small at 32KB yet have a 3.2GB/s peak internal bus, the SOC's have DDR memory and 32bit/66MHz PCI. They look attractive for low power dense clusters/blade applications that won't be hurt much by their small cache size and the 264MB/s peak PCI interface. A 1-U could contain 24 - 32 of these and require only convection cooling for the cpu's. The DDR memory would be the "hot spots" and require some forced air cooling. Bari _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:42:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:42:14 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. Hmmm, so many possible causes. If you say "suddenly after rebooting" and if it applies to all the clients, I'd check the following: a) The network connection of the server. All things being equal, I'd have to say this is a prime candidate. Don't forget to check the wire(s) itself -- many is the perplexing networking or service problem that turned out to be caused by somebody kicking a wire so that the plug was no longer properly seated. 
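Concretely, a quick first pass from one of the affected clients might look like this (nisserver is only a placeholder for the NIS server's hostname); it separates a dead cable or switch port from a portmap/ypserv problem:

  ping -c 3 nisserver                # raw reachability
  rpcinfo -p nisserver               # is portmap answering, and is ypserv registered with it?
  ypwhich                            # which server, if any, did this client actually bind to?
  /etc/rc.d/init.d/ypbind restart    # then watch /var/log/messages for bind errors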
Check network connectivity in other ways to -- is the switch port suddenly bad, do I need to power cycle the switch (switches sometimes "wedge" and need a cycle to rebuild their tables), and so forth. On some switches it is possible to block broadcasts -- NIS requires them, so be sure that this didn't get done by mistake. b) When you've eliminated hardware as a possible cause (and have validated perfect network connectivity) then you can look for software problems. A "sudden" problem like this is odd -- perhaps you accidentally updated with a broken RPM? Perhaps somebody trashed a table? Did somebody update iptables or ipchains or change their rules so port access is blocked that way? See if checking out these systems solves it. If not, in your next post include more detail on your network and so forth. Usually this kind of thing is solved by doggedly testing one system at a time until the culprit emerges, starting with the most likely. Don't forget, you have tools like tcpdump that will let you snoop the network packets one at a time if necessary to be sure that they are indeed arriving at the server from the clients. I recall that you can turn on ypserv with -d for debug to get a much more verbose operational mode to help debug as well. HTH, rgb > > I will appreciate if anyone can help.. > Thanks. > Sanjoy > > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:50:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:50:03 -0400 (EDT) Subject: Beowulf Research In-Reply-To: <20030802141643.9462.qmail@webmail7.rediffmail.com> Message-ID: On 2 Aug 2003, mukund govind umalkar wrote: > hello sir, > i am a graduate student, and i am intrested in doing research on > Beowulf clusters, so plzz send me some material and let me know > about the various papers that have presented on Beowulf. > > If possible please some useful URLs for the same There are lots of starting points, and the better sites form for all practical purposes a webring with mutual links interconnecting them so sites you don't find on one you're likely to find on another linked to it. One such starting point is: http://www.phy.duke.edu/brahma (look under e.g. resources and links and papers). Brahma will lead you do the beowulf underground, to the original/main beowulf site, and to many other well-known clustering sites and resources. To find "real" papers on clustering, check out e.g. 
;login and various other computer geek journals and magazines. Linux Magazine has an excellent clustering column by Forrest Hoffman. There are some online webzines devoted to clustering (some linked to brahma). Google is your friend here -- with google you can find out pretty much anything that is online. rgb > > thanx > Mukund > > > > ___________________________________________________ > Download the hottest & happening ringtones here! > OR SMS: Top tone to 7333 > Click here now: > http://sms.rediff.com/cgi-bin/ringtone/ringhome.pl > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 18:11:17 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 22:11:17 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030802221117.3967.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. Check to make sure your NIS server is running and talking (TCPDUMP). If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind restart" and see what error crops up. Also, try nisdomainname and see what crops up there. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Sun Aug 3 01:36:00 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sun, 3 Aug 2003 11:06:00 +0530 (IST) Subject: NIS In-Reply-To: <20030802221117.3967.qmail@houston.wolf.com> Message-ID: On Sat, 2 Aug 2003, Angel Rivera wrote: > Check to make sure your NIS server is running and talking (TCPDUMP). > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > restart" and see what error crops up. yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: Shutting down NIS services: [FAILED] Binding to the NIS domain: [OK] Listening for an NIS domain server................... [FAILED] > Also, try nisdomainname and see what crops up there. nisdomainname gives correct domain name. We have the Sever filesystems NFS mounted on the clients. I can see now that this NFS mounting is not working for the clients. While the clients tries to mount the NFS filesystem, it gives this error: Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable to receive Thanks.. -Sanjoy -------------------------------------------------------------------- Dr. 
Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leventeh at hotmail.com Sun Aug 3 02:14:23 2003 From: leventeh at hotmail.com (Levente Horvath) Date: Sun, 03 Aug 2003 06:14:23 +0000 Subject: MPI & linux compilers Message-ID: To whom it may concern, We have 12 PCs set up for parallel computation. All are running linux (Redhat 7.3) and MPI. We would like to compute eigenvalues and eigenvectors for large matrices. We have managed to do up to 10000x10000 matrix no problem. Our program uses Scalapack and Blacs routines. These routines require two matrix to be declared. On single precision two 10000x10000 matrix occupies 800Mb of memory which is already exceeds the 512Mb local memory of each computer in our cluster. This memory were equally distributed over the 12 computers upon computation. So, we think that in theory we shouldn't have any problem going to large matrices; as our distributed memory is quite large 12*512Mb. Now, if we try to run a larger size then the compiler mpif77 returns a "large matrix" error. We have traced the compiler and found that mpif77 is a script that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we found that there is no problem with the compilation up to a size of 15000x15000, then the compiler crashes. After tracing the compilation procedure, we found that the linker "as" cannot link some of the .o and .s files in our /tmp directory. So, we used C rather than fortran. Statically, we cannot declare more than a 1500x1500 matrix (that put in to a hello world program for MPI). We thought it might be the problem with the static allocation of memory. So, we tried to allocate this space dynamically without any success.... Our questions are: Are we doing something wrong here. Or are the compilers gcc and g77-3 responsible for such an array limit. Or are we missing the ways to allocate memory for large matrices.... This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. Unfortunately, we cannot link mpi libraries against this "ifc" compiler. It just doesn't see them. We have tried to compile ifc with the full path names of libraries using either static and dynamics libraries. In either case we had no success... We would appreciate all of your comments and suggestions. Thank you in advance.... _________________________________________________________________ ninemsn Extra Storage comes with McAfee Virus Scanning - to keep your Hotmail account and PC safe. 
Click here http://join.msn.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sun Aug 3 12:17:19 2003 From: angel at wolf.com (Angel Rivera) Date: Sun, 03 Aug 2003 16:17:19 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030803161719.30576.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > >> Check to make sure your NIS server is running and talking (TCPDUMP). >> If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind >> restart" and see what error crops up. > > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > >> Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive It is not seeing the ypserver. have you tried rpcinfo -p _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Sat Aug 2 17:28:18 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Sat, 2 Aug 2003 16:28:18 -0500 (CDT) Subject: NIS In-Reply-To: References: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. > Thanks. > Sanjoy Sanjoy, You say that ypbind is running fine /on the SERVER/, what about ypbind running on the /CLIENT/? ypbind should not run on the server, it runs on the clients. -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:05:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:05:32 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Sanjoy Bandyopadhyay wrote: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > > > Check to make sure your NIS server is running and talking (TCPDUMP). > > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > > restart" and see what error crops up. 
> > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > > > Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive Yah. How about ping? Can you ping the server? Seriously, this looks like your problem is just a bad network connection, or conceivably a downed portmapper. If you can't ping, obviously your network is down and you need to fix it. If you can ping and ssh back and forth and the like, then make sure that portmap is running on your clients and server (an rpm that updated but installed the new one off?). In fact, do chkconfig --list and look at ALL of your network services to make sure they still make sense. Be careful here -- trojanned portmappers and other broken rpc services are a favorite way for crackers to enter your system. What you are seeing COULD be symptoms of being cracked, as trojanned portmappers not infrequently are broken (for a variety of reasons). You might prefer to back up your data and do a full reinstall of the server and a client, to check the rpm MD5 checksums, and to presume that you may have been cracked (monitoring your net traffic with TCPDUMP looking for bad guys) while you proceed. At least stay aware of the possibility. It's happened to me; it could have happened to you. rgb > > Thanks.. > -Sanjoy > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:16:36 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:16:36 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Brian D. Ropers-Huilman wrote: > Sanjoy, > > You say that ypbind is running fine /on the SERVER/, what about ypbind running > on the /CLIENT/? ypbind should not run on the server, it runs on the clients. Right, but if NFS is also not running with an RPC error, it really suggests either raw networking problems or problems with the RPC subsystem, e.g. portmap. He also originally said that he had it working and then it stopped. 
If that is true it doubly points to networking or RPC. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 3 14:18:17 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 03 Aug 2003 13:18:17 -0500 Subject: MPI & linux compilers In-Reply-To: Message-ID: <5.1.1.6.2.20030803131144.02f00e50@localhost> At 06:14 AM 8/3/2003 +0000, Levente Horvath wrote: >To whom it may concern, > >We have 12 PCs set up for parallel computation. All are running linux >(Redhat 7.3) and MPI. >We would like to compute eigenvalues and eigenvectors for large matrices. > >We have managed to do up to 10000x10000 matrix no problem. Our program >uses Scalapack and Blacs >routines. These routines require two matrix to be declared. On single >precision two 10000x10000 >matrix occupies 800Mb of memory which is already exceeds the 512Mb local >memory of >each computer in our cluster. This memory were equally distributed over >the 12 computers >upon computation. So, we think that in theory we shouldn't have any >problem going >to large matrices; as our distributed memory is quite large 12*512Mb. You need to declare only the local part of the matrix that is distributed across the processes, not the entire matrix. MPI doesn't provide any support for automatically distributing the data, though libraries written using MPI can do this if the data is allocated dynamically by the library. Languages such as HPF can do this for you, but have their own limitations. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xavier at zeth.ciencias.uchile.cl Sun Aug 3 21:59:56 2003 From: xavier at zeth.ciencias.uchile.cl (Xavier Andrade) Date: Sun, 3 Aug 2003 21:59:56 -0400 (CLT) Subject: MPI & linux compilers In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. 
> > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > Running "mpif77 -showme" will show you the line that mpif77 actually calls for compiling, if you want to change the compiler that mpif77 calls set the enviroment variable LAMHF77 (i.e. with `export LAMHF77=ifc` mpif77 will compile using ifc instead of f77). Xavier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Mon Aug 4 01:09:13 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Mon, 4 Aug 2003 10:39:13 +0530 (IST) Subject: NIS In-Reply-To: Message-ID: Hi, I figured out what was wrong.. the nsswitch.conf file was somehow corrupted. nis was not mentioned for passwd,group,shadow files. Now everything is under control. Thanks very much to all of you who helped with their valuable suggestions. -Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From javier.crespo at itp.es Mon Aug 4 02:53:05 2003 From: javier.crespo at itp.es (Javier Crespo) Date: Mon, 04 Aug 2003 08:53:05 +0200 Subject: MPI & linux compilers References: Message-ID: <3F2E02D1.E834011B@itp.es> Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. 
We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. > > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > > We would appreciate all of your comments and suggestions. > Thank you in advance.... If you want to link to mpi but compiling with "ifc" (is it really IBM? - I think it comes from intel), you first at all should have to compile that libraries with the same compiler that you are going to use for the main program, typically using the options "-fc=ifc","--f90=ifc" and "-f90linker=ifc" when configuring MPI and then installing it in you path (in a different place than the MPI libraries compiled with f77). Javier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 4 08:02:50 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 4 Aug 2003 14:02:50 +0200 (CEST) Subject: Cisco switches for lam mpi In-Reply-To: Message-ID: On Tue, 29 Jul 2003, Jack Douglas wrote: > We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst > 4003 Chassis with 48 1000Base-t ports. > > We are running LAM MPI over gigabit, but we seem to be experiencing > bottlenecks within the switch > > Typically, using the cisco, we only see CPU utilisation of around 30-40% [...] I'm not a Cisco expert, but... We once got a Cisco switch from our networking people that we had to return immediately because it delivered such a bad performance. It was a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only handle 12 ports at full speed. Above that, the performance brake down completely. For some benchmark results see, e.g.: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf As a comparison, the quite nice results of a CentreCom 742i: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved anyway since spring 2001 when I did the above tests. Besides, the situation for Gigabit Ethernet could be different. As we described on our workshop paper at CAC03 you can not trust the data sheets of switches anyway: http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 Conclusion: If you need a very high performing switch, you have to evaluate/benchmark it yourself. 
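A crude but telling test is to run a TCP stream benchmark across several node pairs at once and watch whether the per-pair numbers hold up; for example with iperf (hostnames are placeholders, and netperf or NetPIPE would do just as well):

  # on each receiving node
  iperf -s
  # on the matching sending nodes, started at roughly the same time
  iperf -c node01 -t 30
  iperf -c node02 -t 30
  iperf -c node03 -t 30

If the aggregate stops growing as you add pairs, the limit is the backplane or the uplinks rather than the individual ports.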
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Aug 4 15:31:22 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 04 Aug 2003 15:31:22 -0400 Subject: large filesystem & fileserver architecture issues. Message-ID: <1060025481.28642.81.camel@roughneck> Hey all -- here is our situation. We currently have several clusters that are configured with either IBM x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays hanging off of them. Each server + array is good for around 600GB after RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 of multiple arrays ( which seems to work & perform quite nicely ). Each of the servers then exports the filesystem via NFS, and is mounted on the nodes. The clusters range from 24 to 128 nodes. For backups we maintain an offline server + array that we use to rsync the data nightly, then use our amanda server and tape robot to backup. We use an offline sync, as we need a level 0 dump every 2 weeks, and doing a level 0 dump of 600GB just trashes the performance on a live server. As we are a .edu and all of the clusters were purchased by the individual groups, the options we can explore have to be very cost efficient for hardware, and free for software. Now for the problem... A couple of our clusters are using the available filespace quite rapidly, and we are looking to add space. The most cost efficient approach we have found is to buy a IDE RAID box, like those available from RaidZone or PogoLinux. This allows us to use the cheap IDE systems as the offline sync, and use the scsi systems as online servers. And the questions: 1) Is there a better way to backup the systems without the need for an offline sync? 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad does it bite ? 3) Are there any recommended IDE RAID systems? We are not looking for super stellar performance, just a solid system that does it's job as an offline sync for backups. -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Aug 4 22:49:35 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 04 Aug 2003 22:49:35 -0400 Subject: updated run_mpiblast code Message-ID: <1060051774.25281.22.camel@protein.scalableinformatics.com> Hi folks: Updated and documented the run_mpiblast code. Better data from --debug switch. To see the man page, either perldoc run_mpiblast or run run_mpiblast --help Will be working on an RPM and a tarball installer in short order. It can be pulled from http://scalableinformatics.com/sge_mpiblast.html. The documentation (pod generated) can be viewed at http://scalableinformatics.com/run_mpiblast.html . 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 5 08:54:57 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 05 Aug 2003 16:54:57 +0400 Subject: mpich2-0.93 In-Reply-To: <200308041901.h74J1Tw27276@NewBlue.Scyld.com> Message-ID: hello everybody I have download MPICH2-0.93 and I have some difficulty in implementing it. That is, according to some research done I need to amend the file "machines.LINUX" so that the parallel computing can start and to choose which node to form part of the cluster. But the problem is that there is no file which name "machine.LINUX" and the file is suppose to be found in the directory .../mpich2-0.93/util/machines. Well, I use redhat9.0 - hope to hear from you very soon If there is a web site to get the necessary information please let me know. Cheers Roudy. -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Aug 5 11:47:07 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 05 Aug 2003 11:47:07 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <1060098427.30922.6.camel@roughneck> On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > On 4 Aug 2003, Nicholas Henke wrote: > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > and the price is hard to beat. The raid array that serves home > directories to their clusters and workstations is backed up nightly to a > second raid server, similarly to your system. To speed things along we > installed an extra gigabit card in the primary and backup servers and > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > GBs take just over an hour using the dedicated gbit link. Rsync would > probably be faster. Without the shortcircuit gigabit link, it used to run > four or five times longer and seriously impact NFS performance for the > rest of the systems on the LAN. > > Hope this helps. > > Regards, > > Mike Prinkey > Aeolus Research, Inc. Definately does -- can you recommend hardware for the IDE RAID, or list what you guys have used ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Tue Aug 5 15:11:58 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Tue, 5 Aug 2003 09:11:58 -1000 Subject: large filesystem & fileserver architecture issues. 
References: <1060098427.30922.6.camel@roughneck> Message-ID: <009701c35b85$714e7110$7101a8c0@Navatek.local> I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. Although they do offer a NFS box that will turn one of these arrays into a standalone. We have had great success with these units (http://neptune.navships.com/images/harddrivearrays.jpg) . We first acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. We have set it up in a RAID-5 configuration and have not yet had to replace even one of the drives (Knockin on wood). After a year we picked up the 14slot chassis and filled it with 160 maxtor drives and it has performed flawless... I think we paig about $4000 for the 14 slot chassis. you can add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver for $1500 and you got about 2TB of storage for around $7000 Mitchel Kagawa ----- Original Message ----- From: "Nicholas Henke" To: "Michael T. Prinkey" Cc: Sent: Tuesday, August 05, 2003 5:47 AM Subject: Re: large filesystem & fileserver architecture issues. > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > On 4 Aug 2003, Nicholas Henke wrote: > > > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > > and the price is hard to beat. The raid array that serves home > > directories to their clusters and workstations is backed up nightly to a > > second raid server, similarly to your system. To speed things along we > > installed an extra gigabit card in the primary and backup servers and > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > GBs take just over an hour using the dedicated gbit link. Rsync would > > probably be faster. Without the shortcircuit gigabit link, it used to run > > four or five times longer and seriously impact NFS performance for the > > rest of the systems on the LAN. > > > > Hope this helps. > > > > Regards, > > > > Mike Prinkey > > Aeolus Research, Inc. > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Tue Aug 5 18:12:21 2003 From: egan at sense.net (Egan Ford) Date: Tue, 5 Aug 2003 16:12:21 -0600 Subject: Power monitoring Message-ID: <095d01c35b9e$a4ae90d0$0664a8c0@titan> I know this was discussed recently with "kill-a-watt" as a popular choice, however I am looking for the next step up, something more on the circuit level that I can hardwire between my lab and breakers. Support for multiple circuits would be nice too as well as 110/220 support. Add a serial port for remote monitoring and I'm set. However I am looking for a cheap solution, a web cam pointing to a meter is an option. I'll even settle for analogue, I just need kwh. Thanks. 
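P.S. If the meter turns out to be something dumb that just prints instantaneous watts on its serial port, the logging half of the problem is easy. The following is only an untested sketch and assumes a hypothetical meter that emits one watts reading per line, once per second, at 9600 baud on ttyS0:

  stty -F /dev/ttyS0 9600 raw
  awk '{ kwh += $1 / 3600000; printf "%d W  %.4f kWh\n", $1, kwh; fflush() }' \
      < /dev/ttyS0 | tee -a power.log

With one sample per second, each reading contributes that many watt-seconds, and 3,600,000 watt-seconds make one kWh.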
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:35:09 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:35:09 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> Message-ID: hi ya On Tue, 5 Aug 2003, Mitchel Kagawa wrote: > I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true > NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. thought acnc.com has good stuff . :-) > Although they do offer a NFS box that will turn one of these arrays into a > standalone. We have had great success with these units > (http://neptune.navships.com/images/harddrivearrays.jpg) . We first > acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. > We have set it up in a RAID-5 configuration and have not yet had to replace > even one of the drives (Knockin on wood). After a year we picked up the > 14slot chassis and filled it with 160 maxtor drives and it has performed > flawless... I think we paig about $4000 for the 14 slot chassis. you can > add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver > for $1500 and you got about 2TB of storage for around $7000 8 drives at 250GB each is 2TB in one 1U chassis ... 250GB disks is about $250 now days.... maybe less on the online webstores backup of 2TB should be done on another 2TB systems .. 3rd 2TB machine if the data cannot be recreated save only the raw data/apps needed to regenerate the output data c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:40:27 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:40:27 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. -hw In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: hi ya On 5 Aug 2003, Nicholas Henke wrote: > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? you have basically 2 choices ... - leave the ide as an ide disks ... ( software raid ) - get a $50 ide controller ( 4 drives on it ) and 4 drives on the mb - convert the ide to look like a scsi drives ( tho not really ) - 3ware 7500-8 series for 8 "scsi" disks on it - or get a real hardware raid card for lots of $$$ - mylex, adaptec - for a list of hardware raid card that is supported by linux http://www.linux-ide.org/chipsets.html http://www.1u-raid5.net sw/hw raid5 howto's c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m0ukb at unb.ca Wed Aug 6 08:50:13 2003 From: m0ukb at unb.ca (White, Adam Murray) Date: Wed, 6 Aug 2003 09:50:13 -0300 Subject: Performance monitoring tool Message-ID: <1060174213.3f30f9859afaf@webmail.unb.ca> Hello, I am interested in acquiring a good real time cluster performance monitoring tool, which at least displays (dynamically while the program is running) each thread's cpu utilization and memory usage (graphically). Not a postmortem display. 
Free as well. Any help would be much appreciated. Regards, A. M. White ###################################################### Adam M. White University of New Brunswick Saint John http://www.unbsj.ca/sase/csas m0ukb at unb.ca ###################################################### _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Aug 6 13:21:02 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 6 Aug 2003 13:21:02 -0400 (EDT) Subject: Performance monitoring tool In-Reply-To: <1060174213.3f30f9859afaf@webmail.unb.ca> Message-ID: On Wed, 6 Aug 2003, White, Adam Murray wrote: > Hello, > > I am interested in acquiring a good real time cluster performance monitoring tool, which at > least displays (dynamically while the program is running) each thread's cpu utilization and > memory usage (graphically). Not a postmortem display. Free as well. > > Any help would be much appreciated. At this time it won't QUITE do what you like, but it is within spitting distance of it. Check out: xmlsysd and wulfstat on brahma (http://www.phy.duke.edu/brahma). xmlsysd is a daemon that runs on a cluster and obtains by a variety of means statistics of interest on the system. Some of these it parses from proc, others by the use of systems calls. It is not promiscuous (it doesn't provide e.g. a complete copy of /proc to clients that connect to it) but rather offers a digested view that can be throttled so that one or more "sets" of interesting statistics can be monitored. This is to keep it lightweight, both on the system it is monitoring and on the network and client -- it is (literally) a parallel application in its own right and it isn't a good idea for a monitor application to significantly compete for any of the resources that might bottleneck a "production" parallel application. Its "prepackaged" return sets include load avg (5,10,15 min), memory (basically the data underlying the "free" command), ethernet network usage for one or more devices, date/time/cpu information, basically the kind of data one finds digested at the top of the "top" command or made available by e.g xosview in kin in graphical windows. It also has a "pid" mode where it can monitor running processes. Here throttling and filtering is a bit trickier, as one generally does NOT want to monitor every process running on a system with a supposedly lightweight tool. I thus implemented pid selection by means of matching task name or user name, a mode that returns all "userspace" tasks that have accumulated more than some cutoff in total time (5 seconds? I can't remember), as well as a to-be-rarely-used promiscuous mode that returns everything it can find including root tasks. xmlsysd's returns are in xml, and hence are easy to parse out with any xml parser for application in anything you like. That's the good news. The other good news is that wulfstat, the provided client, lets you use most of these features in a tty/ncurses window. The bad news it that there is no GUI display with little graphs and the like. This is mixed news, really, not necessarily bad. A tty display lets you use the pgup and pgdn keys and scroll arrows to page quickly through a lot of hosts, seeing instantly the full detail (actual numbers) for each field being monitored -- you might find wulfstat to be adequate. 
If it isn't adequate, though, you'll likely need to write some sort of client application that polls the daemon at some interval (I tend to use 5 seconds as the default, but it can be set up or down as low as 1 second, depending on how many hosts one wishes to monitor, again remembering that it is supposed to be lightweight and that it is a bad idea to run it so fast that the return latency causes the loop to pile up). This should be pretty easy -- you can actually talk to the daemon with telnet, so watching it work and testing the api is not a problem. You've got wulfstat sources to play with (both tools fully GPL). The daemon returns XML, which is easy to parse out. Finally, there are a fair number of tools or libraries that you can pipe this output into to generate graphs, either on the web or some other console. One day I'll actually write such a tool myself, but wulfstat proved so adequate for most of what we use it for that I haven't been able to justify advancing the project to the top of the triage-heap of bloody and neglected projects that fill my life:-). If you do write one, feel free to do so collaboratively and donate it back to the project so we can all share, although of course the GPL wouldn't require this as far as I can see for clients not derived from wulfstat code or that you write for yourself. xmlsysd and wulfstat have been in "production" use locally for some time, but they are still probably beta level code because most people use ganglia with its web-based displays. Personally I think xmlsysd/wulfstat provide a pretty rich set of monitor options (and actually is derived from code I originally wrote and was using somewhat before the ganglia project was begun, so I can't be accused of foolishly duplicating an existing project:-). If you have any problems with them I will cheerfully fix them, and if you have any ideas for additions or improvements that wouldn't drive me mad timewise to implement, I was cheerfully add them. rgb > > Regards, > A. M. White > > ###################################################### > Adam M. White > University of New Brunswick Saint John > http://www.unbsj.ca/sase/csas > m0ukb at unb.ca > ###################################################### > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 11:45:20 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 11:45:20 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060025481.28642.81.camel@roughneck> Message-ID: On 4 Aug 2003, Nicholas Henke wrote: We have a lot of experience with IDE RAID arrays at client sites. The DOE lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) and the price is hard to beat. 
The raid array that serves home directories to their clusters and workstations is backed up nightly to a second raid server, similarly to your system. To speed things along we installed an extra gigabit card in the primary and backup servers and connected the two directly. The nightly backup (cp -auf via NFS) of 410 GBs take just over an hour using the dedicated gbit link. Rsync would probably be faster. Without the shortcircuit gigabit link, it used to run four or five times longer and seriously impact NFS performance for the rest of the systems on the LAN. Hope this helps. Regards, Mike Prinkey Aeolus Research, Inc. > Hey all -- here is our situation. > > We currently have several clusters that are configured with either IBM > x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays > hanging off of them. Each server + array is good for around 600GB after > RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 > of multiple arrays ( which seems to work & perform quite nicely ). Each > of the servers then exports the filesystem via NFS, and is mounted on > the nodes. The clusters range from 24 to 128 nodes. For backups we > maintain an offline server + array that we use to rsync the data > nightly, then use our amanda server and tape robot to backup. We use an > offline sync, as we need a level 0 dump every 2 weeks, and doing a level > 0 dump of 600GB just trashes the performance on a live server. As we are > a .edu and all of the clusters were purchased by the individual groups, > the options we can explore have to be very cost efficient for hardware, > and free for software. > > Now for the problem... > A couple of our clusters are using the available filespace quite > rapidly, and we are looking to add space. The most cost efficient > approach we have found is to buy a IDE RAID box, like those available > from RaidZone or PogoLinux. This allows us to use the cheap IDE systems > as the offline sync, and use the scsi systems as online servers. > > And the questions: > > 1) Is there a better way to backup the systems without the need for an > offline sync? > > 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad > does it bite ? > > 3) Are there any recommended IDE RAID systems? We are not looking for > super stellar performance, just a solid system that does it's job as an > offline sync for backups. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 12:34:03 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 12:34:03 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: On 5 Aug 2003, Nicholas Henke wrote: > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > I started building these arrays when 20 GBs was a big drive and hardware ide raid controllers were very expensive. So old habits die hard. Most of my experience has been with Software RAID in Linux. We use Promise Ultra66/100/133 controller cards, Maxtor 80 - 200 GB 5400-rpm drives, and Intel-chipset motherboards. I use the Promise cards, again because they were what was available and supported in Linux in the late 90s. 
They are limited to two IDE channels per card, but I have used 3 cards in addition to the on-board IDE in large arrays before. Some people buy the IDE RAID cards that have 4 or 8 IDE channels and then use Software RAID instead. The conventional wisdom is that you should only put one drive on each IDE channel to maximize performance. I have built arrays with a single drive per channel and with two drives per channel, and find that is not really true for ATA100 and faster controllers. Two of these drives cannot saturate a 100 or 133 MB/s channel.

Typically, we put eight drives in an array. I have been using a 4U rack enclosure that has 8 exposed 5.25" bays. This works well because mounting the drives in a 5.25" bay gives a nice air gap for cooling. Stacking 3 or more drives tightly together heats the middle ones up quite a bit. I also usually use 5400-RPM drives to keep the heat production down.

I only use Intel chipset motherboards, normally just a single-CPU P4. One of the boards with 1 or 2 onboard gigabit controllers would be a nice choice. 1 GB of RAM is more than enough, but do use ECC. Also, if you use the newest kernels, the onboard IDE controllers are fast enough to be used in the array. For an 8-drive array, I will normally use 1 Promise add-in card and the two on-board channels.

Important Miscellany:

- Power Supply. Don't skimp. 400W+ from a good vendor.

- IDE cables <=24" long. I tried to use the 36" IDE cables once and it nearly drove me nuts with drive corruption and random errors. The 24" ones work very well and usually give you enough length to route to 8 drives in an enclosure. Once Serial ATA gets cheaper, this will no longer be an issue.

- UPS. In general, you can NEVER allow a power failure to take down the raid server. There is at least a 50% chance of low-level drive corruption on an 8-drive array if it loses power. (Don't ask about the time the cleaning crew unplugged the array from the UPS!) We use a smart UPS and UPS monitoring software (upsmon) to unmount the array and raidstop it if the power goes out for more than 30 secs. I am also tempted to not even connect the power switch on the front panel. Resetting a crashed system is OK, but powering it off doesn't give the hard drives a chance to flush their buffers to disk. With 8+ spinning drives, there is a good chance at least one of them will be corrupted.

- Bonnie and burn-in. There are many problems that can crop up when you build the array: IRQ issues, etc. It is paramount that you thoroughly abuse the array with something like bonnie to make sure that everything is working. I typically run mkraid, which starts the array synching, mke2fs on the raid device, and then mount the filesystem and run bonnie on it, all while it is still synching. This is pretty hard on the whole system, and if there is a problem you will notice quickly. Once it is done resyncing, I usually run bonnie overnight to burn it in and verify that performance is reasonable.

- Fixing things. If you do have a power failure and the raid doesn't come back up, it is usually due to a hard drive problem. The only way to fix it is to run a low-level utility (Maxtor Powermax) on the drive. Maybe someone knows how to do something similar within Linux. If so, I would love to hear about it.

Again, our approach is not necessarily exhaustively researched. This is just "what we do." So, take it for what it's worth.

Best,

Mike Prinkey
Aeolus Research, Inc.
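P.S. For anyone who wants to try the same recipe, the sequence below is roughly what the mkraid/mke2fs/bonnie step looks like with the old raidtools. Treat it as an illustrative sketch only -- the device names and the 8-disk /etc/raidtab are made-up examples, so adapt them to your own controllers before running anything:

  # /etc/raidtab -- example 8-disk software RAID-5 set, one partition per drive
  raiddev /dev/md0
      raid-level            5
      nr-raid-disks         8
      nr-spare-disks        0
      persistent-superblock 1
      chunk-size            64
      device                /dev/hde1
      raid-disk             0
      device                /dev/hdg1
      raid-disk             1
      # ...and so on for the remaining six drives

  mkraid /dev/md0                 # creates the set and starts the resync
  cat /proc/mdstat                # watch the resync progress
  mke2fs -j /dev/md0              # ext3; plain mke2fs for ext2
  mount /dev/md0 /mnt/raid
  bonnie -d /mnt/raid -s 2048     # abuse it while it is still resyncing

mdadm can do the same job these days without a raidtab, but the idea is identical: build it, then beat on it with bonnie before you trust it with real data.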
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Mon Aug 4 09:50:09 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Mon, 04 Aug 2003 08:50:09 -0500 Subject: Cisco switches for lam mpi In-Reply-To: References: Message-ID: <3F2E6491.1020802@tamu.edu> I should have commented earlier, but I didn't think I had time... My experience with the Cisco 4006 was that as an aggregation switch it was OK for 10/100 or GBE. It did fine for normal "enterprise switching. The 4006's I've used had only older Supervisor Modules and ran CAT-OS, rather than IOS like the 4506 I'm testing now. For higher performance, while CPU utilization stays low, the switch falls off at higher loads. Caveat: I did not test these devices in a cluster environment; the thought never crossed my mind. I'd be using a 6509 if I had to use a Cisco, but I'd probably be shopping for HP ProCurves, Foundry's, Riverstones, or NEC Bluefires, based on what I've seen and done lately. I tested the 4006 in normal enterprise mode, and loaded it for high-perf network modes. If you ever need QoS do NOT use a 4006. Or a 4506. They can't handle it too well. But I digress. I'm gonna try to get a couple of ProCurves in and test 'em against a LAN tester made by Anritsu (MD1230/1231) for small packet capability (RFC-2544). That's been a killer for a lot of switches I've looked at. gerry Felix Rauch wrote: > On Tue, 29 Jul 2003, Jack Douglas wrote: > >>We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst >>4003 Chassis with 48 1000Base-t ports. >> >>We are running LAM MPI over gigabit, but we seem to be experiencing >>bottlenecks within the switch >> >>Typically, using the cisco, we only see CPU utilisation of around 30-40% > > [...] > > I'm not a Cisco expert, but... > > We once got a Cisco switch from our networking people that we had to > return immediately because it delivered such a bad performance. It was > a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only > handle 12 ports at full speed. Above that, the performance brake down > completely. > > For some benchmark results see, e.g.: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf > > As a comparison, the quite nice results of a CentreCom 742i: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf > > Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved > anyway since spring 2001 when I did the above tests. Besides, the > situation for Gigabit Ethernet could be different. > > As we described on our workshop paper at CAC03 you can not trust the > data sheets of switches anyway: > http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 > > Conclusion: If you need a very high performing switch, you have to > evaluate/benchmark it yourself. 
> > - Felix > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Aug 6 08:07:45 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 06 Aug 2003 07:07:45 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> References: <1060098427.30922.6.camel@roughneck> Message-ID: <3F30EF91.6080606@tamu.edu> We just implemented an IDE RAID system for some meteorology data/work. We're pretty happy with the results so far. Our hardwre complement is: SuperMicro X5DAE Motherboard dual Xeon 2.8GHz processors 2 GB Kingston Registered ECC RAM 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters 10 Maxtor 250 GB 7200 RPM disks 1 Maxtor 60 GB drive for system work 1 long multi-drop disk power cable... SuperMicro case (nomenclature escapes me, however, it has 1 disk bays and fits the X5DAE MoBo Cheapest PCI video card I could find (no integrated video on MoBo) Add-on Intel GBE SC fiber adapter Drawbacks: 1. I should have checked for integrated video for simplicity 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with ALL the patches 3. Make sure you order the rack mount parts when you order the case; it only appeared they were included... 4. Questions have been raised about the E-1000 integrated GBE copper NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M switch and GBE will be on fiber like God intended data to be passed (No, I don't trust most terminations for GBE on copper!) It's up and working. Burning in for the last 2 weeks with no problems, it's going to the Texas GigaPoP today where it'll be live on Internet2. HTH, Gerry Nicholas Henke wrote: > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > >>On 4 Aug 2003, Nicholas Henke wrote: >> >>We have a lot of experience with IDE RAID arrays at client sites. The DOE >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) >>and the price is hard to beat. The raid array that serves home >>directories to their clusters and workstations is backed up nightly to a >>second raid server, similarly to your system. To speed things along we >>installed an extra gigabit card in the primary and backup servers and >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 >>GBs take just over an hour using the dedicated gbit link. Rsync would >>probably be faster. Without the shortcircuit gigabit link, it used to run >>four or five times longer and seriously impact NFS performance for the >>rest of the systems on the LAN. >> >>Hope this helps. >> >>Regards, >> >>Mike Prinkey >>Aeolus Research, Inc. > > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? 
> > Nic -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Douglas.L.Farley at nasa.gov Wed Aug 6 08:35:10 2003 From: Douglas.L.Farley at nasa.gov (Doug Farley) Date: Wed, 06 Aug 2003 08:35:10 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> References: <1060098427.30922.6.camel@roughneck> Message-ID: <5.0.2.1.2.20030806081148.00a94be8@pop.larc.nasa.gov> I noticed with acnc's 14 unit raid they used an IDE-SCSI U3 something or another, anyone know what type of hardware they used to convert the drives for this array? Just direct IDE-SCSI adaptors (which I've not seen cheaper than $80) on each drive and then connecting to something like an adaptec Raid card? Does anyone have any experience with doing this (with off the shelf parts) to create a semi-cheep raid (maybe 10 x $250 for 250G disk, + 10 x $80 IDE-SCSI converter + $800 expensive adaptec 2200 esq card )? Those costs are higher (~$420/disk ) than doing 10 disks on a 3ware 7500-12 (~$320/disk) (costs excluding host system), so is whatever gained really worth it? Doug At 09:11 AM 8/5/2003 -1000, you wrote: >I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true >NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. >Although they do offer a NFS box that will turn one of these arrays into a >standalone. We have had great success with these units >(http://neptune.navships.com/images/harddrivearrays.jpg) . We first >acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. >We have set it up in a RAID-5 configuration and have not yet had to replace >even one of the drives (Knockin on wood). After a year we picked up the >14slot chassis and filled it with 160 maxtor drives and it has performed >flawless... I think we paig about $4000 for the 14 slot chassis. you can >add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver >for $1500 and you got about 2TB of storage for around $7000 > >Mitchel Kagawa > >----- Original Message ----- >From: "Nicholas Henke" >To: "Michael T. Prinkey" >Cc: >Sent: Tuesday, August 05, 2003 5:47 AM >Subject: Re: large filesystem & fileserver architecture issues. > > > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > On 4 Aug 2003, Nicholas Henke wrote: > > > > > > We have a lot of experience with IDE RAID arrays at client sites. The >DOE > > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for >them. > > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec >write) > > > and the price is hard to beat. The raid array that serves home > > > directories to their clusters and workstations is backed up nightly to a > > > second raid server, similarly to your system. To speed things along we > > > installed an extra gigabit card in the primary and backup servers and > > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > > GBs take just over an hour using the dedicated gbit link. Rsync would > > > probably be faster. 
Without the shortcircuit gigabit link, it used to >run > > > four or five times longer and seriously impact NFS performance for the > > > rest of the systems on the LAN. > > > > > > Hope this helps. > > > > > > Regards, > > > > > > Mike Prinkey > > > Aeolus Research, Inc. > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf ============================== Doug Farley Data Analysis and Imaging Branch Systems Engineering Competency NASA Langley Research Center < D.L.FARLEY at LaRC.NASA.GOV > < Phone +1 757 864-8141 > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Aug 6 15:09:59 2003 From: ctierney at hpti.com (Craig Tierney) Date: 06 Aug 2003 13:09:59 -0600 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> References: <1060098427.30922.6.camel@roughneck> <3F30EF91.6080606@tamu.edu> Message-ID: <1060196998.8961.17.camel@woody> On Wed, 2003-08-06 at 06:07, Gerry Creager N5JXS wrote: > We just implemented an IDE RAID system for some meteorology data/work. > We're pretty happy with the results so far. Our hardwre complement is: > > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > Hardware choices look good. How did you configure it? Are there 1 or 2 filesystems? Raid 0, 1, 5? Do you have any performance numbers on the setup (perferably large file, dd type tests)? Thanks, Craig > Drawbacks: > 1. I should have checked for integrated video for simplicity > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches > 3. Make sure you order the rack mount parts when you order the case; it > only appeared they were included... > 4. Questions have been raised about the E-1000 integrated GBE copper > NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M > switch and GBE will be on fiber like God intended data to be passed (No, > I don't trust most terminations for GBE on copper!) > > It's up and working. Burning in for the last 2 weeks with no problems, > it's going to the Texas GigaPoP today where it'll be live on Internet2. > > HTH, Gerry > > Nicholas Henke wrote: > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > >>On 4 Aug 2003, Nicholas Henke wrote: > >> > >>We have a lot of experience with IDE RAID arrays at client sites. 
The DOE > >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > >>and the price is hard to beat. The raid array that serves home > >>directories to their clusters and workstations is backed up nightly to a > >>second raid server, similarly to your system. To speed things along we > >>installed an extra gigabit card in the primary and backup servers and > >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 > >>GBs take just over an hour using the dedicated gbit link. Rsync would > >>probably be faster. Without the shortcircuit gigabit link, it used to run > >>four or five times longer and seriously impact NFS performance for the > >>rest of the systems on the LAN. > >> > >>Hope this helps. > >> > >>Regards, > >> > >>Mike Prinkey > >>Aeolus Research, Inc. > > > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Aug 6 16:55:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 6 Aug 2003 16:55:09 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> Message-ID: > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > > Drawbacks: > 1. I should have checked for integrated video for simplicity I did something similar a little while back: a tyan thunder e7500 board, just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 in jbod mode, 8x200G WD JB disks and a ~500W PS. I don't see any reason for adding extra ram or putting in multiple, higher-powered CPUs for a fileserver. > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches I'll be doing more boxes, probably with something like 8x250 SATA disks, with a pair of promise tx4 cards. open-source drivers for these cards recently became available, btw. there was a very interesting talk at OLS about doing raid intelligently over a network... 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at yakko.cimav.edu.mx Thu Aug 7 12:05:45 2003 From: luis.licon at yakko.cimav.edu.mx (Luis Fernando Licon Padilla) Date: Thu, 07 Aug 2003 10:05:45 -0600 Subject: test Message-ID: <3F3278D9.5000709@yakko.cimav.edu.mx> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From John.Hearns at micromuse.com Thu Aug 7 09:12:55 2003 From: John.Hearns at micromuse.com (John Hearns) Date: Thu, 07 Aug 2003 14:12:55 +0100 Subject: AMD core maths library Message-ID: <3F325057.4080801@micromuse.com> Sorry if this is old news to everyone. I saw a snippet in Linux Magazine (UK/German type) on the AMD Core Math Library for Opterons. https://wwwsecure.amd.com/gb-uk/Processors/DevelopWithAMD/0,,30_2252_2282,00.html Says it is initially released in FORTAN, with BLAS, LAPACK and FFTs. g77 under Linux and Windows. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Aug 7 09:54:29 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 07 Aug 2003 08:54:29 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <3F325A15.80901@tamu.edu> Mark Hahn wrote: >>SuperMicro X5DAE Motherboard >>dual Xeon 2.8GHz processors >>2 GB Kingston Registered ECC RAM >>2 HighPoint RocketRAID 404 4-channel IDE RAID adapters >>10 Maxtor 250 GB 7200 RPM disks >>1 Maxtor 60 GB drive for system work >>1 long multi-drop disk power cable... >>SuperMicro case (nomenclature escapes me, however, it has 1 disk bays >>and fits the X5DAE MoBo >>Cheapest PCI video card I could find (no integrated video on MoBo) >>Add-on Intel GBE SC fiber adapter >> >>Drawbacks: >>1. I should have checked for integrated video for simplicity > > > I did something similar a little while back: a tyan thunder e7500 board, > just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 > in jbod mode, 8x200G WD JB disks and a ~500W PS. > > I don't see any reason for adding extra ram or putting in multiple, > higher-powered CPUs for a fileserver. This one will A) be on the Unidata weather distribution network for general weather data AND the newer real-time radar feeds; B) be extracting some of that data for graphics; C) be doing NNTP for Unidata (one, exactly, newsgroup) for a research project; D) reside on the I2 Logistical Backbone... It's a busy box. >>2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with >>ALL the patches > > I'll be doing more boxes, probably with something like 8x250 SATA disks, > with a pair of promise tx4 cards. open-source drivers for these cards > recently became available, btw. > > there was a very interesting talk at OLS about doing raid intelligently > over a network... Check out loki.cs.utk.edu (I think: It's certainly a project called 'loki' and run by Micah Beck at utk.edu) about the logistical backbone. I didn't go with Promise cards because of one of my grad students, who's obviously better funded than me... 
He's looked at Promise, HighPoint and at least one other card, and had comparisons, and strongly recommended HighPoint as a Price/Performance leader. The HighPoints were less expensive and currently boast the same performance as the tx4's. Everyone's getting into the SATA game; I didn't go that way because I wanted to get to the 2 TB point and couldn't reasonably do it today with SATA; maybe later. I didn't want to take the time to hack the drivers HighPoint had available, since i'm overloaded these days. -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Wed Aug 6 19:55:07 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed, 6 Aug 2003 19:55:07 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Mark Hahn wrote: > > there was a very interesting talk at OLS about doing raid intelligently > over a network... > I have considered trying this using network block devices, but I haven't had the opportunity to try it. Is this what you are talking about or something different? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 14:15:39 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 14:15:39 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Michael T. Prinkey wrote: > On Wed, 6 Aug 2003, Mark Hahn wrote: > > > > there was a very interesting talk at OLS about doing raid intelligently > > over a network... > > > > I have considered trying this using network block devices, but I haven't > had the opportunity to try it. Is this what you are talking about or > something different? thre are similarities: http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf but it's really a development beyond NBD or DRDB. hmm, I'm not sure that brief pdf is either complete or does the idea justice. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 15:15:01 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 15:15:01 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > I read the abstract last evening and got a taste for it. That is really a > remarkable idea to use the ethernet checksum for data integrity of stored > data. Thanks for the heads-up. for me, the crux of the idea is: - if you want big storage, $/GB drives you to IDE. - IDE is not amazingly fast, reliable or scalable. - building storage bricks out of IDE makes a lot of sense, since they can now be quite dense, low-overhead, etc. - ethernet is a wonderfully hot-pluggable interconnect for this kind of thing. - doing raid over a multicast-capable network is pretty cool. 
- using eth's checksumming is pretty cool. - doing it this way (all open-source, including software raid) means the system is much more transparent - you are not dependent on some closed-source vendor tools to control/monitor/upgrade your storage. Ben's approach (along with Lustre, for instance) seems very sweet for HPC type storage needs. one thing I do ponder, though, is whether it really makes sense to hide raid so firmly under the block layer. it's conceptually tidy, to be sure, and works well in practice. but suppose: - to create a filesystem, you hand some arbitrary collection of block-device extents to the mkfs tool. you also let it know which extents happen to reside on the same disk, bus, host, UPS, geographic location, etc. - you can tell the FS that your default policy should be for reliability - that raid5 across separate disks is OK, for instance. or maybe you can tell it that a particular file should be raid10 instead. or that a file should be raid1 across each geographic site. or that updates to a file should be logged. or that it should transparently compress older files. - the FS might do other HSM-like things, such as incorporating knowlege of what's on your tape/DVD/cdrom's. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Aug 7 14:33:23 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 7 Aug 2003 14:33:23 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > > > > I have considered trying this using network block devices, but I haven't > > had the opportunity to try it. Is this what you are talking about or > > something different? > > thre are similarities: > > http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf > > but it's really a development beyond NBD or DRDB. hmm, I'm not sure > that brief pdf is either complete or does the idea justice. > I read the abstract last evening and got a taste for it. That is really a remarkable idea to use the ethernet checksum for data integrity of stored data. Thanks for the heads-up. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Aug 7 15:55:15 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 7 Aug 2003 12:55:15 -0700 (PDT) Subject: (Scyld) Nodes going down unexpectedly Message-ID: We have a 10-processor cluster and are currently running a weather model on 4 of the processors. When I try to up the number, it works for a while, then the "beostatus" window will show one node's information not changing for a little while before it shows the node status as "down". Each node is dual-processor and I have noticed (but not verified) that this becomes an issue when both processors on a node are in use. After the node status changes to "down", I cannot restart it through the console tools on the root node. However, I know that the node is still alive and on the network because I can ping it successfully. This problem requires me to actually restart the node by hand, which is a bit of an issue since we're on opposite sides of the building. What's going on here and what can I do to mitigate/fix this? 
Tim Whitcomb twhitcomb at apl.washington.edu Applied Physics Lab University of Washington _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 7 17:51:10 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 7 Aug 2003 14:51:10 -0700 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > - IDE is not amazingly fast, reliable or scalable. That's about like saying "commodity servers are not fast, reliable, or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." More facts, less religion. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Thu Aug 7 18:04:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Thu, 7 Aug 2003 17:04:25 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> References: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 7 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > > > - IDE is not amazingly fast, reliable or scalable. > > That's about like saying "commodity servers are not fast, reliable, > > or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." > > More facts, less religion. Since when has the value of facts outweighed religion on *THIS* list?! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes.
thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 11 22:44:45 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 12 Aug 2003 10:44:45 +0800 (CST) Subject: PBSPro with 1024 nodes :-O (oh!) Message-ID: <20030812024445.80371.qmail@web16812.mail.tpe.yahoo.com> Looks like the problems with OpenPBS in large clusters were all fixed in PBSPro, ASU has a 1024 node cluster (http://www.pbspro.com/press_030811.html). Also, heard from PBS developers that the next release of PBSPro (5.4) will add fault tolerance in the master node, very similar to the shadow master concept in Gridengine. Sounds to me PBSPro is very much better than OpenPBS. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Aug 12 20:59:17 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 12 Aug 2003 20:59:17 -0400 (EDT) Subject: $900,000 RFP for climate simulation machine at UC Irvine (fwd) Message-ID: -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 ---------- Forwarded message ---------- Date: Tue, 12 Aug 2003 16:54:44 -0700 From: Charlie Zender To: Donald Becker Subject: $900,000 RFP for climate simulation machine at UC Irvine Dear Donald, Ooops.
Forgot the announcement itself. Here it is. Please disseminate! Thanks, Charlie Cut here ====================================================================== Dear High Performance Computing Vendor, The University of California at Irvine is pleased to announce the immediate availability of US$900,000 towards the purchase of an Earth System Modeling Facility (ESMF). Following a competitive bid process open to all interested vendors, the ESMF contract will be awarded to the proposal with the most competitive response to our Request for Proposals (RFP). All necessary details about the ESMF and the RFP process are available from the ESMF homepage: http://www.ess.uci.edu/esmf Bids are due August 22, 2003. Please visit the ESMF homepage for more details and contact Mr. Ralph Kupcha with any questions. All further contact contact with potential vendors will take place on the ESMF Potential Vendor Mail List. You may subscribe to this list by visiting https://maillists.uci.edu/mailman/listinfo/esmfvnd Please pass this Announcement of Opportunity on to any interested colleagues. Sincerely, Ralph Kupcha Senior Buyer, Procurement Services, UCI _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Thu Aug 14 05:04:57 2003 From: jhearns at micromuse.com (John Hearns) Date: Thu, 14 Aug 2003 10:04:57 +0100 Subject: Slashdot thread on supercomputers Message-ID: <3F3B50B9.4090405@micromuse.com> Everyone has probably seen the thread on Slashdot. Here are links to the two relevant stories. http://www.eetimes.com/story/OEG20030811S0018 http://www.eetimes.com/story/OEG20030812S0011 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Thu Aug 14 15:21:47 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Fri, 15 Aug 2003 00:11:47 +0450 Subject: MPICH Message-ID: <1060890107.c2a01a60farschad@myrealbox.com> Hi, I am a new user to this mailing list. And also I am very new to Beowulf clusters. So I will have to many questions, please be patient :^) At the moment, I want to run a sample program using MPI. The program is in F90 and I use PGF90 to compile it. I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is the alternative command for lamboot!! Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jconnor at atmos.colostate.edu Thu Aug 14 18:55:20 2003 From: jconnor at atmos.colostate.edu (Jason Connor) Date: 14 Aug 2003 16:55:20 -0600 Subject: MPICH In-Reply-To: <1060890107.c2a01a60farschad@myrealbox.com> References: <1060890107.c2a01a60farschad@myrealbox.com> Message-ID: <1060901719.6160.11.camel@gentoo.atmos.colostate.edu> Hi Farschad, Here are only some possible answers to your questions. Like all things, there is more than one way to do these things. 
On Thu, 2003-08-14 at 13:21, Farschad Torabi wrote: > Hi, > I am a new user to this mailing list. > And also I am very new to Beowulf clusters. > So I will have to many questions, please be patient :^) > > At the moment, I want to run a sample program using > MPI. The program is in F90 and I use PGF90 to compile it. > > I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? using mpich: /bin/mpirun -np <# of nodes to run on> \ -machinefile /util/machines/machines.LINUX \ the -machinefile doesn't need need to be explicit, as long as you have the file mentioned above filled with the names of your cluster nodes. mpirun --help is always a good reference =) > > Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is > the alternative command for lamboot!! There isn't one. Just have whatever shell your using with mpich (rsh or ssh) setup so that you don't need a password to login to the nodes. > > Thank you in advance > Farschad Torabi > I hope this helps. In case you care, I like lam better. =) Jason Connor Colorado State University Prof. Scott Denning's BioCycle Research Group jconnor at atmos.colostate.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 14 21:52:06 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 15 Aug 2003 11:52:06 +1000 Subject: Scalable PBS Message-ID: <200308151152.09499.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, Just joined the list, so apologies if this is already well known. I noticed a recent message in the archive about OpenPBS and problems with scalability, and I think it's worth noting that there is an alternative (and actively developed) fork of OpenPBS called "Scalable PBS" available from: http://www.supercluster.org/projects/pbs/ Amongst other features it has (quoting the website): Better Scalability - Significantly improved server to MOM communication model, the ability to handle larger clusters, larger jobs, larger messages, etc. - Scales up to 2K nodes vs ~300 nodes for standard OpenPBS. Improved Usability by incorporating more extensive logging, as well as, more human readable logging(ie no more 'error 15038 on command 42'). We're using SPBS here at VPAC on our IBM cluster and it's a lot better than the last OpenPBS release (2.3.16, from 2001). They forked off from 2.3.12 rather than the last OpenPBS because it had a more open license. The folks behind the project have worked very quickly with us to fix bugs we've been finding in it, typically when I found a bug they had fixed it within a day or so, usually overnight from my perspective in Oz. :-) If you are considering using it I'd suggest using the current snapshot release from: http://www.supercluster.org/downloads/spbs/temp/ as that irons out a couple of bugs that might bite. For the less adventurous there is a new release SOpenPBS-2.3.12p4 due out in the near future that will include the fixes from the current snapshot. 
cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu J4wal1ph00ExP8w/5HgVCek= =Nyjb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Thu Aug 14 13:28:21 2003 From: josip at lanl.gov (Josip Loncaric) Date: Thu, 14 Aug 2003 11:28:21 -0600 Subject: Two AMD Opteron clusters for LANL Message-ID: <3F3BC6B5.5040706@lanl.gov> This October, LANL will be getting large AMD Opteron model 244 clusters ("Lightning" consisting of 1408 dual-CPU machines and "Orange" consisting of 256 dual-CPU machines, both built by Linux Networx): http://www.itworld.com/Comp/1437/030814supercomp/ http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~73744,00.html Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kapurs at seas.upenn.edu Fri Aug 15 11:41:39 2003 From: kapurs at seas.upenn.edu (kapurs at seas.upenn.edu) Date: Fri, 15 Aug 2003 11:41:39 -0400 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <1060962099.3f3cff33d8b6c@webmail.seas.upenn.edu> Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Matthew_Wygant at dell.com Fri Aug 15 17:54:05 2003 From: Matthew_Wygant at dell.com (Matthew_Wygant at dell.com) Date: Fri, 15 Aug 2003 16:54:05 -0500 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <6CB36426C6B9D541A8B1D2022FEA7FC10273F510@ausx2kmpc108.aus.amer.dell.com> The 530 appears to include both a SCSI U160 and 2 ATA100 IDE channels. The ATA100 defaults to 'auto' in the BIOS, so I would imagine the node should pick it up. -matt -----Original Message----- From: kapurs at seas.upenn.edu [mailto:kapurs at seas.upenn.edu] Sent: Friday, August 15, 2003 10:42 AM To: beowulf at beowulf.org Subject: Hard Drive Upgrade(Internal or External) Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. 
thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zender at uci.edu Fri Aug 15 18:14:58 2003 From: zender at uci.edu (Charlie Zender) Date: Fri, 15 Aug 2003 15:14:58 -0700 Subject: Bid deadline extended for UC Irvine climate computer Message-ID: Hi Donald, Response from members on the beowulf list has been so positive that we are extending our bid deadline in order to give your list members who want to bid a fair chance to prepare competitive bids. Would you please allow posting of this notice of extension so that those vendors who thought they may not have enough time to submit bids become aware of the two week extension? I promise not to bother you again :) One thought: We are not the only Institution buying medium size "super-computers" that Beowulf vendors might like to know about. It might be a good idea for the whole Beowulf community to create a separate list for RFPs. Such a list would help buyers and Beowulf vendors find eachother. Thanks! Charlie -- Charlie Zender, zender at uci dot edu, (949) 824-2987, Department of Earth System Science, University of California, Irvine CA 92697-3100 -------------------------------------------------------------------- Dear HPC Vendors, We are extending by two weeks the deadline for submission of bids in response to the $900,000 Earth System Modeling Facility RFP: http://www.ess.uci.edu/esmf The new bid deadline is Friday, September 5. All other deadlines and the expected timeline are also shifted by two weeks, and these changes are reflected on the recently updated web page and conference summary. Consequently, the deadline to send bid-related questions to Ralph Kupcha is Friday, August 29. We hope that this extension provides some additional breathing room to improve any parts of your bid that you might have rushed to finish. At the same time, we are now ready to accept any completed proposals and look forward to reading your ideas on how best to meet our coupled climate modeling needs. Sincerely, Ralph Kupcha _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 00:03:57 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 16 Aug 2003 12:03:57 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308151152.09499.csamuel@vpac.org> Message-ID: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> How big is your cluster? Did you use Gridengine before -- how does SPBS compare to SGE? Andrew. --- Chris Samuel ????> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > Just joined the list, so apologies if this is > already well known. 
> > I noticed a recent message in the archive about > OpenPBS and problems with > scalability, and I think it's worth noting that > there is an alternative (and > actively developed) fork of OpenPBS called "Scalable > PBS" available from: > > http://www.supercluster.org/projects/pbs/ > > Amongst other features it has (quoting the website): > > Better Scalability > - Significantly improved server to MOM > communication model, the ability to > handle larger clusters, larger jobs, larger > messages, etc. > - Scales up to 2K nodes vs ~300 nodes for > standard OpenPBS. > > Improved Usability by incorporating more extensive > logging, as well as, more > human readable logging(ie no more 'error 15038 on > command 42'). > > We're using SPBS here at VPAC on our IBM cluster and > it's a lot better than > the last OpenPBS release (2.3.16, from 2001). They > forked off from 2.3.12 > rather than the last OpenPBS because it had a more > open license. > > The folks behind the project have worked very > quickly with us to fix bugs > we've been finding in it, typically when I found a > bug they had fixed it > within a day or so, usually overnight from my > perspective in Oz. :-) > > If you are considering using it I'd suggest using > the current snapshot release > from: > > http://www.supercluster.org/downloads/spbs/temp/ > > as that irons out a couple of bugs that might bite. > > For the less adventurous there is a new release > SOpenPBS-2.3.12p4 due out in > the near future that will include the fixes from the > current snapshot. > > cheers, > Chris > - -- > Chris Samuel -- VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing > Bldg 91, 110 Victoria Street, Carlton South, > VIC 3053, Australia - http://www.vpac.org/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu > J4wal1ph00ExP8w/5HgVCek= > =Nyjb > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 16 00:51:40 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs In-Reply-To: <200308151905.h7FJ5uw13967@NewBlue.Scyld.com> Message-ID: Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! 
http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Sat Aug 16 08:47:55 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Sat, 16 Aug 2003 17:37:55 +0450 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs Message-ID: <1061039275.c61518e0farschad@myrealbox.com> Dear Jason Connor and Roudy, I think that my question covers Roudy's questions too ;^) First of all Roudy, the new version of MPICH is available on the net i.e. mpich-1.2.5; you can dl it. As Jason Connor advised me, I ran the following command: /bin/mpirun -np 1 -machinefile machs -arch machines.arc a.out the contents of machs is like this node1 node1 and the contents of machines.arc (architecture file): node1.parallel.net node1.parallel.net node1.parallel.net (Roudy I think that you have to use your file like this! the name of the machines are written in this file; in your case let say -arch mpd.hosts) the program runs well on -np 1 machine but, when I wanted to define two processes on a single machine (i.e -np 2)it messages me: "Could not find enough architecture for machines LINUX" the question is, can we define more that ONE processes on a SINGLE machine?? Thanks -----Original Message----- From: "RoUdY" To: beowulf at scyld.com, beowulf at beowulf.org Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Re: Beowulf digest, Vol 1 #1412 - 5 msgs Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com!
http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Sat Aug 16 10:29:17 2003 From: rodmur at maybe.org (Dale Harris) Date: Sat, 16 Aug 2003 07:29:17 -0700 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <20030816142917.GA24928@maybe.org> On Sat, Aug 16, 2003 at 12:03:57PM +0800, Andrew Wang elucidated: > How big is your cluster? > > Did you use Gridengine before -- how does SPBS compare > to SGE? > > Andrew. > In a quick glance, it already wins points with me because it uses GNU autoconf instead of aimk to build. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Sat Aug 16 12:07:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Sat, 16 Aug 2003 09:07:13 -0700 Subject: Scalable PBS In-Reply-To: <20030816142917.GA24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> <20030816142917.GA24928@maybe.org> Message-ID: <20030816160713.GB24928@maybe.org> On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale Harris elucidated: > > > In a quick glance, it already wins points with me because it uses GNU > autoconf instead of aimk to build. > However, the fact that it requires tcl/tk does not. Whatever happen to the concept of making a simple tool that just does it's job well. I don't see why I need a GUI for a job scheduler. Let the emacs people make some frontend for it. Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 23:12:21 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 17 Aug 2003 11:12:21 +0800 (CST) Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> Message-ID: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> For SGE, I simply download the binary package, and then do the full install. I don't have to build the source so it doesn't matter if it use aimk or autoconf. I looked at SPBS a while ago, I think if you don't need to build the GUI, then you don't need tcl/tk, and you just need to use the command line for managing the cluster. Andrew. --- Dale Harris ???? > On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale > However, the fact that it requires tcl/tk does not. > Whatever happen to > the concept of making a simple tool that just does > it's job well. I > don't see why I need a GUI for a job scheduler. Let > the emacs people > make some frontend for it. 
> > Dale > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dmcollins79 at hotmail.com Sun Aug 17 07:14:24 2003 From: dmcollins79 at hotmail.com (Timothy M Collins) Date: Sun, 17 Aug 2003 12:14:24 +0100 Subject: Request for parallel applications to test on beowulf cluster. Message-ID: Hi, I have built a beowulf (Redhat8 with PVM&LAM) Looking for parallel applications for different size and complexity to test fault tolerance. If anybody has one or knows where I can find one/some, please let me know. Kind regards Collins _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:52:56 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:52:56 +1000 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <200308181152.57812.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 16 Aug 2003 02:03 pm, Andrew Wang wrote: > How big is your cluster? http://www.vpac.org/content/services_and_support/facility/linux_cluster.php (If it looks a little sparse, that's because someone's in the process of updating it) > Did you use Gridengine before -- how does SPBS compare > to SGE? Nope, it's always been running OpenPBS prior to migrating to SPBS. - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDF4O2KABBYQAh8RAncrAJoDWbSivr52PpPy/jyNkqdVFqLLCwCfVK8S 604i8kwR1wNA+7J5oWMPxBg= =Znzi -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:55:24 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:55:24 +1000 Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816142917.GA24928@maybe.org> <20030816160713.GB24928@maybe.org> Message-ID: <200308181155.25278.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 02:07 am, Dale Harris wrote: > However, the fact that it requires tcl/tk does not. Whatever happen to > the concept of making a simple tool that just does it's job well. I > don't see why I need a GUI for a job scheduler. Let the emacs people > make some frontend for it. 1) I don't believe it requires tk/tcl 2) The tk/tcl isn't for a GUI, it's for one of the example schedulers. 
3) That was inherited from OpenPBS 4) There is a GUI (plain old X) for monitoring PBS, xpbsmon, but I'd ignore it if I were you.. cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDIMO2KABBYQAh8RAnYiAJ9TBbBiGNRSJTP122dhqr8fXtQF9ACfatF7 XL5HFH/3hMPqm1K0FuCJlc8= =+U9N -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:57:25 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:57:25 +1000 Subject: Scalable PBS In-Reply-To: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> References: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <200308181157.26287.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 01:12 pm, Andrew Wang wrote: > I looked at SPBS a while ago, I think if you don't > need to build the GUI, then you don't need tcl/tk, and > you just need to use the command line for managing the > cluster. The tk/tcl is for one of the example schedulers (there are 3, one written in C, one in tk/tcl and one in BASL). Viz: --set-sched=TYPE sets the scheduler type. If TYPE is "c" the scheduler will be written in C "tcl" the server will use a Tcl based scheduler "basl" will use the rule based scheduler "no" then their will be no scheduling done (the "c" scheduler is the default) - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDKFO2KABBYQAh8RAtIZAJwN0D0dts5DyU3tSN4eLsucYn6DsQCgiB7q wVSIraBXrPWoODE2LbglW14= =4Etb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 05:26:17 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 11:26:17 +0200 Subject: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Hi Beowulfers, Problem: I want to distribute large files over a cluster. To raise performance I decided to copy the file to the local HD of any node in the cluster. Did someone find a multicast solution for that or maybe something with snowball principle? Till now I've take a look at msync (multicast rsync). Does someone have experiences with JETfs ? My idea was to write some scripts which copy files via rsync with snowball, but there are some heavy problems. e.g. What happens if one node (in the middle) is down. How does the next snowball generation know when to start copying (the last ones have finished copying)? Any ideas ? Thanks in advance Ren? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 18 09:12:52 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 18 Aug 2003 15:12:52 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of > any node in the cluster. Quick solution: Dolly [1] ;-) Longer description: I once wrote a tool called "Dolly" to clone whole hard-disk drives, partitions, or large files to many nodes in a cluster. It does so by sending the files concurrently around the cluster in a "TCP chain". In a switched network, this solution is often faster then IP multicast becauce Dolly can use the proven TCP congestion control and error correction, whereas high-speed reliable multicast is something difficult. > Till now I've take a look at msync (multicast rsync). Another tool is "udpcast". > What happens if one node (in the middle) is down. Dolly, can't handle that (it's a working prototype), but Atsushi Manabe extended Dolly into Dolly++, which supposedly can handle node failures (see link in [1]). We use Dolly regularly to clone our small 16-node cluster and the local support group uses Dolly to clone the larger 128-node cluster. Because that cluster has two Fast Ethernet networks, we can clone whole disks with about 20 MByte/s to all nodes in the cluster. If you want to clone files instead of partitions, just specify your file name in the config file instead of the device file. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mike at etek.chalmers.se Mon Aug 18 08:56:33 2003 From: mike at etek.chalmers.se (Mikael Fredriksson) Date: Mon, 18 Aug 2003 14:56:33 +0200 Subject: mulitcast copy or snowball copy References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <3F40CD01.89E6AB62@etek.chalmers.se> Rene Storm wrote: > > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > Any ideas ? Jepp, there is a distribution system for large files mainly for the Internet, but it can probbably be of use for you. It's a fast way to distribute a large file from one host to several others, at the same time. 
Check out: http://bitconjurer.org/BitTorrent/index.html MF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 10:51:27 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 10:51:27 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. > To raise performance I decided to copy the file to the local HD of any > node in the cluster. > > Did someone find a multicast solution for that or maybe something with > snowball principle? There are several multicast file distribution protocols, but they all share the same potential flaw: they use multicast. That means that they will work great in a few specific installations, generally small clusters on a single Ethernet switch. But as you grow, multicast becomes more of a problem. Here is a strong indicator for using multicast A shared media or repeater-based network (e.g. traditional Ethernet) Here are a few of the contra-indications for using multicast Larger clusters Non-Ethernet networks "Smart" Ethernet switches which try to filter packets Random communication traffic while copying Heavy non-multicast traffic while copying Multiple multicast streams NICs with mediocre, broken or slow to configure multicast filters Drivers not tuned for rapid multicast filter changes Or, in summary, "using the cluster for something besides a multicast demo. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. > My idea was to write some scripts which copy files via rsync with snowball, If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. A geometrically cascading copy can work very well. It very effectively uses current networks (switched Ethernet, Myrinet, SCI, Quadrics, Infiniband), and can make use of the sendfile() system call. 
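[A rough sketch of such a geometrically cascading copy, in Python, purely as an illustration -- this is not Scyld's or Dolly's implementation, and it assumes passwordless ssh/scp between all nodes and the same file path everywhere. Every host that already holds the file hands it to one more host, then both serve half of the remaining list in parallel, so the number of finished copies doubles each round. The master stays the parent of everything it spawned and ends up holding the list of failed targets:

    import subprocess, sys, threading

    def remote_copy(src, dst, path):
        # run scp on 'src' so it pushes 'path' to the same path on 'dst'
        cmd = ["scp", "-q", path, "%s:%s" % (dst, path)]
        if src != "localhost":
            cmd = ["ssh", src] + cmd
        return subprocess.call(cmd)

    def cascade(holder, path, targets, failed):
        # 'holder' already has the file: give it to one more node, then let
        # holder and that node each serve half of the rest in parallel, so
        # the number of hosts holding the file doubles every round
        if not targets:
            return
        first, rest = targets[0], targets[1:]
        if remote_copy(holder, first, path) != 0:
            failed.append(first)
            cascade(holder, path, rest, failed)  # holder keeps the whole list
            return
        half = len(rest) // 2
        helper = threading.Thread(target=cascade,
                                  args=(first, path, rest[half:], failed))
        helper.start()
        cascade(holder, path, rest[:half], failed)
        helper.join()

    if __name__ == "__main__":
        # usage: cascade.py /path/to/file node001 node002 ... nodeNNN
        filename, nodes, failed = sys.argv[1], sys.argv[2:], []
        cascade("localhost", filename, nodes, failed)
        for node in failed:
            print("copy to %s failed" % node)

With 128 nodes that is about eight rounds of scp instead of 128 back-to-back copies from one server, and the node list handed to it should be the frozen "up" list described above rather than a freshly generated one.]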
For a system such as Scyld, use a zero-base geometric cascade: move the work off of the master as the first step. The master generates the work list and immediately shifts the process creation work off to the first compute node. The master then only monitors for completion. You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erik at aarg.net Mon Aug 18 10:14:25 2003 From: erik at aarg.net (Erik Arneson) Date: Mon, 18 Aug 2003 07:14:25 -0700 Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <20030818141424.GA16386@aarg.net> On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > Did someone find a multicast solution for that or maybe something with snowball principle? I am really new to the Beowulf thing, so I am not sure if this solution is a good one or not. But have you taken a look at the various network filesystems? OpenAFS has a configurable client-side cache, and if the files are needed only for reading this ends up being a very quick and easy way to distribute changes throughout a number of nodes. (However, I have noticed that network filesystems are not often mentioned in conjunction with Beowulf clusters, and I would really love to learn why. Performance? Latency? Complexity?) -- ;; Erik Arneson SD, Ashland Lodge No. 23 ;; ;; GPG Key ID: 2048R/8B4CBC9C CoTH, Siskiyou Chapter No. 21 ;; ;; ;; -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 481 bytes Desc: not available URL: From farschad at myrealbox.com Mon Aug 18 12:17:14 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Mon, 18 Aug 2003 21:07:14 +0450 Subject: MPICH again Message-ID: <1061224634.8d1769c0farschad@myrealbox.com> Hi All, I still have some problems running MPICH on my machine :^( I've installed MPICH and PGF90 on my PC and I am able to compile parallel codes using MPI with mpif90 command. But the problem arise when I want to run the executable file on a Bowulf cluster. As Jason Connor told me, I use the following command /bin/mpirun -machinefile machs -np 2 a.out But it prompts me that there are not enough architecture on LINUX. In this case it is like when I run the executable file (i.e. a.out) manually without using mpirun. what do you think about this?? 
Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 11:34:16 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 17:34:16 +0200 Subject: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Hi Donald, > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB each. (Overall 30 GB) And the cluster is 128++ nodes. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. Rene: Thats right, but what if I ignore dropped packets and accept the corrupt files ? I would be able to rsync them later on. First Multicast to create files, Second step is to compare with rsync. I've tried this and it isn't really slow, if you're doing the rsync via snowball. If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. Rene: That's the problem with all the things you do, first they are for your own and then everybody wants them ;o) > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. Rene: Good idea You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. Rene: But how do I get this status back to my "master", e.g command from master: node16 copy to node17? I don't want do de-centralize my job, like fire and forget. 
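[A small illustration of keeping that status centralised, assuming passwordless ssh, rsync, a hypothetical nodes.up file listing the targets, and placeholder names for the master host and the path. The master itself launches every transfer, stays the parent of all of them, and counts completions down as they exit; the same pattern works if each child is instead told to forward the file onward snowball-style, since the exit status still comes back up the process tree:

    import subprocess

    path = "/data/bigfile"                       # placeholder path
    nodes = [line.strip() for line in open("nodes.up") if line.strip()]

    # the master stays the parent of every transfer it starts
    jobs = [(node, subprocess.Popen(["ssh", node, "rsync", "-a",
                                     "master:" + path, path]))
            for node in nodes]

    outstanding = len(jobs)
    for node, job in jobs:
        rc = job.wait()                          # status flows back to the master
        outstanding -= 1
        state = "ok" if rc == 0 else "FAILED (exit %d)" % rc
        print("%s: %s, %d transfers still outstanding" % (node, state, outstanding))
]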
Cya, Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 12:50:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 12:50:57 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB > each. (Overall 30 GB) > And the cluster is 128++ nodes. Those are important parameters. What network type are you using? If Ethernet, what switches and topology? (My guess is that you are using "smart" switches, likely connected with a chassis backplane.) > > Here is an example: ...the longer > the multicast filter list ... the more packets dropped. > Rene: Thats right, but what if I ignore dropped packets and accept the > corrupt files ? I would be able to rsync them later on. This is costly. "Open loop" multicast protocols work by having the receiver track the missing blocks, and requesting (or interpolating) them later. Here you are discarding that information and doing much extra work on both the sending and receiving side by later locating the missing blocks. An alternative is closed-loop multicast, with positive acknowledgment before proceeding more than one window. > First Multicast to create files, Second step is to compare with rsync. > I've tried this and it isn't really slow, if you're doing the > rsync via snowball. This is verifying/filling with a neighbor instead of the original sender. Except here you don't know when you are both missing the same blocks. > If you are doing this for yourself, the solution is easy. ... > Rene: That's the problem with all the things you do, first they are for > your own and then everybody wants them ;o) If your end goal is to publish papers, do the hack. If your end goal is make works useful for other, you have to start with a wider view. >> [Do] *not* implement a program that copies a file >> to every available node. Instead use a system where you first get a >> list of available ("up") nodes, and then copy the files to that node >> list. When the copy completes continue to use that node list rather >> then letting jobs use newly-generated "up" lists. > > Rene: Good idea This approach applies to a wide range of cluster tasks. A similar idea is that you don't care as much about which nodes are currently up as you care about which nodes have remained up since you last checked. [[ Ideally you could ask "which nodes will be up when this program completes", but there are all sorts of temporal and halting issues there. ]] >> You can implement low-overhead fault checking by counting down job >> issues and job completion. As the first machine falls idle, check that >> the final machine to assign work is still running. As the next-to-last >> job completes, check that the one machine still working is up. > > Rene: But how do I get this status back to my "master", e.g command from > master: node16 copy to node17? We have a positive completion indication as part of the Job/Process Management subsystem. If you consider the problem, the final acknowledgment must flow from the last worker to the process that is checking for job completion. You might as well put that process on the cluster master. 
The natural Unix-style implementation is having the controlling machine hold the parent of the process tree implementing the work, even if the work is divided elsewhere. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 13:31:17 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 13:31:17 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <20030818141424.GA16386@aarg.net> Message-ID: On Mon, 18 Aug 2003, Erik Arneson wrote: > On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > > Hi Beowulfers, > > > > Problem: > > I want to distribute large files over a cluster. > > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > > > Did someone find a multicast solution for that or maybe something with snowball principle? > > I am really new to the Beowulf thing, so I am not sure if this solution is a > good one or not. But have you taken a look at the various network > filesystems? OpenAFS has a configurable client-side cache, and if the files > are needed only for reading this ends up being a very quick and easy way to > distribute changes throughout a number of nodes. This is a good example of why Grid/wide-area tools should not be confused with local cluster approaches. The time scale, performance and complexity issues are much different. AFS uses TCP/IP to transfer whole files from a server. With multiple servers the configuration is static or slow changing. > (However, I have noticed that network filesystems are not often mentioned in > conjunction with Beowulf clusters, and I would really love to learn why. > Performance? Latency? Complexity?) It's because file systems are critically important to many applications. There is no universal cluster file system, and thus no single solution. The best approach is not tie the cluster management, membership, or process control to the file system in any way. Instead the file system should be selection based on the application's need for consistency, performance and reliability. For instance, NFS is great for small, read-only input files. But using NFS for large files, or when any files will be written or updated, results in both performance and consistency problems. When working from a large read-only database, explicitly pre-staging (copying) the database to the compute nodes is usually better than relying on an underlying FS. It's easier, more predictable and more explicit than per-directory tuning FS cache parameters. As as example of why predictability is very important, imagine what happens to an adaptive algorithm when a cached parameter file expires, or a daemon does a bunch of work. That machine suddenly is slower, and that part of the problem now looks "harder". So the work is reshuffled, only to be shuffled back during the next time step. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lange at informatik.Uni-Koeln.DE Mon Aug 18 14:21:59 2003 From: lange at informatik.Uni-Koeln.DE (Thomas Lange) Date: Mon, 18 Aug 2003 20:21:59 +0200 Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: <16193.6471.238244.224191@informatik.uni-koeln.de> Hi, I would try rgang, a nice tools which uses a tree structure for copying files or executing commands on a large list of nodes. It's written in python but there's also a compiled binary. It's very flexible and fast. Search for rgang in google to find the download page. To allow scaling to kiloclusters, the new rgang can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses rsh/ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies. Product Name: rgang Product Version: 2.5 ("rgang" cvs rev. 1.103) Date (mm/dd/yyyy): 06/23/2003 ORIGIN ====== Author Ron Rechenmacher Fermi National Accelerator Laboratory - Mail Station 234 P.O Box 500 Batavia, IL 60510 Internet: rgang-support at fnal.gov -- regards Thomas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchskin at comcast.net Mon Aug 18 13:40:59 2003 From: mitchskin at comcast.net (Mitchell Skinner) Date: 18 Aug 2003 10:40:59 -0700 Subject: AW: mulitcast copy or snowball copy In-Reply-To: References: Message-ID: <1061228458.5291.32.camel@zeitgeist> On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > This is costly. "Open loop" multicast protocols work by having the > receiver track the missing blocks, and requesting (or interpolating) > them later. Here you are discarding that information and doing much > extra work on both the sending and receiving side by later locating the > missing blocks. Some possible google terms include: reliable multicast, forward error correction There's an ietf working group on reliable multicast that wasn't making a whole lot of progress the last time I checked. At that time, I recall there being some acknowledgment-based implementations as well as one forward error correction-based implementation using reed-solomon codes, from an academic in Italy whose name I forgot. It's been a little while, but when I looked at the code for that FEC-based reliable multicast program (rmdp?) I think it could only handle pretty small files. My understanding is that FEC-based approaches should scale better in terms of the number of receiving nodes, but the algorithms can be very time/space intensive. There's a patented algorithm from Digital Fountain that's supposed to be pretty efficient (google tornado codes, michael luby, digital fountain) but I'm not aware that they have a cluster-oriented product. My impression of them was that they were pretty WAN-oriented. If I was less lazy I'd give some links instead of google terms, but hopefully that's some food for thought. 
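[For concreteness, the smallest possible cut at the FEC idea -- a toy in Python, not any of the real codes named above: the sender appends one parity block, the bytewise XOR of a group of equal-sized data blocks, and a receiver that lost any single block in that group can rebuild it from the rest instead of asking for a retransmission. Reed-Solomon and tornado codes generalise this to many losses per group, at correspondingly higher cost:

    def add_parity(blocks):
        # sender: append one parity block, the bytewise XOR of all data blocks
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return blocks + [bytes(parity)]

    def recover(received):
        # receiver: 'received' holds the k+1 blocks in order, lost ones as None;
        # XORing everything that did arrive reproduces the one missing block
        missing = [i for i, blk in enumerate(received) if blk is None]
        if not missing:
            return received[:-1]
        assert len(missing) == 1, "one parity block repairs only one loss"
        present = [blk for blk in received if blk is not None]
        rebuilt = bytearray(len(present[0]))
        for blk in present:
            for i, b in enumerate(blk):
                rebuilt[i] ^= b
        received[missing[0]] = bytes(rebuilt)
        return received[:-1]

    if __name__ == "__main__":
        data = [b"aaaa", b"bbbb", b"cccc", b"dddd"]   # equal-sized blocks
        packets = add_parity(data)
        packets[2] = None                 # pretend the network dropped block 2
        assert recover(packets) == data   # rebuilt from the four that arrived
]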
Mitch _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 16:00:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 16:00:04 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <1061228458.5291.32.camel@zeitgeist> Message-ID: On 18 Aug 2003, Mitchell Skinner wrote: > On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > > This is costly. "Open loop" multicast protocols work by having the > > receiver track the missing blocks, and requesting (or interpolating) > > them later. Here you are discarding that information and doing much > > extra work on both the sending and receiving side by later locating the > > missing blocks. .. > There's an ietf working group on reliable multicast that wasn't making a > whole lot of progress the last time I checked. It's a hard problem, and when they agree on a protocol it likely won't apply to clusters. The packet loss characteristic and cost trade-off is much different on a WAN than with a local Ethernet switch on a cluster. On a WAN every packet is costly to transport, so it's worth having both end stations doing extensive computations. On a cluster we might talk about doing more computation to avoid communication, but that's only for a few applications. In reality we prefer to do minimal work. Thus we prefer OS-bypass for application communication, and kernel-only for file system I/O. Notice the attention given to zero copy, TCP offload, TOE/TSO and sendfile(). Multicast and packet FEC add exactly what people are trying to avoid, extra copying, complexity and work. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 17:27:56 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 23:27:56 +0200 Subject: AW: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Ok, A geometrically cascading structure gives me some more disadvantages. If you are using an additional high performance network, eg myrinet or infiniband you won't have problems with the switch bandwidth. If you are using low cost Ethernet/Gigabit network topology with 2 or more hups between the nodes (like FFN), the last "generation" of the snowball could be a heavy bottleneck. It seems, there are too many variables for too many kinds of clusters. A big cluster farm often got a "idle" network, but only one, while a MPI cluster could have a network for the message passing and one for commands and copying. You could use this service-network to copy our files without using full bandwidth of this network. But this would cost something cluster users don't have: time. 
Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Aug 18 17:37:27 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 18 Aug 2003 14:37:27 -0700 Subject: big memory opteron In-Reply-To: <1061228458.5291.32.camel@zeitgeist> References: <1061228458.5291.32.camel@zeitgeist> Message-ID: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farooqkamal_76 at yahoo.com Mon Aug 18 18:38:21 2003 From: farooqkamal_76 at yahoo.com (Farooq Kamal) Date: Mon, 18 Aug 2003 15:38:21 -0700 (PDT) Subject: Newbie Message-ID: <20030818223821.11770.qmail@web21209.mail.yahoo.com> Hi Everyone, Its my first email to this group. What I was looking for is that "is beowulf transparent to applications running". What I mean by that is suppose I run a apache server on the master node; will the cluster manage the load balancing and process migration itself? or every application that is intented to run on beowulf must be written from scracth to do so. And at last if beowulf can't do, is there anyother implematation of clusters that has these above said qualities Regards Farooq Kamal SZABIST - Karachi Pakistan __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 22:41:49 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 12:41:49 +1000 Subject: Newbie In-Reply-To: <20030818223821.11770.qmail@web21209.mail.yahoo.com> References: <20030818223821.11770.qmail@web21209.mail.yahoo.com> Message-ID: <200308191241.50419.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 08:38 am, Farooq Kamal wrote: > And at last if beowulf can't do, is there anyother > implematation of clusters that has these above said > qualities I think what you're looking for is OpenMOSIX. http://www.openmosix.org/ There's an introduction to it at the Intel website at: http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial%3a%3axeon_openmosix&cntType=IDS_EDITORIAL&catCode=BMB Excuse the large URL! good luck! 
Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QY5tO2KABBYQAh8RAqKrAJ9SY5wfCvvL35hLPubrEa8/xFuYsgCdFHYi 4wDadQBbfYpz06hX3YRkwRI= =QIb3 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 22:43:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 22:43:57 -0400 (EDT) Subject: AW: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > A geometrically cascading structure gives me some more disadvantages. > If you are using an additional high performance network, eg myrinet or > infiniband you won't have problems with the switch bandwidth. > > If you are using low cost Ethernet/Gigabit network topology with 2 or > more hups between the nodes (like FFN), the last "generation" of the > snowball could be a heavy bottleneck. No one uses Ethernet repeaters on a cluster. 32 port Fast Ethernet switches are under $5/port. Even for Gigabit Ethernet, 8 port switches can be found for $20/port. An unusual topology might be better utilized by mapping the copy topology to the physical, but that's not the usual case. The typical case is an essentially flat topology, or one close enough that treating it as flat avoids complexity. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 18 23:21:16 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 19 Aug 2003 11:21:16 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308181152.57812.csamuel@vpac.org> Message-ID: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> --- Chris Samuel ???? >http://www.vpac.org/content/services_and_support/facility/linux_cluster.php Interesting ;-> May be you can take a look at the PBS addons like mpiexec, maui scheduler. > Nope, it's always been running OpenPBS prior to > migrating to SPBS. SGE is sponsored by Sun, and is opensource, I am currently using it. http://gridengine.sunsource.net/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 23:47:40 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 13:47:40 +1000 Subject: Scalable PBS In-Reply-To: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> References: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> Message-ID: <200308191347.42057.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 01:21 pm, Andrew Wang wrote: > May be you can take a look at the PBS addons like > mpiexec, maui scheduler. Already there. :-) We've got some users using mpiexec (though it does mean that you can nolonger restart a mom and have an MPI job keep going like you could with MPICH's mpirun) and we swapped to the MAUI scheduler yesterday (not without problems). > > Nope, it's always been running OpenPBS prior to > > migrating to SPBS. > > SGE is sponsored by Sun, and is opensource, I am > currently using it. > > http://gridengine.sunsource.net/ What's your impression of it ? Does it integrate with commercial molecular modelling packages like MSI ? cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QZ3cO2KABBYQAh8RAnBbAJ9MbVoDWNp0pjp6CHANpDZe9K2i0QCfSbE9 jlJDiWkEkM2a1uY+qCETprU= =9w8a -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bradshaw at mcs.anl.gov Tue Aug 19 00:42:52 2003 From: bradshaw at mcs.anl.gov (Rick Bradshaw) Date: Mon, 18 Aug 2003 23:42:52 -0500 Subject: big memory opteron In-Reply-To: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 18 Aug 2003 14:37:27 -0700") References: <1061228458.5291.32.camel@zeitgeist> <20030818213727.GB2131@greglaptop.internal.keyresearch.com> Message-ID: <87k79agsvn.fsf@skywalker-lin.mcs.anl.gov> Greg, This seems to be a huge bug that has been in the Bios for over a year now. I have only seen this on the AGP motherboards though. Unfortunetly they still perform much better than the none AGP boards that do recognise all the memory. Rick Greg Lindahl writes: > I'm attempting to put together a big memory 2-cpu Opteron box, without > success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of > memory. Now that's a pretty strange number, since if I was out of chip > selects, it should see exactly 4 GBytes. > > Any clues? 
> > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Tue Aug 19 08:42:21 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 19 Aug 2003 14:42:21 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Mon, 18 Aug 2003, Donald Becker wrote: > On Mon, 18 Aug 2003, Rene Storm wrote: [...] > > Rene: That's the problem with all the things you do, first they are for > > your own and then everybody wants them ;o) > > If your end goal is to publish papers, do the hack. If you want to write a paper you might also want to consider reading the following papers as related work: @article{ CCPE2002, author = "Felix Rauch and Christian Kurmann and Thomas M. Stricker", title = "{Optimizing the Distribution of Large Data Sets in Theory and Practice}", journal = "Concurrency and Computation: Practice and Experience", year = 2002, volume = 14, number = 3, pages = "165--181", month = apr } % frisbee-usenix03.pdf % Cloning tool, with multicast data distribution, compression techniques etc. @inproceedings{ Frisbee-Usenix2003, author = "Mike Hibler and Leigh Stoller and Jay Lepreau and Robert Ricci and Chad Barb", title = "{Fast, Scalable Disk Imaging with Frisbee}", booktitle = "Proceedings of the USENIX Annual Technical Conference 2003", year = 2003, month = jun, organization = "The USENIX Association" } Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Tue Aug 19 10:15:33 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Tue, 19 Aug 2003 09:15:33 -0500 Subject: big memory opteron Message-ID: <99F2150714F93F448942F9A9F112634C07BE62CD@txexmtae.amd.com> Greg, You might want to try upgrading to the lates BIOS, what type of board do you have? -Lehi -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: Monday, August 18, 2003 4:37 PM To: beowulf at beowulf.org Subject: big memory opteron I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Tue Aug 19 12:53:59 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Tue, 19 Aug 2003 17:53:59 +0100 (BST) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: We've tried both multicast and snowball for data distribution on our cluster. We have a 60Gig dataset which we have to distribute to 1000 nodes. We started off using snowball copies. They work, but care is needed in your choice of tools for the file-transfers. rsync works, but can have problems with large (> 2Gig) files if you use rsh as the transport mechanism. (this is an rsh bug on some redhat versions rather than an rsync bug). rsync over ssh gets around that problem, but of course has the added encryption overhead. You should also avoid the incremental update mode of rsync (which is the default). We've found that it will silently corrupt your files if you rsync across different architectures (eg alpha-->ia32). It also has problems with large files. The only usable multicast code we've found that actually works is udpcast. http://udpcast.linux.lu/ There are plenty of other multicast codes to choose from out on the web, and most of them fall over horribly as soon as you cross more than one switch or have more than 10-20 hosts. We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used it to sucessfully distribute our 60gig dataset over large numbers of nodes simultaneously. In practice, on gigabit, we find that disk write speed is the limiting factor rather than the network. Lawrence Livermore use udpcast to install OS images on the MCR cluster, and I believe they side-step the disk performance issue by writing data to a ramdisk as an intermediate step. Obviously this only makes sense if your dataset < size of memory. Our current file distribution strategy is to use a combination of rsync and updcast. We do a dummy rsync to find out what files need updating, tar them up, pipe the tarball through udpcast and then untar the files and the client. The main performance killer we've found for udpcast is cheap switches. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Tue Aug 19 13:48:24 2003 From: csmith at lnxi.com (Curtis Smith) Date: Tue, 19 Aug 2003 11:48:24 -0600 Subject: AW: mulitcast copy or snowball copy References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: <072a01c3667a$16624c90$a423a8c0@blueberry> You might want to look into the Clusterworx product from Linux Networx. It has been used to boot and image clusters over 1100 nodes in size using multicast, and supports image sizes over 4GB. Multiple images can be served by a single server using ethernet. 
Each channel can use 100% of the network bandwidth (12.5MB per second on Fast Ethernet) or can be throttled to a specific rate. We typically use a transmission rate of 10MB per second on Fast Ethernet (30 seconds for a 300MB image), allowing DHCP traffic to get through. The multicast server can also be throttled to ensure that its doesn't overdrive the switch or hub (if you are using cheap ones) which in many cases can account for up to 95% of packet loss. If your switch is fast and is IGMP enabled, you will generally experience little to no packet loss. The technology is based on UDP and multicast and works with LinuxBios and Etherboot, and was used to image the MCR cluster many times prior to its deployment at LLNL. MCR could go from powered-off bare metal to running in about 7 minutes (most of which was disk formatting). Curtis Smith Principal Software Engineer Linux Networx (www.lnxi.com) ----- Original Message ----- From: "Guy Coates" To: Sent: Tuesday, August 19, 2003 10:53 AM Subject: Re:AW: mulitcast copy or snowball copy > > We've tried both multicast and snowball for data distribution on our > cluster. We have a 60Gig dataset which we have to distribute to 1000 > nodes. > > We started off using snowball copies. They work, but care is needed in > your choice of tools for the file-transfers. rsync works, but can have > problems with large (> 2Gig) files if you use rsh as the transport > mechanism. (this is an rsh bug on some redhat versions rather than an > rsync bug). > > rsync over ssh gets around that problem, but of course has the added > encryption overhead. > > You should also avoid the incremental update mode of rsync (which is the > default). We've found that it will silently corrupt your files if you > rsync across different architectures (eg alpha-->ia32). It also has > problems with large files. > > > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. > > In practice, on gigabit, we find that disk write speed is the limiting > factor rather than the network. Lawrence Livermore use udpcast to install > OS images on the MCR cluster, and I believe they side-step the disk > performance issue by writing data to a ramdisk as an intermediate step. > Obviously this only makes sense if your dataset < size of memory. > > Our current file distribution strategy is to use a combination of rsync > and updcast. We do a dummy rsync to find out what files need updating, tar > them up, pipe the tarball through udpcast and then untar the files and the > client. > > The main performance killer we've found for udpcast is cheap switches. 
> > Cheers, > > Guy Coates > > -- > Guy Coates, Informatics System Group > The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK > Tel: +44 (0)1223 834244 ex 7199 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john152 at libero.it Wed Aug 20 07:26:43 2003 From: john152 at libero.it (john152 at libero.it) Date: Wed, 20 Aug 2003 13:26:43 +0200 Subject: Detection performance? Message-ID: Hi all, does anyone know about the performance of Mii-diag using ioctl calls? Using Mii-diag, what could be the average delay between the link-status change ( phisically ) and the detection of this event. I'm using a 3Com 905 Tornado PC card; is there a different delay for each PC card in changing the status register? How long could this delay be in your experience? I'd like to have a delay minor than 1 ms between the time in which i phisically disconnect the cable and the time in which I have the detection (in example with a printf on video, ...) In your experience, is it reasonable? Normally do I have to wait for a greater delay? Thanks in advance for your kind answer and observations. Giovanni di Giacomo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Aug 20 08:30:58 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 20 Aug 2003 14:30:58 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Tue, 19 Aug 2003, Guy Coates wrote: > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. That's interesting, since I tried udpcast once (just a few tests) on our Cabletron SmartSwitchRouter with Gigabit Ethernet without disk accesses and I got about 350 Mbps, while Dolly ran with approx. 500 Mbps on Machines with 1 GHz processors. I even used Dolly once (already many years ago, with 400 MHz machines) to clone two 24-node clusters at the same time, they were connected to two different switches and had a router in between. The throughput for the nodes was about 6.9 MByte/s over Fast Ethernet for every of the nodes. > The main performance killer we've found for udpcast is cheap switches. True. I tried it once with a cheap and simple ATI 24-port Fast Ethernet switch. Udpcast run with only about 1 MByte/s since the switch decided to multicast everything with only 10 Mbps (one machine that wasn't a member of the multicast group was connected with only 10 Mbps). Dolly on the other hand worked perfect with full wire speed on that switch. 
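For anyone who wants to try the rsync-plus-udpcast pipeline Guy describes, the moving parts look roughly like this. Treat it as a starting point only: the udp-sender/udp-receiver behaviour (stdin in, stdout out when no --file is given) is from my reading of the udpcast documentation, the dry-run parsing depends on your rsync version, and node001/nodeNNN are placeholders.

  # 1. dry-run rsync against one representative node to collect the names of
  #    files that need updating (drop directories and the summary lines; the
  #    exact output format varies between rsync versions)
  rsync -avn /data/db/ node001:/data/db/ \
      | grep -v -e '^building' -e '^wrote ' -e '^total ' -e '/$' -e '^$' > changed.files

  # 2. start a receiver on every node first (via ssh, rgang or similar);
  #    udp-receiver writes the incoming stream to stdout, tar unpacks it
  ssh nodeNNN "udp-receiver | tar -C /data/db -xpf -" &

  # 3. on the master: tar up only the changed files and feed the stream to
  #    the multicast sender (which waits for the receivers before starting)
  tar -C /data/db -cpf - -T changed.files | udp-sender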
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Aug 20 09:39:16 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 20 Aug 2003 15:39:16 +0200 (CEST) Subject: Detection performance? In-Reply-To: Message-ID: On Wed, 20 Aug 2003, wrote: > Using Mii-diag, what could be the average delay > between the link-status change ( phisically ) > and the detection of this event. Depends on the card capabilities and the driver. Most drivers poll for change, some use an interrupt. > I'm using a 3Com 905 Tornado PC card; is there > a different delay for each PC card in changing the > status register? I don't understand the question... > I'd like to have a delay minor than 1 ms > between the time in which i phisically > disconnect the cable and the time in which > I have the detection (in example with a printf on video, ...) The 3c59x driver polls every 60 seconds for media status when using autonegotiation (default). People from the HA and bonding projects have modified this to allow very fine polling of the media registers, however this has a big disadvantage: the CPU spends a lot of time waiting for completion of in/out operations - the finer the poll, the more CPU lost. The time taken to talk to th MII does not depend on the CPU speed, but on th PCI speed, so the faster the CPU, the more instruction cycles are lost to I/O. > In your experience, is it reasonable? No, because the network card should transfer data, not be a watchdog. There is one other solution, but there is no code for it yet. At least the Tornado cards allow generating an interrupt whenever the media changes. This would alleviate the need to continually poll the media registers and would give an indication very soon after the event happened. This was on my to-do list for a long time, but it was never done and probably won't be done soon. > Normally do I have to wait for a greater delay? If by "normally" you mean "the 359x driver distributed with the kernel" or "the 3c59x driver from Scyld", then yes. > Thanks in advance for your kind answer and observations. This isn't really beowulf related. Please use vortex at scyld.com for discussing the 3c59x driver. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Wed Aug 20 10:50:30 2003 From: jhearns at micromuse.com (John Hearns) Date: Wed, 20 Aug 2003 15:50:30 +0100 Subject: Detection performance? In-Reply-To: References: Message-ID: <3F438AB6.6090507@micromuse.com> That's an interesting question. Can you tell us what your application is, and why it needs fast response? First thought I had would be to SNMP trap the port status on the switch, rather than the card. 
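Something along these lines, assuming the switch speaks SNMP (the community string, switch name and ifIndex are placeholders, and strictly speaking this polls ifOperStatus rather than catching a trap; a real trap setup would point the switch at snmptrapd instead):

  # poll the operational status of port 12 on the switch once a second
  # (.1.3.6.1.2.1.2.2.1.8 is IF-MIB ifOperStatus: 1 = up, 2 = down)
  while true; do
      snmpget -v1 -c public switch01 .1.3.6.1.2.1.2.2.1.8.12
      sleep 1
  done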
But I must admit I have no idea of the latency there, but would I would expect it to be much more than 1ms. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Aug 20 12:33:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 20 Aug 2003 12:33:49 -0400 Subject: clubmask 0.5 released Message-ID: <1061397229.16487.45.camel@roughneck> Name : Clubmask Version : 0.5 Release : 1 Group : Cluster Resource Management and Scheduling Vendor : Liniac Project, University of Pennsylvania License : GPL-2 URL : http://clubmask.sourceforge.net What is Clubmask ---------------- Clubmask is a resource manager designed to allow Bproc based clusters enjoy the full scheduling power and configuration of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment. Clubmask also provides builtin support for a supermon2ganglia translator that allows a standard Ganlgia web backend to contact supermon and get XML data that will disply through the Ganglia web interface. Clubmask is currently running on around 10 clusters, varying in size from 8 to 128 nodes, and has been tested up to 5000 jobs. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Aug 22 00:39:04 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 22 Aug 2003 12:39:04 +0800 (CST) Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> Message-ID: <20030822043904.18171.qmail@web16811.mail.tpe.yahoo.com> Using the 32-bit x86 glinux binary package, it works on my machine. SGE gets the load information and the system/hardware information correctly: > qhost HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS ------------------------------------------------------------------------------- global - - - - - - - opteron1 glinux 2 0.00 997.0M 47.8M 1.0G 4.3M Andrew. --- Mikhail Kuzminsky ???? > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - > about binary SGE distribution in 32-bit mode, > or about compilation from the source for x86-64 > mode, > under SuSE or RedHat distribution etc. > > Yours > Mikhail Kuzminsky > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Sat Aug 23 03:48:58 2003 From: jhearns at micromuse.com (John Hearns) Date: 23 Aug 2003 08:48:58 +0100 Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> References: <200308201609.UAA08558@nocserv.free.net> Message-ID: <1061624938.1182.57.camel@harwood> On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information >From - Return-Path: <> Received: from localhost by clarice with LMTP for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from mta.micromuse.com (mta.micromuse.com [194.131.185.92]) by mailstore.micromuse.co.uk (Switch-2.2.6/Switch-2.2.4) with ESMTP id h7N7pXZ27346 for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from marstons.services.quay.plus.net (marstons.services.quay.plus.net [212.159.14.223]) by mta.micromuse.com (Switch-2.2.6/Switch-2.2.6) with SMTP id h7N7pWY27479 for ; Sat, 23 Aug 2003 08:51:32 +0100 Message-Id: <200308230751.h7N7pWY27479 at mta.micromuse.com> Received: (qmail 19110 invoked for bounce); 23 Aug 2003 07:51:26 -0000 Date: 23 Aug 2003 07:51:26 -0000 From: MAILER-DAEMON at marstons.services.quay.plus.net To: jhearns at micromuse.com Subject: failure notice X-Perlmx-Spam: Gauge=XXXIIIIIIIII, Probability=39%, Report="FAILURE_NOTICE_1, MAILER_DAEMON, NO_MX_FOR_FROM, NO_REAL_NAME, SPAM_PHRASE_00_01" X-Evolution-Source: imap://jhearns at mta.micromuse.com/ Mime-Version: 1.0 Hi. This is the qmail-send program at marstons.services.quay.plus.net. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. : Sorry, I couldn't find any host named bewoulf.org. (#5.1.2) --- Below this line is a copy of the message. Return-Path: Received: (qmail 19106 invoked by uid 10001); 23 Aug 2003 07:51:26 -0000 Received: from dockyard.plus.com (HELO .) (212.159.87.168) by marstons.services.quay.plus.net with SMTP; 23 Aug 2003 07:51:26 -0000 Subject: Re: SGE on AMD Opteron ? From: John Hearns To: bewoulf at bewoulf.org In-Reply-To: <200308201609.UAA08558 at nocserv.free.net> References: <200308201609.UAA08558 at nocserv.free.net> Content-Type: text/plain Organization: Micromuse Message-Id: <1061624843.1183.52.camel at harwood> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 23 Aug 2003 08:47:23 +0100 Content-Transfer-Encoding: 7bit On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - I'm working with this. More news when I get it. Also, and I know that all I have to do is Google and do some reading, but does andone on the list have experience with lm_sensors on Opteron? Specifically HDAMA motherboards. A quick Google just turned up a post by Mikhail in June on this very subject... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From saville at comcast.net Sat Aug 23 17:38:49 2003 From: saville at comcast.net (Gregg Germain) Date: Sat, 23 Aug 2003 17:38:49 -0400 Subject: Help! 
Endless RARP requests Message-ID: <3F47DEE9.F00C5213@comcast.net> Hi, I have installed the Scyle basic edition I got from Linux Central (RH 6.2). I've done the installation and I selected the range of IP addresses they suggest for the slave nodes. ifconfig shows that eth1 is operating. I connect a slave node to the Master node by connecting the Slave's eth0 card to the Master's eth1 card. I created a slave boot floppy, and boot the slave. It boots ok but starts sending RARP requests that never get satisfied. It sits there forever making more requests (well eventually it reboots itself and tries again but then there's endless RARP requests). Can any one give me a hint? Do I have to go through a hub to connect that first slave to the master? I know I'll have to have a hub for the second slave, but I thought I could make a direct connection for the first one. Any help would be greatly appreciated. thanks Gregg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From adm35 at georgetown.edu Sun Aug 24 14:36:02 2003 From: adm35 at georgetown.edu (adm35 at georgetown.edu) Date: Sun, 24 Aug 2003 14:36:02 -0400 Subject: Help! Endless RARP requests Message-ID: <1967012801.1280119670@georgetown.edu> You'll either need a hub, switch or crossover cable. Arnie Miles Systems Administrator: Advanced Research Computing Adjunct Faculty: Computer Science 202.687.9379 168 Reiss Science Building http://www.georgetown.edu/users/adm35 http://www.guppi.arc.georgetown.edu ----- Original Message ----- From: Gregg Germain Date: Saturday, August 23, 2003 5:38 pm Subject: Help! Endless RARP requests > Hi, > > I have installed the Scyle basic edition I got from Linux Central (RH > 6.2). > > I've done the installation and I selected the range of IP addresses > they suggest for the slave nodes. ifconfig shows that eth1 is > operating. > I connect a slave node to the Master node by connecting the Slave's > eth0 card to the Master's eth1 card. > > I created a slave boot floppy, and boot the slave. It boots ok but > starts sending RARP requests that never get satisfied. It sits there > forever making more requests (well eventually it reboots itself and > tries again but then there's endless RARP requests). > > Can any one give me a hint? > > Do I have to go through a hub to connect that first slave to the > master? I know I'll have to have a hub for the second slave, but I > thought I could make a direct connection for the first one. > > Any help would be greatly appreciated. 
> > thanks > > Gregg > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Aug 25 03:29:29 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 25 Aug 2003 09:29:29 +0200 Subject: PCI-X/133 NICs on PCI-X/100 In-Reply-To: <200308221815.WAA27091@nocserv.free.net> References: <200308221815.WAA27091@nocserv.free.net> Message-ID: <200308250929.29082.joachim@ccrl-nece.de> Mikhail Kuzminsky: > Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards > work w/PCI-X/100 slots on Opteron-based mobos (most of > them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro > mobos) - i.e. how high is the probability that they are > incompatible ? Very low. But why don't you ask the vendor directly? Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 08:31:34 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 08:31:34 -0400 Subject: help with e1000 upgrade Message-ID: <3F4A01A6.4090608@nsc1.net> G'day, I am upgrading the larger of my clusters to 1G ethernet. All nodes are TYAN motherboards (including the head node), and have on-board 1G. I've been using the default e1000 driver on my head node for the past year. It's version 4.1.7. However, when I try to boot my slave nodes, they appear to "hang" after initializing the NIC. I tried upgrading to the newer 5.1.3 driver. The head node is up and working. I made a boot floppy and tried booting, but once again, it hung right after the line displaying the e1000 and its IRQ. In the slave node BIOS, I have turned off the eepro100 and turned on the e1000. Any help would be appreciated. Cheers, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 11:33:26 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 11:33:26 -0400 Subject: help with e1000 upgrade In-Reply-To: <3F4A163E.B3301A38@accessgate.net> References: <3F4A01A6.4090608@nsc1.net> <3F4A163E.B3301A38@accessgate.net> Message-ID: <3F4A2C46.6040007@nsc1.net> Doug Shubert wrote: >Hello Wade, > >Wade Hampton wrote: > > > >>G'day, >> >>I am upgrading the larger of my clusters to 1G ethernet. All nodes are >> >> > >Are the on-board NIC's Intel ? > Intel >>I tried upgrading to the newer 5.1.3 driver. The head node >>is up and working. I made a boot floppy and tried booting, >>but once again, it hung right after the line displaying the >>e1000 and its IRQ. >> >> >> > >Are you using Cat5e or Cat6 cabling? > >We have found that Cat6 works more reliably >on auto sense 10/100/1000 NIC's and switches. > So far, CAT5E (3-6 foot cables). >>In the slave node BIOS, I have turned off the eepro100 >>and turned on the e1000. 
>> >> >We are using the E1000 driver in kernel 2.4.21 and it works flawlessly. > I've been using it in the Scyld 2.4.17 kernel from my head node for nearly a year without any issues. The master has the same motherboard and chips, only more disks, etc. The issue seems to be with booting my slave nodes from the Scyld boot disc. Thanks, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 07:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 15:16:42 +0400 Subject: mpich2-0.93 In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: hello Please help me, i really need help Because i can run mpd on the localhost but not in a ring of PC's When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " I get the answer Permission to node1 denied Permission to node 2 denied.................. Hope to hear from u very soon -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Tue Aug 26 10:33:05 2003 From: angel at wolf.com (Angel Rivera) Date: Tue, 26 Aug 2003 14:33:05 GMT Subject: Change Management Control Message-ID: <20030826143305.24318.qmail@houston.wolf.com> I am looking for information/sites and a formal best practice change control for clusters. Can someone point me in the right direction? 
thx -ar _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nordquist at geosci.uchicago.edu Tue Aug 26 16:50:34 2003 From: nordquist at geosci.uchicago.edu (Russell Nordquist) Date: Tue, 26 Aug 2003 15:50:34 -0500 (CDT) Subject: mpich2-0.93 In-Reply-To: Message-ID: It sounds like you haven't setup password-less communication between your nodes. Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to setup password-less rsh (usaully .rhosts) or ssh. You can tell which one mpirun (or change it) is using by the value of RSHCOMMAND (at least in the 1.2 version) in mpirun. russell On Tue, 26 Aug 2003 at 15:16, RoUdY wrote: > hello > Please help me, i really need help > Because i can run mpd on the localhost but not in a ring > of PC's > When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " > I get the answer > Permission to node1 denied > Permission to node 2 denied.................. > > Hope to hear from u very soon > -------------------------------------------------- > Get your free email address from Servihoo.com! > http://www.servihoo.com > The Portal of Mauritius > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > - - - - - - - - - - - - Russell Nordquist UNIX Systems Administrator Geophysical Sciences Computing http://geosci.uchicago.edu/computing NSIT, University of Chicago - - - - - - - - - - - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Aug 26 20:34:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 27 Aug 2003 10:34:38 +1000 Subject: mpich2-0.93 In-Reply-To: References: Message-ID: <200308271034.39916.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 27 Aug 2003 06:50 am, Russell Nordquist wrote: > Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to > setup password-less rsh (usaully .rhosts) or ssh. You can tell which one > mpirun (or change it) is using by the value of RSHCOMMAND (at least in > the 1.2 version) in mpirun. Standard security blah - rsh is evil, ssh is your friend. :-) It is possible to not install rsh, rlogin and rcp and replace them with symbolic links to ssh, slogin and scp. This should work for most cases, but of course, test, test and test again. We were fortunate, although we had installed the r-series clients on our cluster the daemons weren't enabled in inetd, so we knew we couldn't break anything by removing them (as they'd never have worked in the first place). So far not found anything that has a problem because of this - although don't nuke users .rhosts files as some other programs, like PBS, call ruserok() to validate connections! 
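For the archives, the two pieces look roughly like this. It is only a sketch: node1..node3 are placeholders, try it on one node first, and tighten the ~/.ssh permissions (700 on the directory, 600 on authorized_keys) if your sshd insists on it.

  # passwordless ssh for mpirun/mpd startup: make a key once on the master
  # and append the public half to authorized_keys on every node
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  for n in node1 node2 node3; do
      cat ~/.ssh/id_rsa.pub | ssh $n 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
  done

  # the rsh -> ssh replacement described above (back up or remove the real
  # r-clients first, and leave users' .rhosts files alone)
  ln -sf /usr/bin/ssh    /usr/bin/rsh
  ln -sf /usr/bin/slogin /usr/bin/rlogin
  ln -sf /usr/bin/scp    /usr/bin/rcp

If you'd rather not touch the binaries at all, pointing mpirun's RSHCOMMAND at ssh, as Russell mentions, gets you most of the way there.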
cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/S/yeO2KABBYQAh8RAr/QAKCNHOz5hxIejvGOW34KZsRW74u0NwCeOONj C49BRL6ceXRIHHNhl1mqHss= =BM9q -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Wed Aug 27 17:32:38 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed, 27 Aug 2003 17:32:38 -0400 (EDT) Subject: A question about Bewoulf software: Message-ID: Hi, These days, our lab are planning to built up a Beowulf cluster, which uses Intel Xeon Processors or Pentium 4, and Intel Pro Gigabit (10/100/100) ethernet card. We wonder if we choose commerical software, such as scyld, which version will support Xeon Processor or Pentium 4 respectively? And which version will support Intel Pro Gigabit Ethernet card? If we try buliding by ourself, which version of software we should choose? Thanks a lot for your kind suggestion. I am looking forward to hearing from you. Thanks again. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Aug 27 19:41:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 27 Aug 2003 19:41:17 -0400 (EDT) Subject: A question about Bewoulf software: In-Reply-To: Message-ID: On Wed, 27 Aug 2003, Ao Jiang wrote: > These days, our lab are planning to built up > a Beowulf cluster, which uses Intel Xeon Processors > or Pentium 4, and Intel Pro Gigabit (10/100/100) > ethernet card. > We wonder if we choose commerical software, such as > scyld, which version will support Xeon Processor or > Pentium 4 respectively? Most Linux distributions will "support" the Pentium 4 and Xeon. The question is if the kernel is compiled to take advantage of the newer processor features. The Scyld distribution now has about a dozen different kernels to match the processor types and UP/SMP on the master and compute nodes. Typically only two to four of the kernels are installed, based which checkboxes are slected during installation. We always install a safe, featureless i386 uniprocessor BTW, you might think that the processor family is the most important optimization, but there is an even bigger difference between uniprocessor and SMP kernels. > If we try buliding by ourself, which version of software > we should choose? You pretty much have two choices: be library version compatible with a consumer/workstation distribution (Red Hat, SuSE, Debian), or use a meta-distribution such as GenToo or Debian and compile everything yourself. > And which version will support Intel Pro Gigabit Ethernet card? Every few weeks Intel comes out with a new card version with a new PCI ID. The e1000 driver is one of the five or so drivers that we are constantly updating to support just-introduced chips. 
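One quick sanity check before you commit to a particular node image: compare the card's PCI vendor:device ID against what the installed e1000 module claims to support. Roughly like this (02:01.0 is an example slot, and the pcimap path assumes a 2.4-era modutils setup):

  lspci | grep -i ethernet                              # find the slot, e.g. 02:01.0
  lspci -n -s 02:01.0                                   # numeric ID, e.g. 8086:100e
  grep e1000 /lib/modules/$(uname -r)/modules.pcimap    # IDs the module was built for

If the 8086:xxxx ID reported for the card doesn't appear in the module's pcimap entries, the driver on the image predates the card.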
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Wed Aug 27 20:39:21 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 03:39:21 +0300 Subject: gigabit switches for 32-64 nodes Message-ID: <200308280339.21458.exa@kablonet.com.tr> hi there, are there any high performance gigabit ethernet switches for a beowulf cluster consisting of 32 to 64 nodes? what do you recommend for the interconnect of such a system? regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From widyono at cis.upenn.edu Tue Aug 26 17:33:06 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 26 Aug 2003 17:33:06 -0400 Subject: perl-bproc bindings Message-ID: <20030826213306.GA2497@central.cis.upenn.edu> Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, something more recent than spring of 2001? Failing that, anyone have a chance to flesh out the missing information in Dan's work (e.g. C constant() function which doesn't seem to exist, error handling, etc.)? I have it "just working" for "just users", and barely at that. Error handling consists of returning -128 minus negated error code if there's an error. I've already Googled and checked these archives (perl bproc binding). Everything points back to Dan's work. Thanks, Dan W. -- -- Daniel Widyono http://www.cis.upenn.edu/~widyono -- Liniac Project, CIS Dept., SEAS, University of Pennsylvania -- Mail: CIS Dept, 302 Levine 3330 Walnut St Philadelphia, PA 19104 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 01:49:42 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 01:49:42 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <20030826213306.GA2497@central.cis.upenn.edu> Message-ID: On Tue, 26 Aug 2003, Daniel Widyono wrote: > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > something more recent than spring of 2001? 
There are updated bindings, and a small example, at ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Thu Aug 28 05:08:13 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Wed, 27 Aug 2003 23:08:13 -1000 Subject: gigabit switches for 32-64 nodes References: <200308280339.21458.exa@kablonet.com.tr> Message-ID: <000601c36d43$ea127360$0a02a8c0@kitsu2> We use a Foundry Fastiron II Plus with 64 non-blocking copper gigabit ports. A little on the pricy side but it works very well. ~Mitchel ----- Original Message ----- From: "Eray Ozkural" To: Sent: Wednesday, August 27, 2003 2:39 PM Subject: gigabit switches for 32-64 nodes > hi there, > > are there any high performance gigabit ethernet switches for a beowulf cluster > consisting of 32 to 64 nodes? what do you recommend for the interconnect of > such a system? > > regards, > > -- > Eray Ozkural (exa) > Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org > www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza > GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 08:57:15 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 28 Aug 2003 14:57:15 +0200 (CEST) Subject: 32bit slots and riser cards Message-ID: Dear beowulfers, In planning for some new cluster nodes, I hit a small problem. I want: - a modern mainboard for dual-Xeon (preferred) or dual-Athlon - 1U or 2U rackmounted case - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for installing in the case The problem is that all mainboards that I looked at position the 32bit PCI slot(s) near the edge of the mainboard and I cannot see how the riser card can be installed into them so that the card still fits in the case; the Myrinet card does not fit (keyed differently) into the 64bit PCI slots or 64bit risers. Is there some solution to this problem or do I have to go back to midi-tower cases ? 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Aug 28 10:24:19 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 28 Aug 2003 10:24:19 -0400 Subject: perl-bproc bindings In-Reply-To: References: Message-ID: <1062080659.7565.0.camel@roughneck> On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > something more recent than spring of 2001? > > There are updated bindings, and a small example, at > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz Any chance you guys have updated python bindings as well? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:04:52 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:04:52 -0400 Subject: Intel acquiring Pallas Message-ID: <3F4E1A14.8030900@bellsouth.net> Good morning! I though I would post this for those who haven't seen it yet: http://www.theregister.co.uk/content/4/32522.html "Intel has signed on to acquire German software maker Pallas, hoping the company's performance tools can give it an edge in the compute cluster arena." Enjoy! Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 11:30:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 11:30:30 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E1A14.8030900@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Good morning! > > I though I would post this for those who haven't seen it yet: > > http://www.theregister.co.uk/content/4/32522.html > > "Intel has signed on to acquire German software maker Pallas, > hoping the company's performance tools can give it an edge in > the compute cluster arena." Interesting. I'm trying to understand where and how this will help them -- more often than not it is a Bad Thing when hardware mfrs start dabbling in something higher than firmware or compilers -- Apple (and Next in its day) stands at one end of that path. It's especially curious given that Intel is already overwhelmingly dominant in the compute cluster arena (with only AMD a meaningful cluster competitor, and with apple and the PPC perhas a distant third). Not to mention the fact that if they REALLY wanted to get an edge in the compute cluster arena, they'd acquire somebody like Dolphin or Myricom. Monitoring is lovely and even important for application tuning, but it is an application layer on TOP of both systems software and the network. Or perhaps they are buying them so they can instrument their compilers? rgb > > Enjoy! 
> > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:50:41 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:50:41 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E24D1.9010301@bellsouth.net> Robert G. Brown wrote: >On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > >>Good morning! >> >> I though I would post this for those who haven't seen it yet: >> >>http://www.theregister.co.uk/content/4/32522.html >> >>"Intel has signed on to acquire German software maker Pallas, >>hoping the company's performance tools can give it an edge in >>the compute cluster arena." >> >> > >Interesting. I'm trying to understand where and how this will help them >-- more often than not it is a Bad Thing when hardware mfrs start >dabbling in something higher than firmware or compilers -- Apple (and >Next in its day) stands at one end of that path. > >It's especially curious given that Intel is already overwhelmingly >dominant in the compute cluster arena (with only AMD a meaningful >cluster competitor, and with apple and the PPC perhas a distant third). >Not to mention the fact that if they REALLY wanted to get an edge in the >compute cluster arena, they'd acquire somebody like Dolphin or Myricom. > >Monitoring is lovely and even important for application tuning, but it >is an application layer on TOP of both systems software and the network. >Or perhaps they are buying them so they can instrument their compilers? > > rgb > Bob, Very interesting observation. I wonder if Intel doesn't have something else up their sleeve? Could they be trying to get back into Supercomputer game (not likely, but didn't they get some DoD money recently?). Could they be helping with networking stuff (Intel has been discussing the next generation networking stuff lately). Maybe some sort of TCP Offload Engine? Maybe something with their new bus ( PCI Express?) They have also created CSA (Communication Streaming Architecture) in their new chipset to bypass the PCI bottleneck. Of course they could also be after the Pallas parallel debuggers to integrate into their compilers (like you mentioned) or perhaps to help with debugging threaded code in the hyperthreaded chips. Not that you mention it, this is a somewhat interesting development. I wonder what they're up to? >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 12:05:52 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu, 28 Aug 2003 12:05:52 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? My guess is something like this, given what pallas does, but if this is the case, they may be preparing to attempt a task that has brought strong programmers to their knees repeatedly in the past -- create a true parallel compiler. A compiler where the thread library transparently hides a network-based cluster, complete with migration and load balancing. So the same code, written on top of a threading library, could compile and run transparently on a single processor or a multiprocessor or a distributed cluster. Or something. Hell, they're one of the few entities that can afford to tackle such a blue-sky project, and just perhaps it is time for the project to be tackled. At least they can attack it from both ends at once -- writing the compiler at the same time they hack the hardware around. But they're going to have create a hardware-level virtual interface for a variety of IPC mechanism's for this to work, I think, in order to instrument it locally and globally with no particular penalty either way. Or, of course, buy SCI and start putting the chipset on their motherboards as a standard feature on a custom bus. Myricom wouldn't like that (or Dolphin if they went the other way), but it would make a hell of a clustering motherboard. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Aug 28 12:22:01 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 28 Aug 2003 11:22:01 -0500 (CDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Bob, > > Very interesting observation. I wonder if Intel doesn't have something > else up their sleeve? Could they be trying to get back into Supercomputer > game (not likely, but didn't they get some DoD money recently?). Could > they be helping with networking stuff (Intel has been discussing the next > generation networking stuff lately). Maybe some sort of TCP Offload > Engine? Maybe something with their new bus ( PCI Express?) They have also > created CSA (Communication Streaming Architecture) in their new chipset > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? > Intel's already dropped Infiniband, and they have also recently gotten very quiet about using PCI Express as a node interconnect. In fact, this use of PCI Express has recently been switched to one of their "non-Goals" for the technology. 
I'd guess that Intel does not care about this market. This is fine by me. I'd rather have the Myricom's and Dolphin's that live or die by their products to ensure the products are getting the attention they deserve. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 12:49:25 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 12:49:25 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062089290.4363.22.camel@localhost.localdomain> Hmmm... Not to throw water on hopes for parallelizing compilers and Intel supported parallel debuggers, but my guess is that Intel's move is much less revolutionary (but perhaps still important). Pallas's main HPC product is Vampir/Vampirtrace. These are performance analysis tools. As such they would only be peripherally useful for compiler design (perhaps to measure the effects of certain changes). Even for this purpose, Vampir/Vampirtrace doesn't provide the amount of detail that Intel's own V-Tune product does. For debugging, Pallas resells Etnus Totalview. For compiler options Pallas has the Intel compilers as well as PGI. As far as I can tell, Pallas doesn't do any significant independent development for these systems. So, what does the Pallas performance analysis product do that is important? Vampir/Vampirtrace allows the collection and display of data from a large number of programs running in parallel. Doing this well is not trivial. Time differences between machines must be taken into account. The tools must be able to handle a potentially huge amount of trace data (running a profiler on a 1000 process system is a much different animal from instrumenting a single process job). And, finally once all this data is collected it has to be presented in some way which can actually be of use. VA/VT is among the best available tools for this purpose. So, why would Intel want to acquire Pallas? First, they have a good product which can be sold at a high price. Combined with some Intel marketing they "should" be able to make money on the product. Second, Vampirtrace has the capability of using processor performance counters. By pushing the capabilities of VA/VT to work on Intel processors it promotes "lock-in" to Intel processors. In this way a developer using the Intel compilers, V-Tune for single process analysis, and Vampir for parallel profiling, wouldn't be likely to move to an AMD or Power platform. Is this a good thing? For the most part probably so. Intel should be able to help improve the Vampir software. Making it work even better on Intel processors doesn't really hurt things if you're using another system and might make things really nice for those of us on Intel hardware. Hopefully it's development on other systems won't languish. But, on the basis of this acquisition, I wouldn't hold my breath for parallel compilers or a full fledged Intel return to the HPC market. Vann Presearch, Inc. On Thu, 2003-08-28 at 12:05, Robert G. Brown wrote: > On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > > to bypass the PCI bottleneck. 
Of course they could also be after the Pallas > > parallel debuggers to integrate into their compilers (like you mentioned) > > or perhaps to help with debugging threaded code in the hyperthreaded chips. > > Not that you mention it, this is a somewhat interesting development. > > I wonder what they're up to? > > My guess is something like this, given what pallas does, but if this is > the case, they may be preparing to attempt a task that has brought > strong programmers to their knees repeatedly in the past -- create a > true parallel compiler. A compiler where the thread library > transparently hides a network-based cluster, complete with migration and > load balancing. So the same code, written on top of a threading > library, could compile and run transparently on a single processor or a > multiprocessor or a distributed cluster. Or something. > > Hell, they're one of the few entities that can afford to tackle such a > blue-sky project, and just perhaps it is time for the project to be > tackled. At least they can attack it from both ends at once -- writing > the compiler at the same time they hack the hardware around. But > they're going to have create a hardware-level virtual interface for a > variety of IPC mechanism's for this to work, I think, in order to > instrument it locally and globally with no particular penalty either > way. Or, of course, buy SCI and start putting the chipset on their > motherboards as a standard feature on a custom bus. Myricom wouldn't > like that (or Dolphin if they went the other way), but it would make a > hell of a clustering motherboard. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Aug 28 13:53:32 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 28 Aug 2003 10:53:32 -0700 Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> References: Message-ID: <5.2.0.9.2.20030828104843.03073620@mailhost4.jpl.nasa.gov> At 11:50 AM 8/28/2003 -0400, Jeffrey B. Layton wrote: >Robert G. Brown wrote: > >> >>Interesting. I'm trying to understand where and how this will help them >>-- more often than not it is a Bad Thing when hardware mfrs start >>dabbling in something higher than firmware or compilers -- Apple (and >>Next in its day) stands at one end of that path. >> >>It's especially curious given that Intel is already overwhelmingly >>dominant in the compute cluster arena (with only AMD a meaningful >>cluster competitor, and with apple and the PPC perhas a distant third). >>Not to mention the fact that if they REALLY wanted to get an edge in the >>compute cluster arena, they'd acquire somebody like Dolphin or Myricom. >> >> rgb > >Bob, > > Very interesting observation. I wonder if Intel doesn't have something >else up their sleeve? Could they be trying to get back into Supercomputer >game (not likely, but didn't they get some DoD money recently?). 
Could >they be helping with networking stuff (Intel has been discussing the next >generation networking stuff lately). Maybe some sort of TCP Offload >Engine? Maybe something with their new bus ( PCI Express?) They have also >created CSA (Communication Streaming Architecture) in their new chipset >to bypass the PCI bottleneck. Of course they could also be after the Pallas >parallel debuggers to integrate into their compilers (like you mentioned) >or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. >I wonder what they're up to? > >Jeff Intel is making a big push into wireless and RF technology. A recent article ( I don't recall where exactly,but one of the trade rags..) mentioned that the mass market (consumer) don't seem to need much more processor crunch (at least until Windows XXXP comes out, then you'll need all that power just to apply the patches), but that they saw a big market opportunity in integrated wireless networking. Simultaneously, the generalized tanking of the telecom industry has meant that they can hire very skilled RF engineers for reasonable wages without having to compete against speculative piles of options, etc. (I suspect that there are some skilled RF engineers who are now older and wiser and less speculative, too!) We're talking about RF chip designers, as well as PWB layout, circuit designers, and antenna folks. It wouldn't surprise me that Intel is looking at other areas than traditional CPU and processor support kinds of roles. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at cert.ucr.edu Thu Aug 28 14:19:28 2003 From: glen at cert.ucr.edu (Glen Kaukola) Date: Thu, 28 Aug 2003 11:19:28 -0700 Subject: thrashing Message-ID: <3F4E47B0.3000805@cert.ucr.edu> Hi there, So for our newest simulations, we're working with a different domain, where each of our grid cells are much smaller, and so we're expecting the runs to take about 4 times longer. But actually they're taking around 40 times longer. I'm thinking this may have something to do with not having enough memory. The problem with this theory is that I'm not really sure how to tell if my machines are thrashing. On a desktop machine I can tell no problem, as the disk starts going crazy and the system pretty much grinds to a halt. But on a machine up in my server room on which I don't have any gui and where it's too loud to hear any disk activity, I'm really not sure how to tell whether it's thrashing or not. I mean, I can look at top, and free, and sar and everything doesn't look much different than when the other simulations were running, except for maybe 'sar -W', which is a little bit higher. Anyway, if someone could help me out with a way to determine without a doubt if my machines are thrashing or not, then I'd greatly appriciate it. 
Thanks for your time, Glen Kaukola _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 14:35:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 14:35:49 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <1062080659.7565.0.camel@roughneck> Message-ID: On 28 Aug 2003, Nicholas Henke wrote: > On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > > something more recent than spring of 2001? > > > > There are updated bindings, and a small example, at > > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz > > Any chance you guys have updated python bindings as well? 0.9-8 is the current version -- which are you using? The last bugfix was logged in October of 2003. The next planned refresh has added bindings for the Beostat statistics library, Beomap job mapping and BBQ job scheduling systems. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 14:39:02 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 14:39:02 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <3F4E4C46.1010300@bellsouth.net> Use vmstat. Try something like vmstat 1 10 (1 second delay, 10 repeats). Look at the 'si' and 'so' columns under the 'swap' heading; they will give you the information you want. Good Luck! Jeff > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do > with not having enough memory. The problem with this theory is that > I'm not really sure how to tell if my machines are thrashing. On a > desktop machine I can tell no problem, as the disk starts going crazy > and the system pretty much grinds to a halt. But on a machine up in > my server room on which I don't have any gui and where it's too loud > to hear any disk activity, I'm really not sure how to tell whether > it's thrashing or not. I mean, I can look at top, and free, and sar > and everything doesn't look much different than when the other > simulations were running, except for maybe 'sar -W', which is a little > bit higher. Anyway, if someone could help me out with a way to > determine without a doubt if my machines are thrashing or not, then > I'd greatly appriciate it.
> > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Aug 28 15:08:53 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 28 Aug 2003 15:08:53 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <1062097733.8882.120.camel@protein.scalableinformatics.com> Hi Glen: Several methods. 1) vmstat vmstat 1 and look at the so/si columns, not to mention the r/b/w. 2) swapon -s to see the swap usage 3) top has an ok summary of the vm info 4) cat /proc/meminfo can give a crude picture of the memory system. On Thu, 2003-08-28 at 14:19, Glen Kaukola wrote: > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do with > not having enough memory. The problem with this theory is that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in my server > room on which I don't have any gui and where it's too loud to hear any > disk activity, I'm really not sure how to tell whether it's thrashing or > not. I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine without a > doubt if my machines are thrashing or not, then I'd greatly appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 15:43:08 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 12:43:08 -0700 Subject: 32bit slots and riser cards In-Reply-To: References: Message-ID: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > or 64bit risers. Is there some solution to this problem or do I have to go > back to midi-tower cases ? Doesn't that mean that the Myrinet card is 5 volts, and you only have 3.3 volt PCI slots? It's such an old Myrinet card that I don't remember the details of when PCI got a 3.3 volt option. 
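A rough way to sanity-check the slot side of this before ordering risers, assuming dmidecode is installed and the BIOS actually populates its DMI slot table (the exact section layout and the grep pattern below are only a sketch and differ a bit between dmidecode versions), is:

  # run as root: list what each physical slot claims to provide and look
  # for characteristics like "5.0 V is provided" / "3.3 V is provided"
  dmidecode | grep -i -A 8 "slot"

  # lspci only confirms the card enumerates once it is seated; it says
  # nothing about keying or signalling voltage
  /sbin/lspci | grep -i myri

Where the BIOS reports them, the DMI slot characteristics are the closer match to the 5 V vs 3.3 V keying question than anything lspci prints.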
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:24:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:24:47 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E44D8.7090200@wildopensource.com> Message-ID: On Thu, 28 Aug 2003, Stephen Gaudet wrote: > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > cost of about $400.00, plus or minus a few dollars per system. > Therefore, due to this fixed cost, MOST people looking at a cluster > won't touch Itanium2. Steve, Are you suggesting RH has put together a package that is NOT GPL in any way that would significantly affect the 64 bit market? The kernel, the compiler, and damn near every package is GPL, much of it from Gnu itself. Am I crazy here? So I'm having a hard time seeing why one would HAVE to pay them $400/system for anything except perhaps proprietary non-GPL "advanced server" packages that almost certainly wouldn't be important to HPC cluster builders (and which they would have had to damn near develop in a sealed room to avoid incorporating GPL stuff in it anywhere). > Some white box resellers are looking at taking RH Advanced Server and > stripping it down and offering on their ia64 clusters. However, if > their not working with code lawyers, and paying very close attention to > copy right laws, they could end up with law suits down the road. If Red Hat isn't careful and not working very carefully with code lawyers, I think the reverse is a lot more likely, as Richard Stallman is known to take the Gnu Public License (free as in air at the source level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't "own" a hell of a lot of code in what they sell; the bulk of what they HAVE written is GPL derived and hence GPL by inheritance alone. The Open Source community would stomp anything out of line with hobnailed boots and club it until it stopped twitching... So although many a business may cheerfully pay $400/seat for advanced server because it is a cost and support model they are comfortable with, I don't see what there is to stop anyone from taking an advanced server copy (which necessarily either comes with src rpm's or makes them publically available somewhere), doing an rpmbuild on all the src rpm's (as if anyone would care that you went through an independent rebuild vs just used the distribution rpm's) and putting it on 1000 systems, or giving the sources to a friend, or even reselling a repackaging of the whole thing (as long as they don't call them Red Hat and as long as they omit any really proprietary non-GPL work). I even thought there were some people on the list who were using at least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm wrong...:-( rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:23:37 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:23:37 -0700 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <20030828202337.GA1964@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 11:19:28AM -0700, Glen Kaukola wrote: > I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, A clear sign of thrashing is that the program should be getting a lot less than 100% of the cpu, because it's waiting for blocks from the disk. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:31:20 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:31:20 -0700 Subject: Change Management Control In-Reply-To: <20030826143305.24318.qmail@houston.wolf.com> References: <20030826143305.24318.qmail@houston.wolf.com> Message-ID: <20030828203120.GB1964@greglaptop.internal.keyresearch.com> > I am looking for information/sites and a formal best practice change > control for clusters. Can someone point me in the right direction? thx -ar Most clusters are a lot more informal, and don't have any kind of change control. I suspect your best bet would be to look at people involved in LISA: Large Installation Systems Administration. These guys are mostly commercial, and we (the HPC cluster community) don't talk to them much, even though we should. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:42:19 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:42:19 -0400 (EDT) Subject: thrashing In-Reply-To: <1062097733.8882.120.camel@protein.scalableinformatics.com> Message-ID: On 28 Aug 2003, Joseph Landman wrote: > Hi Glen: > > Several methods. > > 1) vmstat > > vmstat 1 > > and look at the so/si columns, not to mention the r/b/w. > > 2) swapon -s > > to see the swap usage > > 3) top > > has an ok summary of the vm info > > 4) cat /proc/meminfo > > can give a crude picture of the memory system. and if you want to watch pretty much all of this information in parallel (on all the systems at once) xmlsysd provides output fields with the information available in both vmstat and free (cat /proc/meminfo), so you can actually watch for swapping or paging or leaks on lots of systems at once in wulfstat. It easily handles updates with a 5 second granularity and can often manage 1 second (depending on your network and number of nodes and so forth). It's on the brahma website or linked under my own. 
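For a quick check by hand, the suggestions earlier in this thread amount to something like the sketch below; 'node12' and 'mysim' are just placeholder names for whichever node and job you care about, and the commands assume the usual procps and sysstat tools are installed:

  # sustained non-zero si/so (pages swapped in/out per second) while the
  # job runs means the node is actively swapping, not just parking idle pages
  ssh node12 vmstat 5 5

  # sar -W reports the same swap-in/swap-out counters from the sysstat
  # history, which is what the "sar -W is a little bit higher" hint was about
  ssh node12 sar -W

  # a job stalled waiting on the disk also drops well below ~100% CPU
  ssh node12 ps -o pid,pcpu,rss,vsz,comm -C mysim

If si/so stay at zero and the job still gets close to a full CPU, the slowdown probably isn't thrashing.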
I don't really provide a direct monitor of disk activity (partly out of irritation at custom-parsing the multidelimited "disk_io" field in /proc/stat), but if you were really interested in it I could probably bite the bullet and add a "disk" display that would work for up to four disks in a few hours of work. I'd guess that ganglia could also manage this sort of monitoring as well, but I don't use it (as I wrote my package before they started theirs by a year or three) so I don't know for sure. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:49:06 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:49:06 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > > > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > > or 64bit risers. Is there some solution to this problem or do I have to go > > back to midi-tower cases ? > > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots? It's such an old Myrinet card that I don't > remember the details of when PCI got a 3.3 volt option. I think that this is right, Greg -- the keying is related to voltage. If your actual PCI slots are keyed correctly, they should be able to manage either voltage (IIRC), but you may have to replace the risers. We've had trouble getting risers that didn't key correctly or work correctly for one kind of card or the other (or one motherboard or another) in the past. It sounds like this might be your problem if you're referring to replacing the cases and not the motherboard itself. Look around and see if you find better/different risers -- there are a fair number of different kinds of risers out there, at least for 2U. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 17:06:40 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 14:06:40 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The cost of the os, either of a blessed one, or a roll your own one hasn't been a significant factor in our reluctance to use Itanium II. The lack of commodity mainboards. The steep price of the cpu's. and lack of a clear view into intels product lifecycle for itaniumII. have been issues. Itanium II 1.3ghz 3mb cpu's have only recently arrived at ~$1400ea. opteron 244s are less than half that and that's before we put the rest of the system around it. we have some off-the-shelf compaq itanium boxes to evaluate but at around $8000 ea that sort of a non-starter. joelja On Thu, 28 Aug 2003, Robert G. 
Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > > cost of about $400.00, plus or minus a few dollars per system. > > Therefore, due to this fixed cost, MOST people looking at a cluster > > won't touch Itanium2. > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > > Some white box resellers are looking at taking RH Advanced Server and > > stripping it down and offering on their ia64 clusters. However, if > > their not working with code lawyers, and paying very close attention to > > copy right laws, they could end up with law suits down the road. > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Aug 28 17:16:38 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 28 Aug 2003 14:16:38 -0700 (PDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Bogdan Costescu wrote: > In planning for some new cluster nodes, I hit a small problem. I want: > - a modern mainboard for dual-Xeon (preferred) or dual-Athlon > - 1U or 2U rackmounted case > - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for > installing in the case the 2U chassis should be trivial to solve for either 32bit or 64bit pci slots for 1U chassis... you need to pick "right motherboard" that works with the chassis ... and pci cards ( you cannot do a mix and match with any motherboard ) if you want performance out of your pci card, you will have to use 64bit pci slots or 32bit pci slot - but the riser card should be one piece instead of the whacky non-conforming wires between the "2 sections of the pci riser" 32 and 64 bit pci riser cards (not cheap but lot better than most others) http://www.adexelec.com c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Thu Aug 28 17:22:13 2003 From: johnb at quadrics.com (John Brookes) Date: Thu, 28 Aug 2003 22:22:13 +0100 Subject: thrashing Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E2C1@stegosaurus.bristol.quadrics.com> The test I would probably suggest to someone whose machine I had no access to is 'swapoff -a'. It's not big and it's not clever, but largely removes the need for value judgements: if it bombs in an OOM style, you were most probably thrashing. Just a thought. Cheers, John Brookes Quadrics > -----Original Message----- > From: Glen Kaukola [mailto:glen at mail.cert.ucr.edu] > Sent: 28 August 2003 19:19 > To: beowulf at beowulf.org > Subject: thrashing > > > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something > to do with > not having enough memory. The problem with this theory is > that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in > my server > room on which I don't have any gui and where it's too loud to > hear any > disk activity, I'm really not sure how to tell whether it's > thrashing or > not. 
I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine > without a > doubt if my machines are thrashing or not, then I'd greatly > appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Thu Aug 28 13:56:53 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 20:56:53 +0300 Subject: Filesystem In-Reply-To: References: Message-ID: <200308282056.54106.exa@kablonet.com.tr> On Saturday 02 August 2003 05:45, Alvin Oga wrote: > i think ext3 is better than reiserfs > > i think ext3 is not any better than ext2 in terms > of somebody hitting pwer/reset w/o proper shutdown > - i always allow it to run e2fsck when it does > an unclean shutdown ... > > - yes ext3 will timeout and continue and restore from > backups but ... am paranoid about the underlying ext2 > getting corrupted by random power off and resets > I basically think ext3 and ext2 are a joke and we use XFS on the nodes with no performance problem. Excellent reliability! Regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Thu Aug 28 16:55:51 2003 From: sp at scali.com (Steffen Persvold) Date: Thu, 28 Aug 2003 22:55:51 +0200 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E6C57.9030406@scali.com> Robert G. Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. 
> > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > Robert, AFAIK, there is no "proprietary non-GPL" work in RedHat's Enterprise Linux line. I think the price is so high because of the support level you're buing. All the source for RHEL, either 32bit or 64bit is available on their ftp sites for download. And as long as they do that I don't think they're violating GPL, but I might be wrong (as I'm not a lawyers, but I'm sure RH has plenty of them). And actually, according to their web site, the cheapest (most suitable cluster) release for ITP2; RHEL WS (workstation) is $792, AS (advanced server) is $1992 for standard edition and $2988 for premium edition. Regards, -- Steffen Persvold ,,, mailto: sp at scali.com Senior Software Engineer (o-o) http://www.scali.com -----------------------------oOO-(_)-OOo----------------------------- Scali AS, PObox 150, Oppsal, N-0619 Oslo, Norway, Tel: +4792484511 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 18:11:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 15:11:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E6C57.9030406@scali.com> Message-ID: Stephen... anyone who wants can grab the entire srpms dir for AS and build it. The only way they'll end up with a lawsuit is if they represent the result as official suppoprt redhat linux AS... If you like you can pick it up from the RH mirrors including mine. > > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > >>Some white box resellers are looking at taking RH Advanced Server and > >>stripping it down and offering on their ia64 clusters. However, if > >>their not working with code lawyers, and paying very close attention to > >>copy right laws, they could end up with law suits down the road. 
> > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:25:42 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:25:42 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Alvin Oga wrote: > the 2U chassis should be trivial to solve for either 32bit or 64bit pci > slots Well, maybe trivial for you who do this for a living :-) > for 1U chassis... you need to pick "right motherboard" that works with the > chassis ... and pci cards > ( you cannot do a mix and match with any motherboard ) Sure, but I was looking for example at the Intel offerings which pair dual Xeon mainboards with 1U/2U cases that are certified to work together. > if you want performance out of your pci card, I know that this 32bit/33MHz card looks slow by today's standards, but I think that it can still provide lower latency than e1000 or tg3 -driven cards, so Id' like to continue to use them. > 32 and 64 bit pci riser cards (not cheap but lot better than most others) > http://www.adexelec.com Many thanks for this address. I did try to use google before writting to the list, but I came with all sorts of shops, but nothing with good descriptions, which is what I needed most. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:44:24 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:44:24 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots ? Bingo. This is exactly what most people that wrote to me off-list probably missed, although I did mention that it doesn't fit because of a different keying - I should have probably mentioned this explicitly. All the 32bit slots that I've seen on these mainboards allow inserting such cards, which makes me believe that they support both 5V and 3.3V cards; but the 64bit slots are 3.3V only. I don't have much experience with rackmounted systems, which it's probably evident, so I didn't know what to expect from a riser. Thanks to Alvin Oga's mention of Adexelec site, I was able to find out that the risers exist in many different variations. For example, I was wondering if such a riser exist that would allow mounting of the card from the edge toward the middle of the mainboard, while most common way is the other way around - I still need to find out if the case allows fixing of the card the other way around, but this is an easier problem to solve. 
One other thing that turned me off was that in a system composed of only Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots on the mainboard allow inserting of the Myrinet card (but didn't try to see if it works), while the riser cards that came with the case do not, allowing only 3.3V ones - so the riser imposes additional limitations... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Thu Aug 28 19:46:29 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Thu, 28 Aug 2003 16:46:29 -0700 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: Joel- Have you actually built RH AS from scratch using their SRPMS? Or do you know anyone that has? I'm very interested in doing this but I heard there were some pretty significant obstacles along the lines of package dependencies. Glen On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > Stephen... anyone who wants can grab the entire srpms dir for AS and > build > it. The only way they'll end up with a lawsuit is if they represent > the > result as official suppoprt redhat linux AS... > > If you like you can pick it up from the RH mirrors including mine. > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: >>> >>>> Some white box resellers are looking at taking RH Advanced Server >>>> and >>>> stripping it down and offering on their ia64 clusters. However, if >>>> their not working with code lawyers, and paying very close >>>> attention to >>>> copy right laws, they could end up with law suits down the road. >>> > > -- > ----------------------------------------------------------------------- > --- > Joel Jaeggli Unix Consulting > joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 19:57:53 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 19:57:53 -0400 Subject: Intel acquiring Pallas (Redhat AS Rebuild) In-Reply-To: References: Message-ID: <1062115073.7007.2.camel@localhost.localdomain> Haven't tried it but... http://www2.uibk.ac.at/zid/software/unix/linux/rhel-rebuild.htm http://www.uibk.ac.at/zid/software/unix/linux/rhel-rebuild-l.html Vann On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. 
> > Glen > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 28 20:54:43 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 29 Aug 2003 10:54:43 +1000 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <200308291054.45059.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 29 Aug 2003 09:46 am, Glen Otero wrote: > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? - From the Rocks Cluster Distribution website: http://www.rocksclusters.org/Rocks/ [...] Rocks 2.3.2 IA64 is based on Red Hat Advanced Workstation 2.1 recompiled from Red Hat's publicly available source RPMs. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/TqRTO2KABBYQAh8RAnd4AJkBCFmq3tyb97EgHvg5x9mrsqkGGQCghGqG 9cF9eAKLTHD6lQS4kZGtg0A= =WVIz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Thu Aug 28 20:59:57 2003 From: timm at fnal.gov (Steven Timm) Date: Thu, 28 Aug 2003 19:59:57 -0500 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The ROCKS distribution at www.rocksclusters.org claims to have done so for the IA64 architecture.. I have not tested it myself. Your mileage may vary. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfalano at hotmail.com Thu Aug 28 21:29:34 2003 From: nfalano at hotmail.com (Norman Alano) Date: Fri, 29 Aug 2003 09:29:34 +0800 Subject: mpich Message-ID: greetings ! i already installed mpich... but the problem is whenever i run an application for instant the examples in the mpich the graphics wont show.... how can i configure so that i can run the application with graphic? cheers norman _________________________________________________________________ The new MSN 8: advanced junk mail protection and 2 months FREE* http://join.msn.com/?page=features/junkmail _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 00:04:27 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 21:04:27 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: I've built almost all of it with the exception of gtk and kde related stuff which was outside the scope of my interest, on a redhat 7.2 box... I wouldn't try it on a 9 host. joelja On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sgaudet at wildopensource.com Fri Aug 29 09:52:31 2003 From: sgaudet at wildopensource.com (Stephen Gaudet) Date: Fri, 29 Aug 2003 09:52:31 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4F5A9F.5000809@wildopensource.com> Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at fing.edu.uy Fri Aug 29 04:06:38 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Fri, 29 Aug 2003 11:06:38 +0300 Subject: European Commission Patentability rules Message-ID: <3F4F098E.7080202@fing.edu.uy> Dear all: I have not seen comments on the list regarding to this subject. I know that this might be considered political and off-topic but I believe that most of our (beowulf) software technology is Open/Free and that the results of further regulations might affect our work. Sorry for the noise for those of you who already knew this. Regards Ariel On September 1st the European Commission is going to vote a revised version of the European Patentability rules. The proposed revision contains a set of serious challenges to Open Source development since regulation regarding software patents will be broadly extended and might forbid independent development of innovative (Open Source and not) software-based solutions. The European Open Source community is very concerned about the upcoming new regulation and has organized a demo protest for August 27, asking Open Source supporting sites to change their home pages to let everyone know what is going on at the European Parliament. For further information please see http://swpat.ffii.org and http://petition.eurolinux.org. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Aug 29 10:17:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 29 Aug 2003 10:17:09 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Bogdan Costescu wrote: > One other thing that turned me off was that in a system composed of only > Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots > on the mainboard allow inserting of the Myrinet card (but didn't try to > see if it works), while the riser cards that came with the case do not, > allowing only 3.3V ones - so the riser imposes additional limitations... 
One last possible solution you can consider if you're using 2U cases and don't mind ugly is that MANY of the cards you might want to add nowadays are half-height cards on full height backplates. Usually the backplate is held on by two little screws. The half height cards will snap into a regular PCI slot normally (vertically) and still permit the case to close with no riser at all. The two negatives are there there are no "half height riser backplates" that I know of, so the back of each chassis will be open to the air, which may or may not screw around with cooling airflow in negative ways, and the fact that you can't "screw the cards down". Both of these can be solved (or ignored) with a teeny bit of effort, although you'll probably prefer to just get a riser that meets your needs -- there are risers with a key that fits in the AGP slot, risers with 32 bit keys, risers with 64 bit keys.. shop around. Be aware that some of the risers you can buy don't work properly (why I can't say, given that they appear to be little more than bus extenders with keys to grab power and timing/address lines). At a guess this won't help you with an old myrinet card as it is probably full height, but if you get desperate and it's not, you could likely make this work. rgb > > -- > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Aug 29 11:10:43 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 17:10:43 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Robert G. Brown wrote: > is that MANY of the cards you might want to add nowadays are half-height > cards on full height backplates. Nice try :-) It's a full-height card. And buying a taller case for each node with these Myrinet cards to allow vertical mounting would make me start looking for an 100U rack :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Fri Aug 29 10:48:55 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Fri, 29 Aug 2003 09:48:55 -0500 Subject: Intel acquiring Pallas Message-ID: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> >6. 
Given that the update packages are covered by the GPL, *nothing* > prevents a receiver of said packages to make them available for > download on the Internet. Red Hat can do *nothing* to prevent > further distribution. IOW, nothing prevents you from buying one > license and make the updates available to the rest of the world. > > Red Hat can, however, potentially decide not to provide you with > future updates if you do this. This is a bit unclear in the SLA. Correct me if I'm wrong, I though part of the GPL was that you have to give the source code to anyone that asks for it, is it not? Per section 2b. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. http://www.gnu.org/copyleft/gpl.html?cid=6 They still keep patches on their web site do they not? Lehi Gracia -----Original Message----- From: Stephen Gaudet [mailto:sgaudet at wildopensource.com] Sent: Friday, August 29, 2003 8:53 AM To: Robert G. Brown Cc: Rocky McGaugh; Jeffrey B. Layton; beowulf at beowulf.org Subject: Re: Intel acquiring Pallas Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a >>fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in > any way that would significantly affect the 64 bit market? The > kernel, the compiler, and damn near every package is GPL, much of it > from Gnu itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop > in a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable > with, I don't see what there is to stop anyone from taking an advanced > server copy (which necessarily either comes with src rpm's or makes > them publically available somewhere), doing an rpmbuild on all the src > rpm's (as if anyone would care that you went through an independent > rebuild vs just used the distribution rpm's) and putting it on 1000 > systems, or giving the sources to a friend, or even reselling a > repackaging of the whole thing (as long as they don't call them Red > Hat and as long as they omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Aug 29 11:17:04 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 29 Aug 2003 11:17:04 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062170224.9421.4.camel@roughneck> On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > The links to the rhel-rebuild howto and mailing list are enought to get this done -- I just did 2.1 ES ( why bother with spending more for AS ? ). We purchased one copy of ES, and I used that to do the rebuild. Of course, it is not completely automatic, but there are only a handfull of packages that do not build without a bit of tweaking. As far as pkg dependencies go, it is _much_ easier to build on a similar system. Now for the $10K question -- are there any reasons that I ( or someone else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It of course still has the RH branding all over it, but it could be distributed being called 'Nics Fun RH clone', or something similar. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From robert at yu.org Fri Aug 29 12:09:46 2003 From: robert at yu.org (Robert K. Yu) Date: Fri, 29 Aug 2003 09:09:46 -0700 (PDT) Subject: beowulf to a good home Message-ID: <20030829160946.6897.qmail@web40904.mail.yahoo.com> Hi, I have the following: 16 machines 450 MHz dual Celeron each (i.e. 
32 CPU) 128M memory each 100BaseT switch 6G drive each I would like to donate these machines and see them put to good use. Pick up from the San Francisco south bay area, or you pay for shipment. Thanks. -Robert ===== Robert K. Yu mailto:robert at yu.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From becker at scyld.com Fri Aug 29 12:19:31 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 29 Aug 2003 12:19:31 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> Message-ID: On Fri, 29 Aug 2003 lehi.gracia at amd.com wrote:

> > Red Hat can do *nothing* to prevent > > further distribution. IOW, nothing prevents you from buying one > > license and make the updates available to the rest of the world. > > > > Red Hat can, however, potentially decide not to provide you with > > future updates if you do this. This is a bit unclear in the SLA. > > Correct me if I'm wrong, I though part of the GPL was that you have to > give the source code to anyone that asks for it, is it not? Per section > 2b.

No, section 2b states that you must propagate the license, not make the source code available to any third party. Section 3 covers distribution and redistribution. You don't have to make the source code available to an arbitrary third party, just those with the offers in 3b or 3c. For distributions Red Hat ships with the source code, they have no further obligations.

> >6. Given that the update packages are covered by the GPL, *nothing* > > prevents a receiver of said packages to make them available for > > download on the Internet.

For most individual packages, correct. And the following discussion covers individual packages, not the distribution as a whole. If the package contains a trademarked logo embedded with GPL code they
 - Should grant the right to use a package unmodified, including the logo (The GPL doesn't explicitly cover the case of logos, but a reasonable reading is that if Red Hat itself packages up the logo you have the right of unmodified distribution.)
 - May require you to remove the logo with any modification.
The entire distribution is another issue. It may be protected by copyright on the collection. They may restrict distribution of packages consisting of Red Hat branding and logos, which means some level of content reassembly is necessary to distribute. Red Hat may also insist that you not misrepresent a copy as a Red Hat product. This is an area where it's difficult to generalize. They may require removing packages/elements consisting of just logos or Red Hat documentation. And third parties can use the trademark name where it's descriptive, but not misleading. Consider the difference between "Chevrolet Service Station" and "Service Station for Chevrolets" [[ Native English speakers immediately understand the difference, and think of this rule as just part of the language. But you will not find this legally-inculcated distinction as a part of the grammar.
]] -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 13:16:06 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 10:16:06 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for G5 Message-ID: <20030829171606.62071.qmail@web11408.mail.yahoo.com> Free download: http://www-3.ibm.com/software/awdtools/ccompilers/ http://www-3.ibm.com/software/awdtools/fortran/ Rayson __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 14:52:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 29 Aug 2003 11:52:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> Message-ID: On Fri, 29 Aug 2003, Glen Otero wrote: > > You can redistribute it as long as it doesn't have RH all over it and > you don't use the RH name while endorsing/promoting it. I suppose you > could say it's RH compliant and built from RH srpms. The loop hole that > RH is taking advantage of is the fact that they are compliant with the > GPL as long as they release the sources. They comply with the GPL by > releasing the sources in srpm format, and so technically do not have to > make the isos freely available. By making it slightly difficult to > build your own distro, and not offering support to those who do, RH is > coaxing people to take the path of least resistance (wrt effort) and > buy licenses. I wouldn't really consider it a loophole, it's compatible with the spirit of the gpl. it's not as convenient as some people might like... but the sources are all there and they build and work. > Glen > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. 
> Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Fri Aug 29 14:12:37 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Fri, 29 Aug 2003 11:12:37 -0700 Subject: Intel acquiring Pallas In-Reply-To: <1062170224.9421.4.camel@roughneck> Message-ID: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> On Friday, August 29, 2003, at 08:17 AM, Nicholas Henke wrote: > On Thu, 2003-08-28 at 19:46, Glen Otero wrote: >> Joel- >> >> Have you actually built RH AS from scratch using their SRPMS? Or do >> you know anyone that has? I'm very interested in doing this but I >> heard >> there were some pretty significant obstacles along the lines of >> package >> dependencies. >> > > The links to the rhel-rebuild howto and mailing list are enought to get > this done -- I just did 2.1 ES ( why bother with spending more for AS ? > ). We purchased one copy of ES, and I used that to do the rebuild. Of > course, it is not completely automatic, but there are only a handfull > of > packages that do not build without a bit of tweaking. > > As far as pkg dependencies go, it is _much_ easier to build on a > similar > system. > > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. You can redistribute it as long as it doesn't have RH all over it and you don't use the RH name while endorsing/promoting it. I suppose you could say it's RH compliant and built from RH srpms. The loop hole that RH is taking advantage of is the fact that they are compliant with the GPL as long as they release the sources. They comply with the GPL by releasing the sources in srpm format, and so technically do not have to make the isos freely available. By making it slightly difficult to build your own distro, and not offering support to those who do, RH is coaxing people to take the path of least resistance (wrt effort) and buy licenses. Glen > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. 
Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 19:01:03 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 16:01:03 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for Apple G5 In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F5@txexmtae.amd.com> Message-ID: <20030829230103.91270.qmail@web11407.mail.yahoo.com> (Sorry, didn't made it clear in my last email...) The compilers are for MacOSX. Rayson > Which one do we use for Linux, will the AIX one work? > > > Free download: > > > > http://www-3.ibm.com/software/awdtools/ccompilers/ > > http://www-3.ibm.com/software/awdtools/fortran/ > > __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 30 12:48:36 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 30 Aug 2003 20:48:36 +0400 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: hi If i don't find these to file should i create it? i know that .rhosts is hidden but when I do ls -a i cannot find it even if i use the command locate therefore if i create it what permission should i give them thanks roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Aug 30 13:53:56 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 30 Aug 2003 12:53:56 -0500 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: ; from rouds@servihoo.com on Sat, Aug 30, 2003 at 08:48:36PM +0400 References: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: <20030830125356.C3206@mikee.ath.cx> On Sat, 30 Aug 2003, RoUdY wrote: > hi > If i don't find these to file should i create it? > i know that .rhosts is hidden but when I do ls -a > i cannot find it even if i use the command locate > therefore if i create it what permission should i give > them > thanks > roudy the file ~/.rhosts should have permissions of 600 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 31 19:45:52 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 1 Sep 2003 09:45:52 +1000 Subject: Trademark caveats about building RHAS from SRPMS (was Re: Intel acquiring Pallas) In-Reply-To: <1062170224.9421.4.camel@roughneck> References: <1062170224.9421.4.camel@roughneck> Message-ID: <200309010945.53871.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 30 Aug 2003 01:17 am, Nicholas Henke wrote: > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? 
It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. Redhat have a set of rules of what you can and cannot do. Basically whilst they comply with the GPL they do restrict what you can do with their trademarks (i.e. things like Redhat and the ShadowMan logo). Two of the major things are: http://www.redhat.com/about/corporate/trademark/guidelines/page6.html C. You may not state that your product "contains Red Hat Linux X.X." This would amount to impermissible use of Red Hat's trademarks. [...] D. You must modify the files identified as REDHAT-LOGOS and ANACONDA-IMAGES so as to remove all use of images containing the "Red Hat" trademark or Red Hat's Shadow Man logo. Note that mere deletion of these files may corrupt the software. So if you want to build and redistribute from their SRPMS you will need to do extra work to make them happy. Note that RMS thinks that this use of trademark in relation to the GPL is legitimate, in an interview quoted on the "Open For Business" website he says (in regards to Mandrake): http://www.ofb.biz/modules.php?name=News&file=article&sid=260 [quote] TRB: Another interesting current issue is the concept of what might be seen as "hybrid licensing." For example, MandrakeSoft's Multi-Network Firewall is based on entirely Free Software, however the Mandrake branding itself is placed under a more restrictive license (you can't redistribute it for a fee). This give the user or consultant two choices -- use the software under the more restrictive licensing or remove the Mandrake artwork. What are your thoughts on this type or approach? RMS: I think it is legitimate. Freedom to redistribute and change software is a human right that must be protected, but the commercial use of a logo is a very different matter. Provided that removing the logo from the software is easy to do in practice, the requirement to pay for use of the logo does not stain the free status of the software itself. [/quote] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/UoiwO2KABBYQAh8RAji6AJ4smNhqZ/my4k8i787Uaqs+n4rfsACcC4yS BLtsLZDIzG8Hm0KEACBOZyo= =A0dE -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
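For anyone actually attempting the rebuild-and-rebrand route, the logo step might look roughly like this. Everything below is a sketch under assumptions: the package names (redhat-logos, anaconda-images), the spec file names and the /usr/src/redhat build tree are the old Red Hat defaults and may not match RHAS exactly; the REDHAT-LOGOS and ANACONDA-IMAGES files on the media remain the authoritative list of what has to be replaced.

    # install the source packages that carry the trademarked artwork
    rpm -ivh SRPMS/redhat-logos-*.src.rpm SRPMS/anaconda-images-*.src.rpm

    # list which image files the binary package installs, if you have it handy
    rpm -qlp RPMS/redhat-logos-*.rpm

    # drop replacement (non-Red-Hat) images with the same filenames into
    # /usr/src/redhat/SOURCES, rename the packages in the spec files,
    # then rebuild:
    rpmbuild -ba /usr/src/redhat/SPECS/redhat-logos.spec
    rpmbuild -ba /usr/src/redhat/SPECS/anaconda-images.spec

And per point C above, don't describe the result as containing "Red Hat Linux X.X"; something like Glen's "built from RH srpms" earlier in the thread is the safer phrasing.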