From Andrew.Cannon at nnc.co.uk Wed Oct 1 04:12:32 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Wed, 1 Oct 2003 09:12:32 +0100 Subject: RH8 vs RH9 Message-ID: Hi All, We have a small test cluster running RH8 which seems to work well. We are going to expand this cluster and I was wondering what, if any, are the advantages of installing the cluster using RH9 instead of RH8? Are there any disadvantages? Thanks Andrew Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Wed Oct 1 04:21:59 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:21:59 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Wed Oct 1 04:24:21 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:24:21 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:24:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:24:13 -0400 (EDT) Subject: RH8 vs RH9 In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Cannon, Andrew wrote: > Hi All, > > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? Many humans wonder about that, given the very short time that RH8 was around before RH9 came out. The usual rule is that major number upgrades are associated with changes in core libraries that break binary compatibility, so that binaries built for RH 8 are not guaranteed to work for RH 9. 
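One quick sanity check before committing the whole cluster: take the binaries you already run on RH 8 and ask ldd, on a single RH 9 test node, whether everything they link against still resolves. A minimal sketch in Python -- the binary paths are hypothetical placeholders for your own applications:

    #!/usr/bin/env python
    # Report RH 8-built binaries whose shared libraries no longer resolve
    # on an RH 9 test node.  The paths below are placeholders.
    import subprocess
    import sys

    BINARIES = ["/usr/local/bin/your_solver", "/usr/local/bin/your_preprocessor"]

    def missing_libs(path):
        """Return the libraries that ldd reports as 'not found' for path."""
        out = subprocess.run(["ldd", path], capture_output=True, text=True)
        return [line.split()[0] for line in out.stdout.splitlines()
                if "not found" in line]

    if __name__ == "__main__":
        broken = False
        for binary in BINARIES:
            missing = missing_libs(binary)
            if missing:
                broken = True
                print("%s: unresolved: %s" % (binary, ", ".join(missing)))
        sys.exit(1 if broken else 0)

Statically linked binaries can still break across the upgrade (see the Intel Fortran report later in this thread), so a library check like this is a useful first pass rather than a guarantee.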
I think that the easiest way for you to determine precisely what changed is to look at e.g. ftp://ftp.dulug.duke.edu/pub/redhat/linux/9/en/os/i386/RELEASE-NOTES and see if anything in there is important to your work. Beyond that, there are a few issues to consider: a) 8 will, probably fairly soon, be no longer maintained. 9 will be, at least for a while (possibly for one more year). Of course the maintenance issue right now is very cloudy for RH in general with the Fedora/RHEL situation a work in progress. However, maintenance alone is (in my opinion) a good reason to be using 9 and to move from 8 to 9 to achieve it. Fedora will likely be strongly derived from 9 and the current rawhide in any event. How the "community based" RH release will end up being maintained is the interesting question. One possibility is "as rapidly as RHEL plus a few days", the difference being the time required to download the GPL-required logo-free source rpm(s) after an update and rebuild them and insert them into the community version. Or of course you can spring for a RHEL license (set) for your cluster, which may or may not be reasonable in cost or scale well per node by the time all the University-price dickering is done. b) 9 had some fairly significant library upgrades, service upgrades, and bug fixes. That doesn't mean 8 is "bad" -- it just means that your chances of encountering trouble with 9 are in principle smaller than with 8, and one hopes that the upgrades added a bit to performance as well. c) A lot of the enhancements in 9 were more useful or relevant to userspace and LAN client operation (CUPS or Open Office, for example) than they were to cluster nodes. So in that sense perhaps it doesn't matter as much. We're using 9 on a bunch of hosts and nodes with happiness. We're also using 7.3 (still) on a bunch of hosts and nodes with happiness. We skipped 8 only because they released 9 before we finished creating a stable/tested 8 repository as RH changed their release cycle and dropped the .0, .1 and so forth "correction" releases. I don't know that we'll ever use RHEL with happiness unless RH charges something like $1 per system as their university price (which isn't insane, actually, given that an entire university can install and maintain, as Duke does, off of a single campus-local repository largely run by and debugged by and maintained by campus administrators, so RH's costs don't scale at all strongly with the number of internal campus RH systems). Fedora, quite possibly, but as noted we are fearful, uncertain, and doubtful at the moment, for once because of real issues and not just as a sort of Microsoft joke... rgb > > Thanks > > Andrew > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. 
If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at vilma.upc.es Wed Oct 1 04:01:30 2003 From: lepalom at vilma.upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:01:30 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From thornton at yoyoweb.com Wed Oct 1 08:34:40 2003 From: thornton at yoyoweb.com (Thornton Prime) Date: Wed, 01 Oct 2003 05:34:40 -0700 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <1065011679.1923.16.camel@localhost.localdomain> > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? You should check out the release notes. On the whole, I'd say there isn't much advantage unless you can take advantage of NTPL. Most of the other enhancements were primarily for desktop users. The next release should be 2.6-kernel ready, so rather than 9 you may consider experimenting with Severn or Taroon. Taroon has much better support for 64-bit platforms, if you are headed there. thornton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:37:44 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:37:44 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > > Why? On a system equipped with an internal sensor, lm_sensors can often read e.g. core CPU temperature on the system itself. A polling cron script can then read this and take action, e.g. initiate a shutdown if it exceeds some threshold. There are good and bad things about this. A good thing is it addreses the real problem -- overheating in the system itself -- and not room temperature. CPU's can overheat because of a fan failure when the room remains cold, and a sensors-driven poweroff can then save your hardware on a node by node basis. 
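As a concrete illustration, the cron side of this can be very small. The sketch below just scrapes the output of the sensors command for lines of the form "temp1: +53 C (limit = +60 C ...)" -- the same format shown later in this thread; the threshold and the shutdown action are assumptions to adjust for your own hardware, and which tempN line actually tracks the CPU varies from board to board:

    #!/usr/bin/env python
    # Minimal over-temperature watchdog, intended to run from cron every minute.
    # It parses `sensors` output lines such as "temp1: +53 C (limit = +60 C ...)"
    # and powers the node off if any reading exceeds THRESHOLD degrees C.
    import os
    import re
    import subprocess

    THRESHOLD = 70.0    # degrees C; pick a sane value for your CPUs
    TEMP_LINE = re.compile(r"^temp\d+:\s*\+?(-?\d+(?:\.\d+)?)")

    def temperatures():
        # Note: some boards report garbage on unconnected tempN inputs
        # (see the bogus-reading discussion later in this thread); filter
        # those out here if your board has any.
        out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
        matches = (TEMP_LINE.match(line) for line in out.splitlines())
        return [float(m.group(1)) for m in matches if m]

    if __name__ == "__main__":
        hot = [t for t in temperatures() if t > THRESHOLD]
        if hot:
            # This is the place to send a warning mail before pulling the plug.
            os.system("/sbin/shutdown -h now")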
The bad thing is that it does NOT give you any sort of measure of room temperature per se, although if you have the poweroff script send you mail first, getting deluged with N messages as the entire cluster shuts down would be a good clue that your room cooling failed:-). Also, lm_sensors has the API from hell. In fact, I would hardly call it an API. One has to pretty much craft a polling script on the basis of each supported sensor independently, which requires you to know WAY more than you ever wanted to about the particular sensor your system may or may not have. Alas, if only somebody would give the lm_sensors folks a copy of a good book on XML for christmas, and they decided to take the monumental step of converting /proc/sensors into a single xml-based file with the RELEVANT information presented in toplevel tags like <cputemp>50.4</cputemp> and the irrelevant information presented in tags like <sensor>lm78</sensor><version>1.22a</version> then we could ALL reap the fruits of their labor without needing a copy of the lm78 version 1.22a API manual and having to write an application that supports each of the sensors THROUGH THEIR INTERFACE one at a time...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Wed Oct 1 09:46:25 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Wed, 1 Oct 2003 08:46:25 -0500 (CDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Robert G. Brown wrote: > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > <cputemp>50.4</cputemp> > > and the irrelevant information presented in tags like > > <sensor>lm78</sensor><version>1.22a</version> > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) We have that. lm_sensors+cron+gmond. Nice little XML stream on every node with every other nodes temps. One can keep a range of tolerance for cpu0, cpu1, motherboard, and disk temps and shutdown whenever you need to. a netbotz would be cooler though. i'd still use the lm_sensors+cron+gmond and still have the netbotz as a toy..:) -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at upc.es Wed Oct 1 10:13:46 2003 From: lepalom at upc.es (Leopold Palomo) Date: Wed, 1 Oct 2003 16:13:46 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011613.46297.lepalom@upc.es> A Dimecres 01 Octubre 2003 14:37, Robert G. Brown va escriure: > On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > > Dont overlook lm_sensors+cron > > > > Why?
> > On a system equipped with an internal sensor, lm_sensors can often read > e.g. core CPU temperature on the system itself. A polling cron script > can then read this and take action, e.g. initiate a shutdown if it > exceeds some threshold. > > There are good and bad things about this. A good thing is it addreses > the real problem -- overheating in the system itself -- and not room > temperature. CPU's can overheat because of a fan failure when the room > remains cold, and a sensors-driven poweroff can then save your hardware > on a node by node basis. > > The bad thing is that it does NOT give you any sort of measure of room > temperature per se, although if you have the poweroff script send you > mail first, getting deluged with N messages as the entire cluster shuts > down would be a good clue that your room cooling failed:-). Also, > lm_sensors has the API from hell. In fact, I would hardly call it an > API. One has to pretty much craft a polling script on the basis of each > supported sensor independently, which requires you to know WAY more than > you ever wanted to about the particular sensor your system may or may > not have. > > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) Ok. I was a bit surprise about your sentence. I know that lmsensors is not perfect, but it does their job. Ok, I don't think that use lm_sensors to try to calculate the T of the room is a bit excesive. About the xml,... well, ok, it would be a nice feature, but as plain text, knowing your hardware it's so good, too. Best Regards. Pd How about the pdf, ps, etc? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 10:33:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 10:33:29 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011613.46297.lepalom@upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo wrote: > Ok. I was a bit surprise about your sentence. I know that lmsensors is not > perfect, but it does their job. Ok, I don't think that use lm_sensors to try > to calculate the T of the room is a bit excesive. > > About the xml,... well, ok, it would be a nice feature, but as plain text, > knowing your hardware it's so good, too. > Sorry, I tend to get distracted and rant from time to time (even though as Greg noted, sometimes the rants are of lesser quality:-). In this particular case the rant is really directed to all of /proc, but the sensors interface is the worst example of the lot. I'm "entitled" to rant because I've written two tools (procstatd and xmlsysd) that parse all sorts of data, including sensors data in procstatd, out and provide it to clients for monitoring purposes. Even my daemon wasn't the first to do this, but I think it was one of the first two that functioned as a binary without running a shell script or the like on each node. 
procstatd actually predated ganglia by a fair bit, FWIW. On the basis of this fairly extensive experience I can say that lmsensors output is very poorly organized from the perspective of somebody trying to write a general purpose parser to extract the data it provides. In particular, it uses a directory tree structure where the PARTICULAR sensors interface that you have appears as part of the path, and where what you find underneath that path depends on the particular sensor that you've got as well. Hopefully it is obvious how Evil this makes it from the point of view of somebody trying to write a general purpose tool to parse it. Basically, to write such a tool one has to go through the lmsensors sources and reverse engineer each interface it supports to determine what is produced and where, one at a time. This is more than slightly nuts. What do "most" sensors provide? Fields like cpu temperature (for cpu's 0-N), fan speed (for fans 0-N), core voltage (for lines 0-N). Sure, some provide more, some provide less, but what are we discussing? The monitoring of cpu temperature, under the reasonable assumption that either we have a sensor that provides it or we don't, and that we really don't give a rodent's furry touchis WHICH sensor we have as long as it gives us "CPU Temperature", preferrably for every CPU. So a good API is one that has a single file entitled /proc/sensors, and in that file one finds things like: <cputemp0>54.2</cputemp0> <cputemp1>51.7</cputemp1> <sensor>lm78</sensor> ... ... I can write code to parse this in a few minutes of work, literally, and the same code will work for all interfaces that lm_sensors might support, and I don't need to know the interface the system has in it beforehand (although with the knowledge I might add some advanced features if it supports them). Presenting the knowledge is also trivial -- a web interface might be as sparse as a reader/parser and/or a DTD. Compare to parsing something like (IIRC) /proc/sensors/device-with-a-bunch-of-numbers/subunit/field where the path that you find under specific devices-with-numbers depends on the toplevel value on a device by device basis and the contents of field can as well. Yech. And Rocky, hiding the problem with gmond is fine, but then it puts the burden for writing an API for the API on the poor people that have to support the gmond interface. Yes they can (and I could) do this. I personally refuse. They obviously have gritted their teeth and done so. The correct solution is clearly to redo the lm_sensors interface itself so that it is organized as the above indicates. Which criticism, by the way, applies to a LOT of /proc, which currently looks like it was organized by a bunch of wild individualists who have handled every emergent subfield by overloading its data in a single "field" line, usually with documentation only in the form of reading procps or kernel source. Just because this is actually true doesn't excuse it. Parsing the contents of /proc is maddening for just this reason, and the cost is a lot of needless complexity, pointless bugs and upgrade incompatibilities for many people. Putting the data into xml-wrapped form would be a valuable exercise in the discipline of structuring data, for the most part. rgb > Best Regards. > > Pd How about the pdf, ps, etc? I'll try to work on this as soon as I can.
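To put a number on "a few minutes of work": given a flat, tag-per-line /proc/sensors along the lines sketched above (the file and its tag names are hypothetical; nothing like it exists in stock lm_sensors), a general purpose reader is roughly a dozen lines of Python:

    #!/usr/bin/env python
    # Sketch of a general purpose reader for a flat, tag-per-line /proc/sensors.
    # The path and the tag names (cputemp0, sensor, ...) are hypothetical; the
    # point is only that such a layout needs no chip-specific knowledge to parse.
    import re

    TAG = re.compile(r"<(\w+)>([^<]*)</\1>")

    def read_sensors(path="/proc/sensors"):
        """Return a dict like {'cputemp0': '54.2', 'cputemp1': '51.7', 'sensor': 'lm78'}."""
        values = {}
        with open(path) as f:
            for match in TAG.finditer(f.read()):
                values[match.group(1)] = match.group(2)
        return values

    if __name__ == "__main__":
        for name, value in sorted(read_sensors().items()):
            print("%s = %s" % (name, value))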
My task list for the day looks something like a) debug/fix some dead nodes; b) add a requested feature/view to wulfstat (that has been on hold for a week or more:-(, c) work on a bunch of documents associated with teaching and curriculum at Duke (sigh); d) about eight more tasks, none of which I will likely get to, including work on my research. However, this is about the third or fourth time people have requested a "fix" for the ps/pdf/font issue (with acroread it can even fail altogether to read the document -- presumably some gs/acrobat incompatibility where I use gs-derived tools) so I'll try very hard to craft some sort of fix by the weekend. -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 1 12:36:26 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 1 Oct 2003 12:36:26 -0400 (EDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Rocky McGaugh wrote: > On Wed, 1 Oct 2003, Robert G. Brown wrote: > > Alas, if only somebody would give the lm_sensors folks a copy of a good > > book on XML for christmas, and they decided to take the monumental step ... > > then we could ALL reap the fruits of their labor without needing a copy > > of the lm78 version 1.22a API manual and having to write an application > > that supports each of the sensors THROUGH THEIR INTERFACE one at a > > time...;-) > > We have that. lm_sensors+cron+gmond. I think you missed RGB's point. The lm_sensors implementation sucks. Sure, any one specific implementation can be justified. But having each implementation use a different output and calibration shows that this is not an architecture, just a collection of hacks. The usual reply at this point is "just update the user-level script for the new motherboard type". Yup... and you should probably update the constants in your programs' delay loops at the same time. With lm_sensors you can get a one-off hack working, but cannot implement a general case. Compare this to IPMI, which presents the same information. IPMI has a crufty design and ugly implementations, but it is an architected system. With care you can implement and deploy code that works on a broad range of current and future machine. While I'm on the soapbox, gmond deserves its own mini-butane-torch flame. I implemented the translator from Beostat (our status/statistics subsystem) to gmond (per-machine information for Ganglia), so I have a pretty good side-by-side comparison. First, how did they choose what statistics to present? Apparently just because the numbers were there. What is the point of using a XML DTD if it is just used to package undefined data types? A wrapper around a wrapper... Example metric lines: Not only are these metric types not enumerated, they are made more confusing by abbreviations and no definition. To tie both together: What is "proc_total"? Number of processors? Number of processes? Does it count system daemons? It seems to be the useless number "ps x | wc", rather than the number of end user, application processes. Many statistics are only usable when used/presented as a set. Why split the numbers into multiple elements? It just multiplies the size and parsing load. 
____ Background: Beostat is our status/statistics interface that we published 3+ years ago. It exports interfaces at multiple levels: network protocol, shared memory table only for very performance sensitive programs, such as schedulers dynamic library the preferred interface for programs command output Thus Beostat is a infrastructure subsystem, rather than a single-purpose stack of programs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Wed Oct 1 09:59:16 2003 From: johnb at quadrics.com (John Brookes) Date: Wed, 1 Oct 2003 14:59:16 +0100 Subject: Upper bound on no. of sockets Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> I think there is a 1k per-process limit on open sockets. It's tuneable in 2.4 kernels, IIRC, but I don't remember how (off the top of my head). 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll take it past a/the kernel limit. Maybe recompile kernel? Maybe poke /proc/sys/.../...? Maybe adjust in userland? Maybe use fewer sockets ;-) Does anybody know the score? Cheers, John Brookes Quadrics > -----Original Message----- > From: Balaji Rangasamy [mailto:br66 at HPCL.CSE.MsState.Edu] > Sent: 30 September 2003 05:44 > To: beowulf at beowulf.org > Subject: Upper bound on no. of sockets > > > Hi, > Is there an upper bound on the number of sockets that can be > created by a > process? If there is one, is the limitation enforced by OS? > And what other > factors does it depend on? Can you please be specific on the > numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Wed Oct 1 13:06:40 2003 From: rokrau at yahoo.com (Roland Krause) Date: Wed, 1 Oct 2003 10:06:40 -0700 (PDT) Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <200310011504.h91F4DY02889@NewBlue.Scyld.com> Message-ID: <20031001170640.91750.qmail@web40002.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 6. Re:RH8 vs RH9 (Robert G. Brown) > From: "Robert G. Brown" > Many humans wonder about that, given the very short time that RH8 was > around before RH9 came out. The usual rule is that major number > upgrades are associated with changes in core libraries that break > binary > compatibility, so that binaries built for RH 8 are not guaranteed to > work for RH 9. Indeed some of them wont, I have first hand experience that binaries produced with the Intel Fortran compiler on RH-8, even when statically linked, will not run on a RH-9 system. Further, if you need the Intel Fortan compiler, RH-9 is not really an option for you because it is not officially supported and it will not be either. Inofficially I can confirm that it works fine if you are not using the OpenMP capabilities of the compiler. > achieve it. 
Fedora will likely be strongly derived from 9 and the > current rawhide in any event. How the "community based" RH release > will > end up being maintained is the interesting question. One possibility > is > "as rapidly as RHEL plus a few days", the difference being the time > required to download the GPL-required logo-free source rpm(s) after > an > update and rebuild them and insert them into the community version. Having used fedora in the past on a desktop client I am hopeful that it will be possible to get all necessary packages for a cluster into an 'aptable' repository, be it hosted by fedora or somewhere else (think e.g. sourceforge). If people work together, as they have in the past, I dont see why RH would succeed pushing their rediculous price policies upon cluster users. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Wed Oct 1 18:10:14 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 1 Oct 2003 15:10:14 -0700 Subject: Environment monitoring In-Reply-To: References: Message-ID: <20031001221014.GA28394@sphere.math.ucdavis.edu> I'd recommend: http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2820 For $25.00 you have a trivial to interface to temperature probe that even is smart enough to collect samples even if a machine is down (complete with time stamp). It will even build a histogram of temp samples for you. It's kinda cool that you can leave one in your luggage or send it up in a space probe and then get periodic samples when you arrive at your destination. In anyways people use them for all kinds of things, even in space: http://www.voiceofidaho.org/tvnsp/01atchrn.htm More info: http://www.ibutton.com/ibuttons/thermochron.html They can also be connected via USB, Parallel, and serial. The other cool feature is they are chainable, so we have one behind the machine (i.e. rack temp), one on top of the rack (room temp), and one at the airconditioner output all on one wire. Each button has a guarenteed unique 64 bit ID. Once you get a feel for the dynamics of the system it becomes really easy to spot anomalies. Recommended, the thermo buttons are cheaper, but IMO for most things the thermocron premium is worth it so you can have continuous sampling even if a machine crashes. The logs are very handy for fighting when facilities to combat the well it's not really getting that hot that often kinda thing. Oh, I guess I should mention I have no financial ties to any of the mentioned companies. So no I won't sell you one. 
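For the arguing-with-facilities use case, even a very small logger goes a long way. The sketch below assumes an external helper that reads one probe by its 64-bit ID (read-1wire-temp here is a hypothetical stand-in for whatever utility your 1-wire adapter actually ships with, and the probe IDs are placeholders); run it from cron every few minutes and keep the CSV:

    #!/usr/bin/env python
    # Append timestamped readings from a chain of 1-wire probes to a CSV log.
    # `read-1wire-temp` is a hypothetical helper standing in for whatever
    # utility actually reads your probes; the probe IDs are placeholders.
    import subprocess
    import time

    PROBES = {
        "rack-exhaust": "10AABBCCDDEEFF01",
        "room-ambient": "10AABBCCDDEEFF02",
        "hvac-output":  "10AABBCCDDEEFF03",
    }
    LOGFILE = "/var/log/machineroom-temps.csv"

    def read_probe(probe_id):
        out = subprocess.run(["read-1wire-temp", probe_id],
                             capture_output=True, text=True)
        return float(out.stdout.strip())

    if __name__ == "__main__":
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(LOGFILE, "a") as log:
            for name, probe_id in sorted(PROBES.items()):
                try:
                    log.write("%s,%s,%.1f\n" % (stamp, name, read_probe(probe_id)))
                except Exception:
                    log.write("%s,%s,READ-FAILED\n" % (stamp, name))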
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:36:35 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:36:35 -0700 Subject: more on structural models for clusters Message-ID: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> In regards to my recent post looking for cluster implementations for structural dynamic models, I would like to add that I'm interested in "highly distributed" solutions where the computational load for each processor is very, very low, as opposed to fairly conventional (and widely available) schemes for replacing the Cray with a N-node cluster. The number of processors would be comparable to the number of structural nodes (to a first order of magnitude) Imagine you had something like a geodesic dome with a microprocessor at each vertex that wanted to compute the loads for that vertex, communicating only with the adjacent vertices... Trivial, egregiously simplified, and demo cases are just fine, and, in fact, probably preferable.... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 1 19:19:26 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 1 Oct 2003 16:19:26 -0700 Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <20031001170640.91750.qmail@web40002.mail.yahoo.com> References: <200310011504.h91F4DY02889@NewBlue.Scyld.com> <20031001170640.91750.qmail@web40002.mail.yahoo.com> Message-ID: <20031001231926.GA2900@greglaptop.internal.keyresearch.com> On Wed, Oct 01, 2003 at 10:06:40AM -0700, Roland Krause wrote: > Inofficially I can confirm that it works fine if you are not using > the OpenMP capabilities of the compiler. Which is no surprise, as the thread library stuff changed fairly radically in RedHat 9. I have some sympathy for Intel's compiler guys on that issue. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 19:01:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 09:01:55 +1000 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <200310020901.57000.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 10:24 pm, Robert G. Brown wrote: > a) 8 will, probably fairly soon, be no longer maintained. 9 will be, > at least for a while (possibly for one more year). Updates for 7.3 ends on December 31st 2003. Updates for 8.0 ends on December 31st 2003. Updates for 9 ends on April 30th 2004. So going to 9 will only get you an extra 4 months of updates. 
http://www.redhat.com/apps/support/errata/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1zjO2KABBYQAh8RArjhAJoDUAq9xSKjz6pJ58nIvSk1GEqG2QCeJ7f3 5XYQ/rJIzUPP744CNvAOLXA= =UNIB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 18:58:21 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 08:58:21 +1000 Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> References: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: <200310020858.30401.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 06:21 pm, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > Why? Presumably because you can use it to monitor the temp and fan sensors and stuff and raise alarms if they go out of bounds. http://secure.netroedge.com/~lm78/ And from the info page: Project Mission / Background / Ethics: The primary mission for our project is to provide the best and most complete hardware health monitoring drivers for Linux. We strive to produce well organized, efficient, safe, flexible, and tested code free of charge to all Linux users using the Intel x86 hardware platform. The project attempts to support as many related devices as possible (when testing and documentation is available), especially those which are commonly included on mainboards. Our drivers provide the base software layer for utilities to acquire data on the environmental conditions of the hardware. We also provide a sample text-oriented utility to display sensor data. While this simple utility is sufficient for many users, others desire more elaborate user interfaces. We leave the development of these GUI-oriented utilities to others. See our useful addresses page for references. http://secure.netroedge.com/~lm78/info.html NB: I've used these at home from time to time, but we don't use them on our IBM cluster as we can grab the same info out of CSM. - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1wQO2KABBYQAh8RApUxAJ0V9QuvuGOLCnS7qXCkWD+9/OrOlgCfezuT QQ5wnTot9uoJCy3tRjuDKAQ= =fDWX -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:27:58 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:27:58 -0700 Subject: cluster computing for mechanical structural FEM models Message-ID: <5.2.0.9.2.20031001152545.03110070@mailhost4.jpl.nasa.gov> I'm looking for references to work on distributed computing for structural models like trusses and spaceframes. They are typically sparse/diagonalish matrices that represent the masses and springs, so distributing the work in a cluster seems a natural fit. 
Anybody done anything like this (as a demonstration, e.g.) say, using NASTRAN inputs? James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From vanw at tticluster.com Thu Oct 2 08:37:50 2003 From: vanw at tticluster.com (Kevin Van Workum) Date: Thu, 2 Oct 2003 08:37:50 -0400 (EDT) Subject: lm_sensors output Message-ID: The recent discussion on environment sensors motivated me to take the subject more seriously. I therefore installed lm_senors on one of my nodes for testing. I simply used the lm_sensors RPM from RH8.0, ran sensors-detect and did what it told me to do. It apparently worked. The problem is, I don't really know what the output means or what I should be looking for. I guess I'm a novice. Anyways, the output from sensors is shown below. What is VCore and why is mine out of range? What are all the other voltages describing? V5SB is out of range also, is that a bad thing? I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? $ sensors w83697hf-isa-0290 Adapter: ISA adapter Algorithm: ISA algorithm VCore: +1.50 V (min = +0.00 V, max = +0.00 V) +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) +5V: +5.02 V (min = +4.50 V, max = +5.48 V) +12V: +12.20 V (min = +10.79 V, max = +13.11 V) -12V: -12.85 V (min = -13.21 V, max = -10.90 V) -5V: -5.42 V (min = -5.51 V, max = -4.51 V) V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) VBat: +3.29 V (min = +2.70 V, max = +3.29 V) fan1: 4687 RPM (min = 187 RPM, div = 32) fan2: 0 RPM (min = 187 RPM, div = 32) temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor alarms: beep_enable: Sound alarm disabled Kevin Van Workum, Ph.D. www.tsunamictechnologies.com ONLINE COMPUTER CLUSTERS __/__ __/__ * / / / / / / _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From AlberT at SuperAlberT.it Thu Oct 2 03:35:57 2003 From: AlberT at SuperAlberT.it (AlberT) Date: Thu, 2 Oct 2003 09:35:57 +0200 Subject: Upper bound on no. of sockets In-Reply-To: References: Message-ID: <200310020935.58006.AlberT@SuperAlberT.it> On Tuesday 30 September 2003 06:44, Balaji Rangasamy wrote: > Hi, > Is there an upper bound on the number of sockets that can be created by a > process? If there is one, is the limitation enforced by OS? And what other > factors does it depend on? Can you please be specific on the numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > from man setrlimit: [quote] getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()): struct rlimit { rlim_t rlim_cur; /* Soft limit */ rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */ }; The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. 
A privileged process may make arbitrary changes to either limit value. The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()). [snip] RLIMIT_NOFILE Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE. [/QUOTE] -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Thu Oct 2 11:33:21 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 02 Oct 2003 08:33:21 -0700 Subject: Environment monitoring Message-ID: Robert G. Brown rgb at phy.duke.edu wrote: >The bad thing is that it does NOT give you any sort of measure of room >temperature per se, Well, no, but to be fair that's hardly lm_sensors fault. The problem is that few (any?) motherboards have a sensor positioned away from hot devices on the upstream end of the wind flow. One can sometimes acquire a fair approximation of this info using SMART from a hard drive if the airflow across the drive is good and the drive itself does not run very hot. We have not yet filled the second processor slot on the mobos of our beowulf and that temperature sensor gives a pretty good indication of the air temperature in the case (32C) vs. under a live Athlon MP 2200+ processor (no load, 40.5C). We use lm_sensors with mondo http://mondo-daemon.sourceforge.net/ to watch the systems and shut them down if they overheat. Generally this works well. Mondo can compensate for the shortcomings of the lm_sensors/motherboard combos which sometimes arise. For instance, on our ASUS A7V266 mobos (workstations, not in a beowulf!) some of the sensors tend to go whacky for one or two measurements. Fan speeds go to 0 or temps to 255C. Mondo is set to require an out of range condition for 3 seconds before triggering a shutdown, and so far we have not seen a glitch last that long. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcmoore at atipa.com Thu Oct 2 13:56:04 2003 From: jcmoore at atipa.com (Curt Moore) Date: 02 Oct 2003 12:56:04 -0500 Subject: lm_sensors output In-Reply-To: References: Message-ID: <1065117364.12473.27.camel@picard.lab.atipa.com> This is really the bad thing about lm_sensors which some have touched on previously; too much guesswork. Many times even if the drivers are present and up to date for your specific hardware, the values may be meaningless as different board manufacturers may choose to physically connect the monitoring chip(s) to different onboard devices, such as in the case with fans. You have to have a knowledge of which onboard piece of hardware is connected to which input of the monitoring chip in order to make sense of the sensors output. Don't get me wrong, when lm_sensors works, it works great but sometimes it takes a little work to get to that point. 
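That "little work" usually ends up in /etc/sensors.conf, where the chip's generic inputs get relabelled and given limits that match one particular board. A hypothetical fragment for the w83697hf chip whose output appears earlier in this thread -- the labels and limit values are made up, and the exact feature names depend on the chip driver:

    # /etc/sensors.conf -- hypothetical per-board section
    chip "w83697hf-*"
        # Map the chip's generic inputs onto what this board actually wires up.
        label temp1 "CPU Temp"
        label fan1  "CPU Fan"
        # Inputs that are not connected on this single-CPU board.
        ignore temp2
        ignore fan2
        # Alarm limits for this particular CPU/heatsink combination.
        set temp1_over 60
        set temp1_hyst 55
        set fan1_min   3000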
Even if the values are sane for your hardware, you still have to go into the sensors.conf and set max, min, and hysteresis values, if you so choose, in order to have this information make sense for your specific hardware. In recent months, vendors such as Tyan have begun to distribute customized sensors.conf files for their boards which take into account the differences between boards and how sensor chips are connected to the onboard devices for each of their boards. As Don mentioned earlier, IPMI is more generalized and is much easier to ask for "CPU 1 Temperature" and actually get "CPU 1 Temperature" instead of data from some other onboard thermistor. A mistake in this area could end up costing time and money if something overheats and it's not detected because of polling the wrong data. >From my experience, it would be very difficult to come up with a generalized set of sensors values to work across differing motherboard types. A "standard" such as IPMI makes things much easier to accurately collect and act upon as all of the "hard" work has already been done by those implementing IPMI on the hardware. One would hope that these individuals would have the in-depth knowledge of exactly which values to map to which sensor inputs and any computations needed for these values so that clean and accurate values are returned when the hardware is polled. -Curt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Curt Moore Systems Integration Engineer At?pa Technologies jcmoore at atipa.com (O) 785-813-0312 (Fax) 785-841-1809 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brbarret at osl.iu.edu Thu Oct 2 13:05:54 2003 From: brbarret at osl.iu.edu (Brian Barrett) Date: Thu, 2 Oct 2003 10:05:54 -0700 Subject: Upper bound on no. of sockets In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> References: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> Message-ID: On Oct 1, 2003, at 6:59 AM, John Brookes wrote: > I think there is a 1k per-process limit on open sockets. It's tuneable > in > 2.4 kernels, IIRC, but I don't remember how (off the top of my head). > 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll > take it > past a/the kernel limit. Maybe recompile kernel? Maybe poke > /proc/sys/.../...? Maybe adjust in userland? > > Maybe use fewer sockets ;-) > > Does anybody know the score? On linux, there is a default per-process limit of 1024 (hard and soft limits) file descriptors. You can see the per-process limit by running limit (csh/tcsh) or ulimit -n (sh). There is also a limit on the total number of file descriptors that the system can have open, which you can find by looking at /proc/sys/fs/file-max. On my home machine, the max file descriptor count is around 104K (the default), so that probably isn't a worry for you. There is the concept of a soft and hard limit for file descriptors. The soft limit is the "default limit", which is generally set to somewhere above the needs of most applications. The soft limit can be increased by a normal user application up to the hard limit. As I said before, the defaults for the soft and hard limits on modern linux machines are the same, at 1024. 
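A process can also inspect and adjust its own limits at run time with the getrlimit()/setrlimit() calls quoted from the man page earlier in this thread; in Python that looks roughly like the following (the target of 4096 is an arbitrary illustration):

    #!/usr/bin/env python
    # Query the per-process file descriptor limits and raise the soft limit
    # as far as the hard limit allows (an unprivileged process can go no higher).
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("RLIMIT_NOFILE: soft=%d hard=%d" % (soft, hard))

    wanted = 4096                    # arbitrary target, for illustration
    if hard == resource.RLIM_INFINITY:
        new_soft = wanted
    else:
        new_soft = min(wanted, hard)  # cannot exceed the hard limit unprivileged
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    print("soft limit is now %d" % resource.getrlimit(resource.RLIMIT_NOFILE)[0])

Note the caveat that follows about select(): descriptors numbered above FD_SETSIZE cannot go into an FD_SET no matter what the limit says.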
You can adjust either limit by adding the appropriate lines in /etc/security/limits.conf (at least, that seems to be the file on both Red Hat and Debian). In theory, you could set the limit up to file-max, but that probably isn't a good idea. You really don't want to run your system out of file descriptors. There is one other concern you might want to think about. If you ever use any of the created file descriptors in a call to select(), you have to ensure all the select()ed file descriptors fit in an FD_SET. On Linux, the size of an FD_SET is hard-coded at 1024 (on most of the BSDs, Solaris, and Mac OS X, it can be altered at application compile time). So you may not want to ever set the soft limit above 1024. Some applications may expect that any file descriptor that was successfully created can be put into an FD_SET. If this isn't the case, well, life could get interesting. Hope this helps, Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Thu Oct 2 11:45:40 2003 From: csmith at lnxi.com (Curtis Smith) Date: Thu, 2 Oct 2003 09:45:40 -0600 Subject: lm_sensors output References: Message-ID: <006001c388fc$3b67cb60$a423a8c0@blueberry> VCore is the voltage of the CPU #1. You can get the full definition of all values at http://www2.lm-sensors.nu/~lm78/. Curtis Smith Principle Software Engineer Linux Networx Inc. (www.lnxi.com) ----- Original Message ----- From: "Kevin Van Workum" To: Sent: Thursday, October 02, 2003 6:37 AM Subject: lm_sensors output > The recent discussion on environment sensors motivated me to take the > subject more seriously. I therefore installed lm_senors on one of my nodes > for testing. I simply used the lm_sensors RPM from RH8.0, ran > sensors-detect and did what it told me to do. It apparently worked. The > problem is, I don't really know what the output means or what I should be > looking for. I guess I'm a novice. Anyways, the output from sensors is > shown below. > > What is VCore and why is mine out of range? > What are all the other voltages describing? > V5SB is out of range also, is that a bad thing? > I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? > > $ sensors > w83697hf-isa-0290 > Adapter: ISA adapter > Algorithm: ISA algorithm > VCore: +1.50 V (min = +0.00 V, max = +0.00 V) > +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) > +5V: +5.02 V (min = +4.50 V, max = +5.48 V) > +12V: +12.20 V (min = +10.79 V, max = +13.11 V) > -12V: -12.85 V (min = -13.21 V, max = -10.90 V) > -5V: -5.42 V (min = -5.51 V, max = -4.51 V) > V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) > VBat: +3.29 V (min = +2.70 V, max = +3.29 V) > fan1: 4687 RPM (min = 187 RPM, div = 32) > fan2: 0 RPM (min = 187 RPM, div = 32) > temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor > temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor > alarms: > beep_enable: > Sound alarm disabled > > Kevin Van Workum, Ph.D. 
> www.tsunamictechnologies.com > ONLINE COMPUTER CLUSTERS > > __/__ __/__ * > / / / > / / / > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 2 17:25:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 2 Oct 2003 17:25:20 -0400 (EDT) Subject: Power Supply: Supermicro P4DL6 Board? In-Reply-To: Message-ID: > > disks are much, much cooler than they used to be, probably dropping > > below power consumed by ram on most clusters. > > Note that most performance-oriented RAM types now have metal cases and > heat sinks. They didn't add the metal because it _looks_ cool. I'm not so sure. I looked at the spec for a current samsung pc333 ddr 512Mb chip, and it works out to about 16W per GB. I think most people still have 512MB dimms, and probably pc266 (13.6W/GB). I don't really see why a dimm would have trouble dissipating ~20W, considering its size. I suspect dimm heatsinks are actually a fashion statement inspired by the heat-spreaders found on some rambus rimms (which were *spreaders*, a consequence of how rambus does power management...) personally, I'm waiting till I can invest in peltier-cooled dimms ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Oct 2 19:08:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 3 Oct 2003 09:08:39 +1000 Subject: more on structural models for clusters In-Reply-To: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <200310030908.41322.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 2 Oct 2003 08:36 am, Jim Lux wrote: > Imagine you had something like a geodesic dome with a microprocessor at > each vertex that wanted to compute the loads for that vertex, communicating > only with the adjacent vertices... The nearest I can remember to something like that (which sounds like an excellent idea) was for a fault tolerant model built around processors connected in a grid where each monitored the neighbours and if one was seen to go bad it could be sent a kill signal and the grid would logically reform without that processor. I think I read it in New Scientist between 1-4 years ago, but this abstract from the IEEE Transactions on Computers sounds similar (you've got to pay for the full article apparently): http://csdl.computer.org/comp/trans/tc/1988/11/t1414abs.htm A Multiple Fault-Tolerant Processor Network Architecture for Pipeline Computing Good luck! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/fK/3O2KABBYQAh8RArPyAKCCoaQXbywrq9h+3geGOVCE97dhgQCeKzV0 B94q2Yd0yPYFwDbcVINl/4w= =rbMB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Oct 2 20:39:33 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 02 Oct 2003 17:39:33 -0700 Subject: more on structural models for clusters In-Reply-To: <20031003002932.GA5984@sphere.math.ucdavis.edu> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031002173001.0310ce38@mailhost4.jpl.nasa.gov> At 05:29 PM 10/2/2003 -0700, Bill Broadley wrote: >On Wed, Oct 01, 2003 at 03:36:35PM -0700, Jim Lux wrote: > > In regards to my recent post looking for cluster implementations for > > structural dynamic models, I would like to add that I'm interested in > > "highly distributed" solutions where the computational load for each > > processor is very, very low, as opposed to fairly conventional (and widely > > available) schemes for replacing the Cray with a N-node cluster. > > > > The number of processors would be comparable to the number of structural > > nodes (to a first order of magnitude) > >Er, why bother? Is there some reason to distribute those things so >thinly? Your average dell can do 1-4 Billion floating point ops/sec, >why bother with so few per CPU? Am I missing something? Your average Dell isn't suited to inclusion as a MCU core in an ASIC at each node and would cost more than $10/node... I'm looking at Z80/6502/low end DSP kinds of computational capability in a mesh containing, say, 100,000 nodes. Sure, we'd do algorithm development on a bigger machine, but in the end game, you're looking at zillions of fairly stupid nodes. The commodity cluster aspect would only be in the development stages, and because it's much more likely that someone has solved the problem for a Beowulf (which is fairly loosely coupled and coarse grained) than for a big multiprocessor with tight coupling like a Cray. Haven't fully defined the required performance yet, but, as a starting point, I'd need to "solve the system" in something like 100 microseconds. The key is that I need an algorithm for which the workload scales roughly linearly as a function of the number of nodes, because the computational power available also scales as the number of loads. Clearly, I'm not going to do a brute force inversion or LU decomposition of a 100,000x100,000 matrix... However, inverting 100,000 matrices, each, say, 10x10, is reasonable. >Bill Broadley >Mathematics >UC Davis James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From he.94 at osu.edu Thu Oct 2 20:59:55 2003 From: he.94 at osu.edu (Hao He) Date: Thu, 02 Oct 2003 20:59:55 -0400 Subject: NFS Problem Message-ID: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Hi, there. I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P chipsets and Intel CSA Gigabit NIC. The distribution is RedHat 9. I have some experience before but I still got some problem in NFS. Problem 1: When I just use 'rw' and 'intr' as the parameters used in /etc/fstab, I got following problem when startup clients (while the server with NFS daemon is running): Mount: RPC: Remote system error -- no route to host Then I added 'bg' to /etc/fstab, this time the result is better. Several minutes after the client booted up, the remote directory mounted. However, in many cases following meassage was prompted: nfs warning: mount version older than kernel Problem 2: I am mounting two remote directories from the server, however, at some nodes, only one directory even no directory got mounted. If only one directory mounted successfully, it differs from one client to another, and to the same node, it changes from time to time at system booting up, like dicing. This really confused me. Problem 3: Sometimes I got the message at the server node like this: (scsi 0:A:0:0): Locking max tag count at 33. However, seems it does not make trouble to mounted directories. I think it must be related with NFS. I have a further question: Since there may be 16 or 32 or even more clients try to mount the remote directory at the same time, can the NFS server really handle so much requests simultaneously? Is there any effective alternate method to share data, besides NFS? How to solve these problems? Any suggestion? Thank you very much. I will appreciate your response. Best wishes, Hao He _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 3 01:13:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 3 Oct 2003 07:13:37 +0200 Subject: NFS Problem In-Reply-To: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> References: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Message-ID: <20031003051337.GA6263@unthought.net> On Thu, Oct 02, 2003 at 08:59:55PM -0400, Hao He wrote: > Hi, there. > > I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P > chipsets and Intel CSA Gigabit NIC. > The distribution is RedHat 9. > I have some experience before but I still got some problem in NFS. > > Problem 1: When I just use 'rw' and 'intr' as the parameters used in > /etc/fstab, I got following problem when startup clients (while the server > with NFS daemon is running): > Mount: RPC: Remote system error -- no route to host That's a network problem or a network configuration problem. Usually this would be a name resolution problem. Check that the hostname in your fstab can be resolved in early boot (add it to your hosts file if necessary), or use the IP address of the server instead. 
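In practice that means something like the following on every client; the address, hostname and export path are made-up placeholders, and the hard,intr and rsize/wsize options are the ones recommended further down in this reply:

    # /etc/hosts on each client -- make the server resolvable before NFS starts
    192.168.1.1     master

    # /etc/fstab on each client -- or simply use the IP address in place of the name
    master:/home    /home    nfs    rw,hard,intr,rsize=8192,wsize=8192    0 0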
But the error message seems to indicate that it's not resolution but routing - very odd... Is the network up? Do you have any special networking setup? Try checking your init-scripts to see that the network is really started before the NFS filesystems are mounted. > Then I added 'bg' to /etc/fstab, this time the result is better. Several > minutes after the client booted up, the remote directory mounted. So you NFS mount depends on something (network related) that isn't up at the time when the system tries to mount your NFS filesystems. Either you have a special (and wrong) setup, or RedHat messed up good :) Check the order in which things are started in your /etc/rc3.d/ directory. Network should go before NFS. > However, in many cases following meassage was prompted: > nfs warning: mount version older than kernel Most likely this is not really a problem - I've had systems with that message work just fine. You could check to see if RedHat has updates to mount. > > Problem 2: I am mounting two remote directories from the server, however, at > some nodes, only one directory even no directory got mounted. > If only one directory mounted successfully, it differs from one client to > another, and to the same node, it changes from time to time at system > booting up, like dicing. > This really confused me. Isn't this problem 1 over again? > > Problem 3: Sometimes I got the message at the server node like this: > (scsi 0:A:0:0): Locking max tag count at 33. That's a SCSI diagnostic. You can ignore it. > However, seems it does not make trouble to mounted directories. > I think it must be related with NFS. It's not related to NFS. > > I have a further question: Since there may be 16 or 32 or even more clients > try to mount the remote directory at the same time, > can the NFS server really handle so much requests simultaneously? Is there > any effective alternate method to share data, besides NFS? That should be no problem at all. NFS should be up to the task with no special tuning at all. Once you have all your nodes mounting NFS properly, you can start looking into tuning for performance - but it really should work 'out of the box' with no special tweaking. > > How to solve these problems? Any suggestion? > Thank you very much. I will appreciate your response. Use the following options to the NFS mounts in your fstab: hard,intr You can add rsize=8192,wsize=8192 for tuning. You should not need 'bg' - although it may be convenient if you need to be able to boot your nodes when the NFS server is down. One thing you should make sure: never use host-names or netgroups in your exports file on the server (!) *Only* use IP addresses or wildcards - *Never* use names. Using names in your 'exports' file on the server can cause *all* kinds of weird sporadic irreproducible problems - it's a long-standing and extremely annoying problem, but fortunately one that has an easy workaround. Check: *) Server: Your exports file (only IP or wildcard exports) *) Clients: Your fstab (use server IP or name in hosts file) *) Clients: Is network started before NFS mount? Please write to the list about your progress :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Andrew.Cannon at nnc.co.uk Fri Oct 3 05:30:07 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Fri, 3 Oct 2003 10:30:07 +0100 Subject: Filesystem question (sort of newbie) Message-ID: Hi All, I am going to be setting up a 16 node cluster in the near future. I have only set up a 4 node cluster before and I am a little unsure about how to sort out the disk space. Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, any advice is still appreciated), and I was wondering how to best organise the disks on each node. I am thinking (only started wondering about this today) of installing the cluster software on the master node (pvm, MPI and the actual calculation software, MCNP) and mounting the disk on each of the other nodes, so that all they have on their hard drives is the minimal install of RH. The question I am asking is, will this work and what sort of performance hit will there be? Would I be better installing the software on each computer? TIA (sorry for being so stoopid, I'm still very much a learner at linux and clustering) Andy Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 05:32:34 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 05:32:34 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D4232.3070900@lmco.com> Andrew, Let me recommend Warewulf (warewulf-cluster.org). It boots the nodes using RH 7.3 (although it should work with 8 or but I haven't tested it), but it boots into a small Ram Disk (about 70 megs depending upon what you need on the nodes). It's very easy to setup, configure and use, plus you don't need to install RH on each node. Warewulf will use a hard disk in the nodes if available for swap and local scratch space. However, it will also work with diskless nodes (although you don't get swap or scratch space). Warewulf will also take /home from the master node and NFS mount it throughout the cluster. So you can install your code on /home for all of the nodes. Good Luck! Jeff > Hi All, > > I am going to be setting up a 16 node cluster in the near future. 
I have > only set up a 4 node cluster before and I am a little unsure about how to > sort out the disk space. > > Each computer will be running Red Hat (either 8 or 9 I haven't decided > yet, > any advice is still appreciated), and I was wondering how to best > organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each > computer? > > TIA (sorry for being so stoopid, I'm still very much a learner at > linux and > clustering) > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC > Limited (no. 1120437), National Nuclear Corporation Limited (no. > 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited > (no. 235856). The registered office of each company is at Booths > Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for > Technica-NNC Limited whose registered office is at 6 Union Row, > Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by > the relevant UK operating company and are confidential and intended > for the use of the individual or entity to whom they are addressed. > If you have received this e-mail in error please notify the NNC system > manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 08:59:52 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 08:59:52 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D72C8.3050409@lmco.com> Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > While I don't prefer the nfs-root approach, Warewulf can do that as well (haven't tried it personally). What kind of network do you use for the 48-node cluster? 
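A rough sketch of the two pieces of configuration behind the PXE/nfs-root approach described above -- every address, MAC, path and kernel version below is a placeholder, and the node kernel must have NFS-root and DHCP IP autoconfiguration compiled in:

    # /etc/dhcpd.conf on the master (ISC dhcpd): tell the node where to fetch pxelinux
    host node01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address    192.168.0.101;
        next-server      192.168.0.1;       # TFTP server holding pxelinux.0
        filename         "pxelinux.0";
    }

    # /tftpboot/pxelinux.cfg/default: boot a kernel whose root lives on the NFS server
    default linux
    label linux
        kernel vmlinuz-2.4.20
        append ip=dhcp root=/dev/nfs nfsroot=192.168.0.1:/export/nodes

The node's own fstab (inside that NFS root) can then point swap and /tmp at the local disk, which is the split described above.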
Anybody else use the nfs-root approach? The 70 megs used in the ram disk is pretty well thought out. There are some basic things to boot the node, but it also includes glibc and you can easily add MPICH, LAM, Ganglia, SGE, etc. The developer has thought out these packages very well so that only the pieces of each of these packages that needs to be on the nodes actually gets installed on the nodes. Very well thought out. Oh, one other thing. The image that goes to the nodes via TFTP (over PXE) is compressed so it's about half the size of the final ram disk. This really helps cut down on network traffic (even works over my poor rtl8139 network). One of the things I'd like to experiment with is using squasfs to reduce the size of the ram disk. IMHO, 70 megs is not very big, but reducing it to 30-40 Megs might be worth the effort. > regards, mark hahn. > Thanks! Jeff -- Dr. Jeff Layton Senior Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Oct 3 09:34:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 3 Oct 2003 09:34:30 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: <3F7D4232.3070900@lmco.com> Message-ID: > 8 or but I haven't tested it), but it boots into a small Ram > Disk (about 70 megs depending upon what you need on the alternately, it's almost trivial to PXE boot nodes, mount a simple root FS from a server/master, and use the local disk, if any, for swap and/or tmp. one nice thing about this is that you can do it with any distribution you like - mine's RH8, for instance. personally, I prefer the nfs-root approach, probably because once you boot, you won't be wasting any ram with boot-only files. for a cluster of 48 nodes, there seems to be no drawback; for a much larger cluster, I expect all the boot-time traffic would be crippling, and you might want to use some kind of multicast to distribute a ramdisk image just once... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 3 11:24:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 3 Oct 2003 11:24:48 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Cannon, Andrew wrote: > Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, > any advice is still appreciated), and I was wondering how to best organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each computer? 
> > TIA (sorry for being so stoopid, I'm still very much a learner at linux and > clustering) If the nodes have lots of memory, most of their access to non-data disk (programs and libraries) will come out of caches after the systems have been up for a while, so they won't take a HUGE performance hit, but things like loading a big program for the first time may take longer. However, if you work to master PXE and kickstart (which go together like ham and eggs) and have adequate disk, in the long run your maintenance will be minimized by putting energy into developing a node kickstart script. Then you just boot the nodes into kickstart over the network, wait a few minutes for the install and boot into production. This will take you some time to learn (there are HOWTO-like resource online, so it isn't a LOT of time) and if you got nodes with NICs that don't support PXE you'll likely want to replace them or add ones that do, but once you invest these capital costs the payback is that your marginal cost for installing additional nodes after the first node you get to install "perfectly" is so close to zero as to make no nevermind. Make a dhcp table entry. Boot node into install. Boot node. Reinstalling is exactly the same process and can be done in minutes if a hard disk crashes. It gets to be so easy that we almost routinely do a reinstall after working on a system for any reason, including ones where it probably isn't necessary. You can reinstall a system from anywhere on the internet (if your hardware is accessible and preconfigured for this to work). Finally, if you include yum on the nodes, you can automagically update the nodes from a master repository image on your server, and mirror your server image from one of the Red hat mirrors, and actually maintain a stream of updates onto the nodes with no further action on your part. At this point, if you aren't doing Scyld or one of the preconfigured cluster packages and want to roll your own cluster out of a base install plus selected RPMs (and why not?) PXE+kickstart/RH+yum forms a pretty solid low-energy paradigm for installation and maintenance once you've learned how to make it work. rgb > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Sat Oct 4 12:00:51 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Sat, 4 Oct 2003 12:00:51 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: Message-ID: Hi, All: I tried to compile a Fortran 90 MPI program by the Intel Frotran Compiler in the OSCAR cluster. I run the command: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF external function WINDOWFUNCTION external function SIGMA external function GETISTART external function GETIEND external subroutine COM_EYZ external subroutine COM_EYX external subroutine COM_EZX external subroutine COM_EZY external subroutine COM_HYZ external subroutine COM_HYX external subroutine COM_HZX external subroutine COM_HZY 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' /tmp/ifcVao851.o(.text+0x6e): In function `main': : undefined reference to `mpi_comm_rank_' /tmp/ifcVao851.o(.text+0x82): In function `main': : undefined reference to `mpi_comm_size_' /tmp/ifcVao851.o(.text+0xab): In function `main': : undefined reference to `mpi_wtime_' /tmp/ifcVao851.o(.text+0x422): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x448): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x47b): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x49e): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4c1): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' follow /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': : undefined reference to `mpi_recv_' /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': : undefined reference to `mpi_send_' " At the same time, I tried the same program in the other scyld cluster, using NAG compiler. I use command: " f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 " It works fine. So that means my fortran program in fine. Both of the cluster use the MPICH implementation. But because I have to work on that OSCAR cluster with Intel compiler, I wonder 1. why the errors happen? 2. Is the problem of cluster or the Intel compiler? 3. How I can solve it. I know there are a lot of guy with experience and experts of cluster and MPI in this mailing list. I appreciate your suggestion and advice from you. Thanks. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sun Oct 5 00:52:45 2003 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 4 Oct 2003 23:52:45 -0500 (CDT) Subject: Upper bound on no. of sockets In-Reply-To: Message-ID: Thanks a billion for all the responses. Here is another question: Is there a way to send some data to the listener when I do a connect()? I tried using sin_zero field of the sockaddr_in structure, but quite unsuccessfully. 
The problem is I want to uniquely identify the actively connecting process (IP address and port number information wont suffice). I can send() the identifier value to the listener after the connect(), but I want to cut down the cost of an additional send. Any suggestions are greatly appreciated. Thanks, Balaji. PS: I am not sure if it is appropriate to send this question to this mailing list. My sincere apologies for those who find this question annoyingly incongruous. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sat Oct 4 15:15:08 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sat, 4 Oct 2003 12:15:08 -0700 (PDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: <20031004191508.27391.qmail@web60306.mail.yahoo.com> This is by far my favorite approach however I tend to tweak it with a very large initrd and custom kernel. I am using older hardware with its max ram so I use it as best I can. with no local harddisk I am always looking at the best method of network file access and have gone so far as to try wget with http. --- Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 04:21:50 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 10:21:50 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Sat, 4 Oct 2003, Ao Jiang wrote: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 Try with : ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 Btw, a cleaner way to compile mpi programs is to use the mpif90 (mpif77 for fortran77) command (which is a wrapper for the real compiler). You should be able to make it use the ifc by setting the MPICH_F90 (MPICH_F77 for fortran77) and MPICH_F90LINKER environment variables to choose which compiler to use, e.g. 
let's say you want to use the ifc compiler, and you're using bash, you would have to do: export MPICH_F90=ifc export MPICH_F90LINKER=ifc and then, in order to compile your mpi program you should issue the command: mpif90 -o p_wg3 p_fdtd3dwg3_pml.f90 > 2. Is the problem of cluster or the Intel compiler? Neither. Intel works fine with Oscar. Have a good day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +390250317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 07:22:35 2003 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Mon, 06 Oct 2003 12:22:35 +0100 Subject: Copy files between Nodes via NFS In-Reply-To: References: <003f01c37d29$47d7ec10$0e01010a@hpcncd.cpe.ku.ac.th> Message-ID: <5.0.2.1.0.20031006121907.03a25120@hermes.cam.ac.uk> Morning I have basic node PC that NFS mount directories from the master node. When I try to copy files using 'cp' from the node to NFS mounted directory the node PC just hang. Have any comes across this problem? How best move/copy files across nodes? Regards Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 6 07:21:02 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Mon, 06 Oct 2003 13:21:02 +0200 Subject: Intel compilers and libraries Message-ID: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Hello: We are thinking about purchasing the Intel C++ compiler for linux, mainly for getting the most of our harware (Xeon 2.4Gz processors), we are also interested in the Intel MKL (Math Kernel Library), I would like to know if the performance gain using Intel compiler+libraries, which exploit SSE2 and make other optimizations for P4/Xeon, are as good as Intel claims, anyone in the list using those products? On the other hand, isn't MKL just as good as any other good math library compiled with Xeon/P4 optimization and extensions (using Intel C++ compiler for example). Another question, the only difference I can see reading Intel docs between P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), does it really makes a big difference taking into account the much more expensive Xeons are. Any one having experience with both platforms. Greetings: Jose M. P?rez. Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 10:05:07 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 10:05:07 -0400 Subject: undefined references to pthread related calls Message-ID: <3F817693.9040001@larc.nasa.gov> Hi group, I have a user of my software (a f90 based CFD code using mpich) that is haveing trouble installing my code on their system. They are using mpich and the Intel version 7.1 ifc compiler. The problem occurs at the link step. 
They are getting undefined references to what appear to be system calls to pthread related functions such as pthread_self, pthread_equal, pthread_mutex_lock. Does any one else encountered and know how to fix this problem? Thanks, Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Mon Oct 6 03:23:32 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Mon, 6 Oct 2003 08:23:32 +0100 Subject: Help: About Intel Fortran Compiler calling mpich In-Reply-To: References: Message-ID: <200310060823.32239.daniel.kidger@quadrics.com> Tom, this is the standard old chestnut about Fortran and trailing underscores on function names. if you do say ' nm -a /opt/mpich-1.2.5/lib/libmpi.a |grep -i mpi_comm_rank' I expect you will see 2 trailing underscores. Different Fortran vendors add a different number of underscores - some add 2 by default (eg g77), some one (eg ifc), and some none. Sometimes there is a a compiler option to change this. There are three solutions to this issue: 1/ (Lazy option) recompile mpich several times; once with each Fortran compiler you have. 2/ Compile your application with the option that matches your prebuilt mpich (presumably 2 underscores - but note that ifc doesn't have an option for this) 3/ rebuild mpich with '-fno-second-underscore' (using say g77) . This is the common ground. You can link code to this with all current Fortran compilers. You may also meet the 'mpi_getarg, x_argc' issue - this too is easy to fix. -- Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- On Saturday 04 October 2003 5:00 pm, Ao Jiang wrote: > Hi, All: > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. 
> I run the command: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > external function WINDOWFUNCTION > external function SIGMA > external function GETISTART > external function GETIEND > external subroutine COM_EYZ > external subroutine COM_EYX > external subroutine COM_EZX > external subroutine COM_EZY > external subroutine COM_HYZ > external subroutine COM_HYX > external subroutine COM_HZX > external subroutine COM_HZY > > 3228 Lines Compiled > > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' > > /tmp/ifcVao851.o(.text+0x6e): In function `main': > : undefined reference to `mpi_comm_rank_' > > /tmp/ifcVao851.o(.text+0x82): In function `main': > : undefined reference to `mpi_comm_size_' > > /tmp/ifcVao851.o(.text+0xab): In function `main': > : undefined reference to `mpi_wtime_' > > /tmp/ifcVao851.o(.text+0x422): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x448): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x47b): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x49e): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x4c1): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' > follow > > /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': > : undefined reference to `mpi_recv_' > > /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': > : undefined reference to `mpi_send_' > > " > > At the same time, I tried the same program in the other scyld cluster, > using NAG compiler. > > I use command: > " > f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 > " > > It works fine. So that means my fortran program in fine. > > Both of the cluster use the MPICH implementation. > > But because I have to work on that OSCAR cluster with Intel compiler, > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? > 3. How I can solve it. > > I know there are a lot of guy with experience and experts of cluster and > MPI in this mailing list. I appreciate your suggestion and advice from > you. > > Thanks. > > Tom > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Mon Oct 6 10:54:43 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Mon, 6 Oct 2003 14:54:43 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: Jose, Pardon me for some advertising here, but our OptimaNumerics Linear Algebra Library can very significantly outperform Intel MKL. Depending on the particular routine and platform, we have seen performance advantage of almost 32x (yes, that's 32 times!) using OptimaNumerics Linear Algebra Library! 
I can send you one of our white papers which shows performance benchmark details off-line. If anyone else is interested, please do send me an e-mail also. Best wishes, Kenneth Tan ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- On Mon, 6 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Date: Mon, 06 Oct 2003 13:21:02 +0200 > From: "[iso-8859-1] Jos? M. P?rez S?nchez" > To: beowulf at beowulf.org > Subject: Intel compilers and libraries > > Hello: > > We are thinking about purchasing the Intel C++ compiler for linux, > mainly for getting the most of our harware (Xeon 2.4Gz processors), we > are also interested in the Intel MKL (Math Kernel Library), I would like > to know if the performance gain using Intel compiler+libraries, which exploit > SSE2 and make other optimizations for P4/Xeon, are as good as Intel > claims, anyone in the list using those products? > > On the other hand, isn't MKL just as good as any other good math library compiled > with Xeon/P4 optimization and extensions (using Intel C++ compiler for > example). > > Another question, the only difference I can see reading Intel docs between > P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), > does it really makes a big difference taking into account the much more > expensive Xeons are. Any one having experience with both platforms. > > Greetings: > > Jose M. P?rez. > Madrid. Spain. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sun Oct 5 21:59:34 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Mon, 06 Oct 2003 09:59:34 +0800 Subject: Cluster2003: Call for Participation (Preliminary) Message-ID: <3F80CC86.FAFBFAD2@csis.hku.hk> ---------------------------------------------------------------------- CALL FOR PARTICIPATION 2003 IEEE International Conference on Cluster Computing 1 - 4 December 2003 Sheraton Hong Kong Hotel & Towers, Tsim Sha Tsui, Kowloon, Hong Kong URL: http://www.csis.hku.hk/cluster2003/ Cosponsored by IEEE Computer Society IEEE Computer Society Task Force on Cluster Computing IEEE Hong Kong Section Computer Chapter The University of Hong Kong Industrial Sponsors : Hewlett-Packard, Microsoft, IBM, Extreme Networks, Sun Microsystems, Intel, Dawning, and Dell. 
----------------------------------------------------------------------- Dear Friends, You are cordially invited to participate the annual international cluster computing conference to be held on Dec. 1-4, 2003 in Hong Kong, the most dynamic city in the Orient. The Cluster series of conferences is one of the flagship events sponsored by the IEEE Task Force on Cluster Computing (TFCC) since its inception in 1999. The competition among refereed papers was particularly strong this year, with 48 papers being selected as full papers from the 164 papers that were submitted, for a 29% acceptance rate. An additional 19 papers were selected for poster presentation. Besides the technical paper presentation, there will be three keynotes, four tutorials, one panel, a Grid live demo session, and a number of invited talks and exhibits to be arranged during the conference period. A preliminary program schedule is attached below. Please share this Call for Participation information with your colleagues working in the area of cluster computing. For registration, please visit our registration web page at: http://www.csis.hku.hk/cluster2003/registration.htm (The deadline for advance registration is October 22, 2003.) TCPP Awards will be granted to students members, and will partially cover the registration and travel cost to attend the conference. See : http://www.caip.rutgers.edu/~parashar/TCPP/TCPP-Awards.htm We look forward meeting you in Hong Kong! Cho-Li Wang and Daniel Katz Cluster2003, Program Co-chairs ------------------------------------------------------------------ ***************************************** Cluster 2003 Preliminary Program Schedule ***************************************** Monday, December 1 ------------------ 8:00-5:00 - Conference/Tutorial Registration 8:30-12:00: Morning Tutorials Designing Next Generation Clusters with Infiniband: Opportunities and Challenges D. Panda (Ohio State University) Using MPI-2: Advanced Features of the Message Passing Interface W. Gropp, E. Lusk, R. Ross, R. Thakur (Argonne National Lab.) 12:00-1:30 - Lunch 1:30-5:00 : Afternoon Tutorials The Gridbus Toolkit for Grid and Utility Computing R. Buyya (University of Melbourne) Building and Managing Clusters with NPACI Rocks G. Bruno, M. Katz, P. Papadopoulos, F. Sacerdoti, NPACI Rocks group at San Diego Supercomputer Center), L. Liew, N. Ninaba (Singapore Computing Systems) ************************ Tuesday, December 2 ************************ 7:00-5:00 Conference Registration 9:00-9:15 Welcome and Opening Remarks 9:15-10:15 Keynote 1 (TBA) 10:45-12:15 : Session 1A, 1B, 1C Session 1A (Room A) : Scheduling I Dynamic Scheduling of Parallel Real-time Jobs by Modelling Spare Capabilities in Heterogeneous Clusters Ligang He, Stephen A. Jarvis, Graham R. Nudd, Daniel P. Spooner (University of Warwick, UK) Parallel Job Scheduling on Multi-Cluster Computing Systems Jemal Abawajy and S. P. Dandamudi (Carleton University, Canada) Interstitial Computing: Utilizing Spare Cycles on Supercomputers Stephen Kleban and Scott Clearwater (Sandia National Laboratories, USA) Session 1B (Room B) : Applications A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, Guang R. 
Gao (University of Delaware, USA) Computing Large-scale Alignments on a Multi-cluster Chunxi Chen and Bertil Schmidt (Nanyang Technological University, Singapore) Auto-CFD: Efficiently Parallelizing CFD Applications on Clusters Li Xiao (Michigan State University, USA), Xiaodong Zhang (College of WIlliam and Mary, NSF, USA), Zhengqian Kuang, Baiming Feng, Jichang Kang (Northwestern Polytechnic University, China) Session 1C (Room C) : Performance Analysis Performance Analysis of a Large-Scale Cosmology Application on Three Cluster Systems Zhiling Lan and Prathibha Deshikachar (Illinois Institute of Technology, USA) A Performance Monitor based on Virtual Global Time for Clusters of PCs Michela Taufer (UC San Diego, USA), Thomas M. Stricker (ETH Zurich, Switzerland) A Distributed Performance Analysis Architecture for Clusters Holger Brunst, Wolfgang E. Nagel (Dresden University of Technology, Germany), Allen D. Malony (University of Oregon, USA) 12:15-2:00 Lunch 2:00-3:30 : Session 2A, 2B, 2C Session 2A (Room A) : Scheduling II Coordinated Co-scheduling in time-sharing Clusters through a Generic Framework Saurabh Agarwal (IBM India Research Labs, India), Gyu Sang Choi, Chita R. Das (Pennsylvania State University, USA), Andy B. Yoo (Lawrence Livermore National Laboratory, USA), Shailabh Nagar (IBM T.J. Watson Research Center, USA) A Robust Scheduling Strategy for Moldable Jobs Sudha Srinivasan, Savitha Krishnamoorthy, P. Sadayappan (Ohio State University, USA) Towards Load Balancing Support for I/O-Intensive Parallel Jobs in a Cluster of Workstations Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson (University of Nebraska-Lincoln, USA) Session 2B (Room B) : Java JavaSplit: A Runtime for Execution of Monolithic Java Programs on Heterogeneous Collections of Commodity Workstations Michael Factor (IBM Research Lab in Haifa, Israel), Assaf Schuster, Konstantin Shagin (Israel Institute of Technology, Israel) Performance Analysis of Java Message-Passing Libraries on Fast Ethernet, Myrinet and SCI Clusters Guillermo L. Taboada, Juan Touri?o, Ramon Doallo (University of A Coruna, Spain) Compiler Optimized Remote Method Invocation Ronald Veldema and Michael Philippsen (University of Erlangen-Nuremberg, Germany) Session 2C (Room C) : Communication I Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters Jarek Nieplocha , V. Tipparaju, M. Krishnan (Pacific Northwest National Laboratory, USA), G. Santhanaraman, D.K. 
Panda (Ohio State University, USA) Impact of Computational Resource Reservation to the Communication Performance in the Hypercluster Environment Kai Wing Tse and P.K Lun (The Hong Kong Polytechnic University, Hong Kong) Kernel Implementations of Locality-Aware Dispatching Techniques for Web Server Clusters Michele Di Santo, Nadia Ranaldo, Eugenio Zimeo (University of Sannio, Italy) 3:30-4:00 Coffee Break 4:00-4:30 Invited Talk 1 (Room C) : TBA 4:30-5:00 Invited Talk 2 (Room C) : TBA 5:30-7:30 Poster Session (Details Attached at the End) 6:00-7:30 Reception ****************************** Wednesday, December 3 ******************************* 8:30-5:00 Conference Registration 9:00-10:00 Keynote 2 (Room C) TBA 10:00-10:30 Coffee Break 10:30-12:00 Session 3A, 3B, 3C Session 3A (Room A): Middleware OptimalGrid: Middleware for Automatic Deployment of Distributed FEM Problems on an Internet-Based Computing Grid Tobin Lehman and James Kaufman (IBM Almaden Research Center, USA) Adaptive Grid Resource Brokering Abdulla Othman, Peter Dew, Karim Djemame, Iain Gourlay (University of Leeds, UK) HPCM: A Pre-compiler Aided Middleware for the Mobility of Legacy Code Cong Du, Xian-He Sun, Kasidit Chanchio (Illinois Institute of Technology, USA) Session 3B (Room B) : Cluster/Job Management I The Process Management Component of a Scalable Systems Software Environment Ralph Butler (Middle Tennessee State University, USA), Narayan Desai, Andrew Lusk, Ewing Lusk (Argonne National Laboratory,USA) Load Distribution for Heterogeneous and Non-Dedicated Clusters Based on Dynamic Monitoring and Differentiated Services Liria Sato (University of Sao Paulo, Brazil), Hermes Senger(Catholic University of Santos, Brazil) GridRM: An Extensible Resource Monitoring System Mark Baker and Garry Smith (University of Portsmouth, UK) Session 3C (Room C) : I/O I A High Performance Redundancy Scheme for Cluster File Systems Manoj Pillai and Mario Lauria (Ohio State University, USA) VegaFS: A Prototype for File-sharing Crossing Multiple Administrative Domains Wei Li, Jianmin Liang, Zhiwei Xu (Chinese Academy of Sciences, China) Design and Performance of the Dawning Cluster File System Jin Xiong, Sining Wu, Dan Men, Ninghui Sun, Guojie Li (Chinese Academy of Sciences, China) 12:00-1:30 Lunch 1:30-3:00 Session 4A, 4B, Vender Talk 1 Session 4A (Room A) Novel Systems Coordinated Checkpoint versus Message Log for Fault Tolerant MPI Aur?lien Bouteiller, Lemarinier, Krawezik, Cappello (Universit? de Paris Sud, France) A Performance Comparison of Linux and a Lightweight Kernel Ron Brightwell, Rolf Riesen, Keith Underwood (Sandia National Laboratories, USA), Trammell B. Hudson (Operating Systems Research, Inc.), Patrick Bridges, Arthur B. Maccabe (University of New Mexico, USA) Implications of a PIM Architectural Model for MPI Arun Rodrigues, Richard Murphy, Peter Kogge, Jay Brockman (University of Notre Dame, USA), Ron Brightwell, Keith Underwood (Sandia National Laboratories, USA) Session 4B (Room B) Cluster/Job Management II Reusable Mobile Agents for Cluster Computing Ichiro Satoh (National Institute of Informatics, Japan) High Service Reliability For Cluster Server Systems M. Mat Deris, M.Rabiei, A. Noraziah, H.M. Suzuri (University College of Science and Technology, Malaysia) Wide Area Cluster Monitoring with Ganglia Federico D. Sacerdoti, Mason J. Katz (San Diego Supercomputing Center, USA), Matthew L. Massie, David E. 
Culler (UC Berkeley, USA) Vender Talk 1 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 Panel Discussion 6:30-8:30 Banquet Dinner (Ballroom, Conference Hotel) **************************** Thursday, December 4 **************************** 8:30-5:00 Conference Registration Special Technical Session : Dec. 4 (9am - 4:30pm) Grid Demo - Life Demonstrations of Grid Technologies and Applications Session Chairs: Peter Kacsuk (MTA SZTAKI Research Institute, Hungary), Rajkumar Buyya (University of Melbourne, Australia) 9:00-10:00 Keynote 3 (Room C) 10:00-10:30 Coffee Break 10:30-12:00 Vender Talk 2, 5B, 5C Vender Talk 2 (Room A) Session 5B (Room B) : Novel Software Efficient Parallel Out-of-core Matrix Transposition Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P Sadayappan (Ohio State University, USA) A Case Study of Parallel I/O for Biological Sequence Search on Linux Clusters Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson (University of Nebraska-Lincoln, USA) CTFS: A New Light-weight, Cooperative Temporary File System for Cluster-based Web Server Jun Wang (University of Nebraska-Lincoln, USA) Session 5C (Room C) I/O II Efficient Structured Data Access in Parallel File Systems Avery Ching, Alok Choudhary, Wei-keng Liao (Northwestern University, USA), Robert Ross, William Gropp (Argonne National Laboratory, USA) View I/O: Improving the Performance of Non-contiguous I/O Florin Isaila and Walter F. Tichy (University of Karlsruhe, Germany) Supporting Efficient Noncontiguous Access in PVFS over InfiniBand Jiesheng Wu (Ohio State University), Pete Wyckoff (Ohio Supercomputer Center, USA), D.K. Panda (Ohio State University, USA) 12:00-2:00 Lunch 2:00-2:30 Invited Talk 3 (Room C) 2:30-3:00 Invited Talk 4 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 : Session 6A, 6B, 6C Session 6A (Room A) : Scheduling III A General Self-adaptive Task Scheduling System for Non-dedicated Heterogeneous Computing Ming Wu and Xian-He Sun (Illinois Institute of Technology, USA) Adding Memory Resource Consideration into Workload Distribution for Software DSM Systems Yen-Tso Liu, Ce-Kuen Shieh (National Chung Kung University, Taiwan), Tyng-Yeu Liang (National Kaohsiung University of Applied Sciences, Taiwan) An Energy-Based Implicit Co-scheduling Model for Beowulf Cluster Somsak Sriprayoonsakul and Putchong Uthayopas (Kasetsart University, Thailand) Session 6B (Room B) : High Availability Availability Prediction and Modeling of High Availability OSCAR Cluster Lixin Shen, Chokchai Leangsuksun, Tong Liu, Hertong Song (Louisiana Tech University, USA), Stephen L. Scott (Oak Ridge National Laboratory, USA) A System Recovery Benchmark for Clusters Ira Pramanick, James Mauro, Ji Zhu (Sun Microsystems, Inc., USA) Performance Evaluation of Routing Algorithms in RHiNET-2 Cluster Michihiro Koibuchi, Konosuke Watanabe, Kenichi Kono, Akiya Jouraku, Hideharu Amano (Keio University, Japan) Session 6C (Room C) : Communications II Application-Bypass Reduction for Large-Scale Clusters Adam Wagner, Darius Buntias, D.K. Panda (Ohio State University, USA), Ron Brightwell (Sandia National Laboratories, USA) Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost Surendra Byna (Illinois Institute of Technology, USA), William Gropp (Argonne National Laboratory, USA), Xian-He Sun (Illinois Institute of Technology, USA), Rajeev Thakur (Argonne National Laboratory, USA) Shared Memory Mirroring for Reducing Communication Overhead on Commodity Networks Jarek Nieplocha, B. Palmer, E. 
Apra (Pacific Northwest National Laboratory, USA) ************************************* 5:00 : End of the Conference ************************************* ------------------------------------------------------------------- Poster Session/Short Papers "Plug-and-Play" Cluster Computing using Mac OS X Dean Dauger (Dauger Research, Inc.) and Viktor K. Decyk (UC Los Angeles, USA) Improving Performance of a Dynamic Load Balancing System by Using Number of Effective Tasks Min Choi, Jung-Lok Yu, Seung-Ryoul Maeng (Korea Advanced Institute of Science and Technology, Korea) Dynamic Self-Adaptive Replica Location Method in Data Grids Dongsheng Li, Nong Xiao, Xicheng Lu, Kai Lu, Yijie Wang (National University of Defense Technology, China) Efficient I/O Caching in Data Grid and Cluster Management Song Jiang (College of William and Mary, USA), Xiaodong Zhang (National Science Foundation, USA) Optimized Implementation of Extendible Hashing to Support Large File System Directory Rongfeng Tang, Dan Mend, Sining Wu (Chinese Academy of Sciences, China) Parallel Design Pattern for Computational Biology and Scientific Computing Applications Weiguo Liu and Bertil Schmidt (Nanyang Technological University, Singapore) FJM: A High Performance Java Message Library Tsun-Yu Hsiao, Ming-Chun Cheng, Hsin-Ta Chiao, Shyan-Ming Yuan (National Chiao Tung University, Taiwan) Cluster Architecture with Lightweighted Redundant TCP Stacks Hai Jin and Zhiyuan Shao (Huazhong University of Science and Technology, China) >From Clusters to the Fabric: The Job Management Perspective Thomas R?oblitz, Florian Schintke, Alexander Reinefeld (Zuse Institute Berlin, Germany) Towards an Efficient Cluster-based E-Commerce Server Victoria Ungureanu, Benjamin Melamed, Michael Katehakis (Rutgers University, USA) A Kernel Running in a DSM - Design Aspects of a Distributed Operating System Ralph Goeckelmann, Michael Schoettner, Stefan Frenz, Peter Schulthess (University of Ulm, Germany) Distributed Recursive Sets: Programmability and Effectiveness for Data Intensive Applications Roxana Diaconescu (UC Irvine, USA) Run-Time Prediction of Parallel Applications on Shared Environment Byoung-Dai Lee (University of Minnesota, USA), Jennifer M. Schopf (Argonne National Laboratory, USA) An Instance-Oriented Security Mechanism in Grid-based Mobile Agent System Tianchi Ma and Shanping Li (Zhejiang University, China) A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms Soumya Sanyal, Amit Jain, Sajal Das (University of Texas at Arlington, USA), Rupak Biswas (NASA Ames Research Center, USA) BCFG: A Configuration Management Tool for Heterogeneous Clusters Narayan Desai, Andrew Lusk, Rick Bradshaw, Remy Evard (Argonne National Laboratory, USA) Communication Middleware Systems for Heterogenous Clusters: A Comparative Study Daniel Balkanski, Mario Trams, Wolfgang Rehm (Technische Universita Chemnitz, Germany) QoS-Aware Adaptive Resource Management in Distributed Multimedia System Using Server Clusters Mohammad Riaz Moghal, Mohammad Saleem Mian (University of Engineering and Technology, Pakistan) On the InfiniBand Subnet Discovery Process Aurelio Berm?dez, Rafael Casado, Francisco J. Quiles (Universidad de Castilla-La Mancha, Spain), Timothy M. Pinkston (University of Southern California, USA), Jos? 
Duato (Universidad Polit?cnica de Valencia, Spain) -------------------------------------------------------------- Chairs/Committees General Co-Chairs Jack Dongarra (University of Tennessee) Lionel Ni (Hong Kong University of Science and Technology) General Vice Chair Francis C.M. Lau (The University of Hong Kong) Program Co-Chairs Daniel S. Katz (Jet Propulsion Laboratory) Cho-Li Wang (The University of Hong Kong) Program Vice Chairs Bill Gropp (Argonne National Laboratory) -- Middleware Wolfgang Rehm (Technische Universit?t Chemnitz) -- Hardware Zhiwei Xu (Chinese Academy of Sciences, China) -- Applications Tutorials Chair Ira Pramanick (Sun Microsystems) Workshops Chair Jiannong Cao (Hong Kong Polytechnic University) Exhibits/Sponsors Chairs Jim Ang (Sandia National Lab) Nam Ng (The University of Hong Kong) Publications Chair Rajkumar Buyya (The University of Melbourne) Publicity Chair Arthur B. Maccabe (The University of New Mexico) Poster Chair Putchong Uthayopas (Kasetsart University) Finance/Registration Chair Alvin Chan (Hong Kong Polytechnic University) Local Arrangements Chair Anthony T.C. Tam (The University of Hong Kong) Programme Committee David Abramson (Monash U., Australia) Gabrielle Allen (Albert Einstein Institute, Germany) David A. Bader (U. of New Mexico, USA) Mark Baker (U. of Portsmouth, UK) Ron Brightwell (Sandia National Laboratory USA) Rajkumar Buyya (U. of Melbourne, Australia) Giovanni Chiola (Universita' di Genova Genova, Italy) Sang-Hwa Chung (Pusan National U., Korea) Toni Cortes (Universitat Politecnica de Catalunya, Spain) Al Geist (Oak Ridge National Laboratory, USA) Patrick Geoffray (Myricom Inc., USA) Yutaka Ishikawa (U. of Tokyo, Japan) Chung-Ta King (National Tsing Hua U., Taiwan) Tomohiro Kudoh (AIST, Japan) Ewing Lusk (Argonne National Laboratory, USA) Jens Mache (Lewis and Clark College, USA) Phillip Merkey (Michigan Tech U., USA) Matt Mutka (Michigan State U., USA) Charles D. Norton (JPL, California Institute of Technology, USA) D.K. Panda (Ohio State U., USA) Philip Papadopoulos (UC San Diego, USA) Myong-Soon Park (Korea U., Korea) Neil Pundit (Sandia National Laboratory, USA) Thomas Rauber (U. Bayreuth, Germany) Alexander Reinefeld (ZIB, Germany) Rob Ross (Argonne National Laboratory, USA) Gudula Ruenger (Chemnitz U. of Technology, Germany) Jennifer Schopf (Argonne National Laboratory, USA) Peter Sloot (U. of Amsterdam, Netherlands) Thomas Stricker (Institut fur Computersysteme, Switzerland) Ninghui Sun (Chinese Academy of Sciences, China) Xian-He Sun (Illinois Institute of Technology, USA) Rajeev Thakur (Argonne National Laboratory, USA) Putchong Uthayopas (Kasetsart U., Thailand) David Walker (U. of Wales Cardiff, UK) Xiaodong Zhang (NSF, USA) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Florent.Calvayrac at univ-lemans.fr Mon Oct 6 11:54:09 2003 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Mon, 06 Oct 2003 17:54:09 +0200 Subject: undefined references to pthread related calls In-Reply-To: <3F817693.9040001@larc.nasa.gov> References: <3F817693.9040001@larc.nasa.gov> Message-ID: <3F819021.2050605@univ-lemans.fr> Jeffery A. White wrote: > Hi group, > > I have a user of my software (a f90 based CFD code using mpich) that is > haveing trouble installing my code on > their system. They are using mpich and the Intel version 7.1 ifc > compiler. 
The problem occurs at the link step. > They are getting undefined references to what appear to be system calls > to pthread related functions such as > pthread_self, pthread_equal, pthread_mutex_lock. Does any one else > encountered and know how to fix this problem? > > Thanks, > > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > is the compiler installed on a Redhat 8.0 ? Besides, maybe they use OpenMP/HPF directives and options which can mess up things and are usually useless on a cluster with one CPU per node. -- Florent Calvayrac | Tel : 02 43 83 26 26 Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18 UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 12:56:37 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 12:56:37 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: In-Reply-To: Message-ID: On Mon, 6 Oct 2003, Franz Marini wrote: > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > Try with : > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > (mpif77 for fortran77) command (which is a wrapper for the real > compiler). Acckkk!! This is one of the horribly broken things about most MPI implementations. It's not reasonable to say "to use this library you must use our magic compile script" A MPI library should be just that -- a library conforming to system standards. You should be able to link it with just "-lmpi". Most of the Fortran underscore issues may be hidden from the user with weak linker aliases. Similarly, it's not reasonable to say "to run this program, you must use our magic script" You should be able to just run the program, by name, in the usual way. Our BeoMPI implementation demonstrated how to do it right many years ago, and we provided the code back to the community. Many people on this list seem to take the attitude "I've already learned the crufty way, therefore the improvements don't matter." One element of a high-quality library is ease of use, and in the long run that matters more than a few percent faster for a specific function call. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Mon Oct 6 13:24:57 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Mon, 06 Oct 2003 10:24:57 -0700 Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <3F81A569.80805@cert.ucr.edu> Jos? M. 
P?rez S?nchez wrote: >Hello: > >We are thinking about purchasing the Intel C++ compiler for linux, >mainly for getting the most of our harware (Xeon 2.4Gz processors), we >are also interested in the Intel MKL (Math Kernel Library), I would like >to know if the performance gain using Intel compiler+libraries, which exploit >SSE2 and make other optimizations for P4/Xeon, are as good as Intel >claims, anyone in the list using those products? > You realize that there's a free version of the Intel compiler for Linux right? Anyway, our experience with their Fortran compiler has been that it's roughly on par with the Portland Group's compiler. However, if Pentium 4 optimizations are turned on, the code produced by the Intel compiler runs just a little bit faster. Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Wolfgang.Dobler at kis.uni-freiburg.de Mon Oct 6 12:50:06 2003 From: Wolfgang.Dobler at kis.uni-freiburg.de (Wolfgang Dobler) Date: Mon, 6 Oct 2003 18:50:06 +0200 Subject: Beowulf digest, Vol 1 #1482 - 2 msgs In-Reply-To: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> Message-ID: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Hi Ao, > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. > I run the command: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > The system failed to compile it and gave me the following information: > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' [...] > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? Looks like the infamous underscore problem. You have a library (libmpi.so or libmpi.a) that has been built using the GNU F77 compiler without the option `-fno-second-underscore' and accordingly the MPI symbols are called `mpi_init__', not `mpi_init_', etc. But the Intel compiler (and all other non-G77 compilers) expects a symbol with only one underscore appended ( `mpi_init_'), but that one is not in the library. > 3. How I can solve it. The way out is to either rebuild the library, compiling with `g77 -fno-second-underscore' or with the Intel compiler, or (the less elegant choice) to refer to the MPI functions with one underscore in you F90 code: call MPI_INIT_(ierr) There is one related question I want to ask the ld-specialists on the list: On some machines libraries like MPICH contain all symbol names with both underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same time. Does anybody know whether there are easy ways of building such a library? Is there something like `symbol aliases' and how would one create these when generating the library? W o l f g a n g _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 12:56:56 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 12:56:56 -0400 Subject: undefined references to pthread related calls Message-ID: <3F819ED8.9020002@larc.nasa.gov> Group, Thanks for your responses. 
Turns out that the problem appears to be an incompatibility between ifc 7.1 and the glibc version in the particular RH 8.0 installation being used. That RH 8.0 installation had some patches that updated glibc. I was able to fix it by removing the -static option when compiling with ifc. I have tested this with a patch-free version of 8.0 and I don't see the problem with or without the -static option specified. At runtime my code does not use any calls that seem to access pthread related system routines. I am guessing that by deferring resolution of the link until runtime I have bypassed the problem. Obviously, if I did use routines that needed pthread related code I would still have a problem, so this isn't a general fix. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Mon Oct 6 13:29:09 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 13:29:09 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the For the Scyld Beowulf system we developed a more sophisticated "diskless administrative" approach that has better scaling and more predictable performance. We cache executable objects, libraries and executables, using a method that works unchanged with either Ramdisk (==tmpfs) or local disk cache. Keep in mind that this is just one element of making a cluster system scalable and easy to manage. Using a workstation-oriented distribution as the software base for compute nodes means generating many different kinds of configuration files, and dealing with the scheduling impact of the various daemons. > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. The obvious problems are configuration, scaling and update consistency issues. > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. They are trivial to get rid of either by explicitly erasing or switching to a new ramdisk (e.g. our old stage 3) when initialization completes. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... Multicast bulk data transfer was a good idea back when we had Ethernet repeaters. Today it should only be used for service discovery and low-rate status updates.
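For readers who want to try the PXE-plus-nfs-root recipe quoted above, the moving parts are small. The sketch below is only an illustration, with a made-up server address (192.168.1.1), node name and kernel image; check the exact syntax against your own pxelinux and ISC dhcpd versions, and note that the client kernel needs NFS-root and IP autoconfiguration support built in:

# /tftpboot/pxelinux.cfg/default on the boot server (illustrative only)
DEFAULT linux
LABEL linux
  KERNEL bzImage-2.4.20
  APPEND ip=dhcp root=/dev/nfs nfsroot=192.168.1.1:/export/nodes/node01

# matching host entry in the server's dhcpd.conf (addresses invented)
host node01 {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 192.168.1.101;
  filename "pxelinux.0";
  option root-path "/export/nodes/node01";
}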
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Oct 6 14:02:39 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 06 Oct 2003 14:02:39 -0400 Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: <3F81AE3F.9050202@lmco.com> Donald Becker wrote: > On Mon, 6 Oct 2003, Franz Marini wrote: > > > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 > p_fdtd3dwg3_pml.f90 > > > > Try with : > > > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > > (mpif77 for fortran77) command (which is a wrapper for the real > > compiler). > > Acckkk!! > > This is one of the horribly broken things about most MPI > implementations. It's not reasonable to say > "to use this library you must use our magic compile script" > A MPI library should be just that -- a library conforming to system > standards. You should be able to link it with just "-lmpi". > I don't like the mpi compiler helper scripts much either. I just want a simple makefile or a list of the libraries to link in in the correct order. I usually end up reading the helper scripts and pulling out the library order and putting it in my makefiles anyway (no offense to anyone). However, in defense of the different MPI implementations, they have somewhat different philosophies on how to get the best performance and ease of use. Sometimes this involves other libraries. Just telling the user to add '-lmpi' to the end of their link command may not tell them everything (e.g. they may need to add the pthreads library, or libdl or whatever). > One element of a high-quality library is ease of use, and in the long > run that matters more than a few percent faster for a specific function > call. > One piece of data. While we haven't looked at specific MPI calls, we have noticed up to about a 30% difference in wall clock time with our codes between the various MPI implementations using the same system (same nodes, same code, same input, same network, same nodes, etc.). I'm all for that kind of performance boost even if it's a little more cumbersome to compile/link/run (although one's mileage may vary depending upon the code) Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 14:55:57 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 20:55:57 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Mon, 6 Oct 2003, Donald Becker wrote: > Acckkk!! Ok, I shouldn't have said "a cleaner way". I don't usually use mpif77 (or f90) to compile programs requiring mpi libs, in fact. I prefer to explicitly tell the compiler which library I want, and where to find them. Btw, this is much simpler and faster if you have multiple versions/releases of the same library. 
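On the point of recovering the real link line from the wrapper scripts: MPICH's wrappers will usually print the underlying command for you, which makes it easy to freeze into an ordinary makefile. A rough sketch, reusing the /opt/mpich-1.2.5 install that appears elsewhere in this thread (library names depend on how MPICH was configured, so trust the -show output over this example; MPICH's library is normally libmpich rather than libmpi):

# ask the wrapper what it would actually run (MPICH wrappers accept -show)
/opt/mpich-1.2.5/bin/mpif90 -show -o p_wg3 p_fdtd3dwg3_pml.f90

# then hard-code the result in your own makefile, e.g. something like
ifc -w -I/opt/mpich-1.2.5/include -o p_wg3 p_fdtd3dwg3_pml.f90 \
    -L/opt/mpich-1.2.5/lib -lmpich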
Anyway, clean or not, elegant or not, mpif77 should (and I say should) work. Btw, I still can't understand why the hell each fortran compiler uses a different way to treat underscores. This, and a thousand other reasons, make me hate fortran. (erm, please, this is a *personal* pov, let's not start another flame/discussion on the fortran-vs-whatever issue ;)). Have a nice day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +39 02 50317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at keyresearch.com Mon Oct 6 17:18:07 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 6 Oct 2003 14:18:07 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> On Mon, Oct 06, 2003 at 02:54:43PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Pardon me for some advertising here, but our OptimaNumerics Linear > Algebra Library can very significantly outperform Intel MKL. Kenneth, Welcome to the beowulf mailing list. Here are some helpful suggestions: 1) Don't top post. Answer postings like I do here, by quoting the relevant part of the posting you're replying to. 2) Don't include an 8-line confidentiality notice in a posting to a public, archived mailing list, distributed all over the world. 3) Marketing slogans and paragraphs with several !s don't work so well here. More sophisticated customers aren't drawn by a claim of a 32x performance advantage without knowing what is being measured. Is it a 100x100 matrix LU decomposition? Well, no, because Intel's MKL and the free ATLAS library run at a respectable % of peak. Is it on a 1000 point FFT? Well, no, because the free FFTW library runs at a respectable % of peak on that. 4) Put your performance whitepapers on your website, or it looks fishy. I looked and didn't see a single performance claim there. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ds10025 at cam.ac.uk Mon Oct 6 18:41:34 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 06 Oct 2003 23:41:34 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening, I'm getting error 13 when my diskless client tries to mount its file system. How is this error 13 best resolved? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Mon Oct 6 23:55:13 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 6 Oct 2003 23:55:13 -0400 (EDT) Subject: Root-nfs error 13 while mounting In-Reply-To: Message-ID: > I'm getting error 13 when my diskless client tries to mount its file system. > How is this error 13 best resolved? it's best resolved by translating it to text: EACCES, or "permission denied". I'm guessing you should look at the logs on your fileserver, since it seems to be rejecting your clients.
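With a root-NFS client the permission denied almost always comes from the export itself, so the server side is the first thing to check. A minimal sketch, with example paths and subnet only:

# /etc/exports on the fileserver -- a diskless root generally needs
# no_root_squash, because the client mounts it as root
/export/nodes/node01  192.168.1.0/255.255.255.0(rw,no_root_squash,sync)

# re-export and watch the server logs while the node retries
exportfs -ra
tail -f /var/log/messages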
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Oct 7 05:17:36 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 7 Oct 2003 11:17:36 +0200 Subject: weak symbols [Re: Beowulf digest, Vol 1 #1482 - 2 msgs] In-Reply-To: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Message-ID: <200310071117.36507.joachim@ccrl-nece.de> Wolfgang Dobler: > On some machines libraries like MPICH contain all symbol names with both > underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same > time. Does anybody know whether there are easy ways of building such a > library? Is there something like `symbol aliases' and how would one create > these when generating the library? Yes, most linkers support "weak symbols" in one way or another (there is no common way, usually a pragma or "function attributes" (gcc) are used) which supply all required API symbols for the one real implemented function. Just take a look at a source file like mpich/src/pt2pt/send.c to see how this can be done (some preprocessing "magic"). It can also be done w/o weak symbols at the cost of a slightly bigger library. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Oct 7 07:07:19 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 07 Oct 2003 12:07:19 +0100 Subject: more on structural models for clusters Message-ID: <1A6pgV-16F-00@etnus.com> Jim, > Your average Dell isn't suited to inclusion as a MCU core in an ASIC > at each node and would cost more than $10/node... I'm looking at > Z80/6502/low end DSP kinds of computational capability in a mesh > containing, say, 100,000 nodes. Have you seen this gizmo ? (It's just so cute I had to pass it on :-) http://www.lantronix.com/products/eds/xport/ It's a 48MHz x86 with 256KB of SRAM and 512KB of flash, a 10/100Mb ethernet interface an RS232 and three bits of digital I/O and it all fits _inside_ an RJ45 socket. It comes loaded up with a web server and so on. It's on sale here in the UK for GBP 39 + VAT one off, so should come down somewhere near the price you mention above for your 100,000 off in the US. (It might also be useful to the folks who want to build their own environmental monitoring. Couple one of these up to the serial interconnect on a temperature monitoring button and you'd immediately be able to access it from the net). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Tue Oct 7 08:18:16 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Tue, 7 Oct 2003 09:18:16 -0300 (ART) Subject: Tools for debuging Message-ID: <20031007121816.67790.qmail@web12208.mail.yahoo.com> I'm having problems with a prograa, and i really need a tool for debug it. 
Are there specific debuggers for MPI programs? If there is more than one, which is the best choice? Thanks ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ciências Exatas e Tecnológicas Estudante do Curso de Ciência da Computação Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From nican at nsc.liu.se Tue Oct 7 07:50:26 2003 From: nican at nsc.liu.se (Niclas Andersson) Date: Tue, 7 Oct 2003 13:50:26 +0200 (CEST) Subject: CALL FOR PARTICIPATION: Workshop on Linux Clusters for Super Computing Message-ID: CALL FOR PARTICIPATION ================================================================ 4th Annual Workshop on Linux Clusters For Super Computing (LCSC) Clusters for High Performance Computing and Grid Solutions 22-24 October, 2003 Hosted by National Supercomputer Centre (NSC) Linköping University, SWEDEN ================================================================ The programme is in its final form. The workshop is brimful of knowledgeable speakers giving exciting talks about Linux clusters, grids and distributed applications requiring vast computational resources. Just a few samples: - Keynote: Andrew Grimshaw, University of Virginia and CTO of Avaki Inc. - Comparisons of Linux clusters with the Red Storm MPP William J. Camp, Project Leader of Red Storm, Sandia National Laboratories - The EGEE project: building a grid infrastructure for Europe Bob Jones, EGEE Technical Director, CERN - Linux on modern NUMA architectures Jes Sorensen, Wild Open Source Inc. - The AMANDA Neutrino Telescope Stephan Hundertmark, Stockholm University and many more. In addition to invited speakers there will be vendor presentations, exhibitions and tutorials. Last date for registration: October 10. For more information and registration: http://www.nsc.liu.se/lcsc _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From keith.murphy at attglobal.net Tue Oct 7 11:28:50 2003 From: keith.murphy at attglobal.net (Keith Murphy) Date: Tue, 7 Oct 2003 08:28:50 -0700 Subject: Tools for debuging References: <20031007121816.67790.qmail@web12208.mail.yahoo.com> Message-ID: <025701c38ce7$b5b64060$02fea8c0@oemcomputer> Check out Etnus's Totalview parallel debugger www.etnus.com Keith Murphy Dolphin Interconnect C: 818-292-5100 T: 818-597-2114 F: 818-597-2119 www.dolphinics.com ----- Original Message ----- From: "Mathias Brito" To: Sent: Tuesday, October 07, 2003 5:18 AM Subject: Tools for debuging > I'm having problems with a prograa, and i really need > a tool for debug it. There's specific debugers for mpi > programas, if have more than one, what is the best > choice? > > Thanks > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ciências Exatas e Tecnológicas > Estudante do Curso de Ciência da Computação > > Yahoo!
Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Tue Oct 7 13:30:19 2003 From: rokrau at yahoo.com (Roland Krause) Date: Tue, 7 Oct 2003 10:30:19 -0700 (PDT) Subject: undefined references to pthread related calls In-Reply-To: <200310071605.h97G5FV13188@NewBlue.Scyld.com.scyld.com> Message-ID: <20031007173019.8519.qmail@web40010.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 2. undefined references to pthread related calls (Jeffery A. > White) FYI, Intel has released a version of their compiler that fixes the link problem for applications that use OpenMP. Intel Fortran now supports glibc-2.3.2 which is used in RH-9 and Suse-8.2. The old compatibility hacks have become obsolete at least. I hear Intel-8 is in beta, anyone have experience with it? Roland > Subject: undefined references to pthread related calls > > Group, > > Thanks for your responses. Turns out that the problem appears to > be > an incompatiblilty between ifc 7.1 and the glibc version > in the version of RH 8.0 being used. The RH 8.0 being used had some > patches that updated glibc. I was able to fix it by removing > the -static option when compling with ifc. I have tested this with a > patch free version of 8.0 and I don't see the problem wit or without > the -static option specified. At runtime my code does not use any > calls > that seem to access pthread related system routines. I am > guessing that by deferring reolution of the link until runtime I > have > bypassed the problem. Obviously if I did use routines that > needed pthread related code I would still have a problem so this > isn't a > general fix. > > Jeff > > __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 15:28:42 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 15:28:42 -0400 Subject: updated cluster finishing script system Message-ID: <1065554922.32374.47.camel@protein.scalableinformatics.com> Folks: I updated my cluster finishing script package. This package allows you to perform post-installation configuration changes (e.g. finishing) for an RPM based cluster which maintains image state on local disks. It used to be specialized to the ROCKS distribution, but it has evolved significantly and should work with generalized RPM based distributions. Major changes: 1) No RPMs are distributed (this is a good thing, read on) 2) a build script generates customized RPMs for you after asking you 4 questions. (please, no jokes about unladen swallows, neither European nor African...) These RPMs allow you to customize the finishing server and the finishing script client as you require for your task. 
This includes choosing the server's IP address (used to be hard-coded to 10.1.1.1), the server's export directory (used to be hard-coded to /opt/finishing), the cluster's network (used to be hard-coded to 10.0.0.0), and the cluster's netmask (used to be hard-coded to 255.0.0.0). 3) Documentation (see below) Have a look at http://scalableinformatics.com/finishing/ for more details, including new/better instructions. It is licensed under the GPL for end users. Contact us offline if you want to talk about redistribution licenses. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 19:50:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 09:50:38 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065554922.32374.47.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> Message-ID: <200310080950.41343.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > It is licensed under the GPL for end users. Contact us offline if you want > to talk about redistribution licenses. Err, if it's licensed under the GPL then the "end users" who receive it under that license can redistribute it themselves under the GPL. Part 6 of the GPL v2 says: [quote] 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. [...] [/quote] Of course as the copyright holder you could also do dual licensing, so I guess this is what you mean - correct ? But whichever it is, once you have released something under the GPL you cannot prevent others from redistributing it under the GPL themselves. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g1FOO2KABBYQAh8RAu1oAJ0fLlcljVYwXj7xgnkjGFyNaoWOFwCfWM/r IC1/xPLO2ePGM2zlJF2ZHK8= =HOnr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Tue Oct 7 21:58:21 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Tue, 7 Oct 2003 21:58:21 -0400 (EDT) Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: Message-ID: Hi, First, I want to thank all of you for the answers and suggestions for my question last time. ( Last time, I tried: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' . . . 
) Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried it, the system gave me the following error: " ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 module EHFIELD program FDTD3DPML external function RISEF external subroutine COM_HZY 3228 Lines Compiled ld: cannot find -lmpi " either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the directory '/opt/mpich-1.2.5/include'. I also tried the command: " /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 " The system gave the error: " 3228 Lines Compiled /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In function `f_f77ioerr': : undefined reference to `__ctype_b' " In fact, I don't know what this error means. Of course, I don't know how to slove it either. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 22:23:04 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 12:23:04 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065579065.32368.134.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> <1065579065.32368.134.camel@protein.scalableinformatics.com> Message-ID: <200310081223.05966.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 12:11 pm, Joseph Landman wrote: > Thanks for catching the wording error. No worries, I wasn't intending to be pedantic, just curious. :-) - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g3UIO2KABBYQAh8RAgSkAJ48X7RY3ABNnYa2DlQ0z0vHfinaxACfdsMk hIZqsuVLevZqp2OBtfAafEs= =2vpF -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 22:11:05 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 22:11:05 -0400 Subject: updated cluster finishing script system In-Reply-To: <200310080950.41343.csamuel@vpac.org> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> Message-ID: <1065579065.32368.134.camel@protein.scalableinformatics.com> On Tue, 2003-10-07 at 19:50, Chris Samuel wrote: > On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > > > It is licensed under the GPL for end users. Contact us offline if you want > > to talk about redistribution licenses. > > Err, if it's licensed under the GPL then the "end users" who receive it under > that license can redistribute it themselves under the GPL. Part 6 of the GPL > v2 says: ... > Of course as the copyright holder you could also do dual licensing, so I guess > this is what you mean - correct ? Commercial redistribution ala the MySQL form of license. You are correct, it was a mis-wording on my part. Basically if someone decides to turn this into a commercial product (ok, stop laughing...), or wants support, or a warranty, then they need to speak with us first. 
As the package is mostly source code, make files and scripts, it seems odd to consider distributing it any other way. More to the point, there are some things that should be free (Libre and beer, though some keep asking me where the free beer is). Stuff like this should be free (as in Libre). RGB and I had a conversation about this I think... . I leave it to others to supply the beer. > But whichever it is, once you have released something under the GPL you cannot > prevent others from redistributing it under the GPL themselves. ... which I don't want to hinder (redistribution under GPL), rather I want to encourage ... Thanks for catching the wording error. -- Joseph Landman _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 8 13:16:03 2003 From: ctierney at hpti.com (Craig Tierney) Date: 08 Oct 2003 11:16:03 -0600 Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: References: Message-ID: <1065633362.22256.8.camel@woody> On Tue, 2003-10-07 at 19:58, Ao Jiang wrote: > Hi, > First, I want to thank all of you for the answers and suggestions > for my question last time. > ( > Last time, I tried: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' The option -L specifies the path for libraries. The option -l specifies the library to link. Your command should be: ifc -I/opt/mpich-1.2.5/include -L/opt/mpich-1.2.5/lib -lmpi -w -lm -o p_wg3 p_fdtd3dwg3_pml.f90 Craig > . > . > . > ) > > Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried > it, the system gave me the following error: > " > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > module EHFIELD > program FDTD3DPML > external function RISEF > external subroutine COM_HZY > > 3228 Lines Compiled > ld: cannot find -lmpi > " > either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the > directory '/opt/mpich-1.2.5/include'. > > I also tried the command: > " > /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 > " > > The system gave the error: > " > 3228 Lines Compiled > /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In > function `f_f77ioerr': > : undefined reference to `__ctype_b' > " > > In fact, I don't know what this error means. Of course, I don't know > how to slove it either. > > Tom > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 14:02:17 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 19:02:17 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening Have resolved the problem. It was due to setting in dhcpd.conf it require option root-path pointing to root path of the node. I get another error. 
When the diskless node boots up it cannot find the init file. Also, what is the minimum set of files that has to be transferred to /tftpboot/node/? Dan On Oct 7 2003, Mark Hahn wrote: > > I'm getting error 13 when my diskless client tries to mount its file system. > > How is this error 13 best resolved? > > it's best resolved by translating it to text: EACCES, or "permission > denied". I'm guessing you should look at the logs on your fileserver, > since it seems to be rejecting your clients. > > > _______________________________________________ Beowulf mailing list, > Beowulf at beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joelja at darkwing.uoregon.edu Wed Oct 8 16:50:34 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 8 Oct 2003 13:50:34 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: The 3ware 6000, 7000 and 7006 cards are all gone from the marketplace; the cards you want to look at are the 3ware 7506 (parallel ATA) or the 3ware 8506 (serial ATA). The 2400 was never seriously in the running for us because it only supports 4 drives. joelja On Wed, 8 Oct 2003, Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID.
The first would give the > > best results but on the other hand becomes really expensive so we have > > in mind two ATA RAID controllers: > > > > Adaptec 2400A > > 3Ware 6000/7000 series controllers > > > > Any one of these has its strong and weak points, after seeing various > > benchmarks/comparisons/reviews these are the only candidates that > > deserve our attention. good points about ata raid - large disks storage ( 300GB drives at $300 each +/- ) - get those drives w/ 8MB buffer disk cache - cheap ... can do with software raid or $40 ata-133 ide controller - $300 more for making ata drives appear like scsi drives with 3ware raid controllers - slower rpm disks ... usually it tops out at 7200rpm - it supposedly can sustain 133MB/sec transfers - if you use software raid, you can monitor the raid status - if you use hardware raid, you are limited to the tools the hw vendor gives you tomonitor the raid status of pending failures or dead drives good points about scsi .. - some say scsi disks are faster ... - super expensive .. $200 for 36 GB .. at 15000rpm - it supposedly can sustain 320MB/sec transfers if the disks does transfer at its full speed ... 320MB/sec or 133MB/sec does the rest of the system get to keep up with processing the data spewing off and onto the disks independent of which raid system is built, you wil need 2 or 3 more backup systems to backup your Terabyte sized raid systems more raid fun http://www.1U-Raid5.net c ya alvin > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > harddisk, all home directories will be stored under the main server, > > main workload (compilation and edition) would be done on the local > > machines tough, server only takes care of file sharing. > > > > Also parallel MPI executions will be done between the clients. > > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? good p > > > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 8 15:46:59 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 08 Oct 2003 21:46:59 +0200 Subject: building a RAID system Message-ID: <1065642419.9483.55.camel@qeldroma.cttc.org> Hi, I would like to know some advice about what kind of technology apply into a RAID file server ( through NFS ) . We started choosing hardware RAID to reduce cpu usage. We have two options , SCSI RAID and ATA RAID. The first would give the best results but on the other hand becomes really expensive so we have in mind two ATA RAID controllers: Adaptec 2400A 3Ware 6000/7000 series controllers Any one of these has its strong and weak points, after seeing various benchmarks/comparisons/reviews these are the only candidates that deserve our attention. 
The server has a dozen of client workstations connected through a switched 100Mbit LAN , all of these equipped with it's own OS and harddisk, all home directories will be stored under the main server, main workload (compilation and edition) would be done on the local machines tough, server only takes care of file sharing. Also parallel MPI executions will be done between the clients. Considering that not all the workstantions would be working full time and with cost in mind ? it's worth an ATA RAID solution ? -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Oct 8 06:33:13 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 08 Oct 2003 06:33:13 -0400 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <1065609193.28674.32.camel@squash.scalableinformatics.com> On Wed, 2003-10-08 at 18:17, D. Scott wrote: > Hi > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? Lots of possibilities, though I am not sure you have supplied enough information to hazard a guess (unless someone ran into this before and already knows the answer). An operation on an NFS mounted file system can hang when: 1) the nfs server becomes unresponsive (crash, overload, file system full, ...) 2) the client becomes unresponsive ... 3) the network becomes unresponsive ... ... If you could indicate more details, it is likely someone might be able to tell you where to look next. > > > Dan -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 18:17:02 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 23:17:02 +0100 Subject: Why NFS hang when copying files of 6MB? Message-ID: Hi On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 19:39:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 19:39:41 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. that's unfortunate, since the main way HW raid saves CPU usage is by running slower ;) seriously, CPU usage is NOT a problem with any normal HW raid, simply because a modern CPU and memory system is *so* much better suited to performing raid5 opterations than the piddly little controller in a HW raid card. the master/fileserver for my cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and it can *easily* saturate its gigabit connection. after all, ram runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! 
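Mark's checksum figure is easy to sanity-check: the 2.4 md driver benchmarks its RAID-5 XOR routines at boot and logs the winner, and /proc/mdstat shows what the arrays are doing. The output below is purely illustrative, not a measurement of any particular box:

# XOR benchmark printed by the md driver at boot (example output)
dmesg | grep raid5
  raid5: measuring checksumming speed
  raid5: using function: pIII_sse (2034.000 MB/sec)

# current array state, rebuild progress, failed members
cat /proc/mdstat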
concern for PCI congestion is a much more serious issue. finally, why do you care at all? are you fileserving through a fast (>300 MB/s) network like quadrics/myrinet/IB? most people limp along at a measly gigabit, which even a two-ide-disk raid0 can saturate... > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and jeez, since your limited to 10 MB/s, you could do raid5 on a 486 and still saturate the net. seriously, CPU consumption is NOT an issue at 10 MB/s. > machines tough, server only takes care of file sharing. so excess cycles on the fileserver will be wasted unless used. > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? you should buy a single promise sata150 tx4 and four big sata disks (7200 RPM 3-year models, please). regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 19:28:37 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 16:28:37 -0700 (PDT) Subject: Why NFS hang when copying files of 6MB? In-Reply-To: <1065609193.28674.32.camel@squash.scalableinformatics.com> Message-ID: On Wed, 8 Oct 2003, Joe Landman wrote: > On Wed, 2003-10-08 at 18:17, D. Scott wrote: > > Hi > > > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? > > Lots of possibilities, though I am not sure you have supplied enough > information to hazard a guess (unless someone ran into this before and > already knows the answer). > > An operation on an NFS mounted file system can hang when: > > 1) the nfs server becomes unresponsive (crash, overload, file system > full, ...) not enough memory, too much swap spce > 2) the client becomes unresponsive ... > 3) the network becomes unresponsive ... > ... - bad hub, bad switch, bad cables - bad nic cards, bad motherboard, - bad kernel, bad drivers - bad dhcp config, waiting for machines that went offline c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:41:12 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:41:12 +1000 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <200310090941.13302.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 08:17 am, D. Scott wrote: > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? You need to give a lot more detail on that, try having a quick read of: http://www.catb.org/~esr/faqs/smart-questions.html#beprecise Basically there are all sorts of possible problems from kernel bugs, node hardware problems through to various network problems... Useful information would be things like: /etc/fstab from the nodes output of the mount command the output of strace when you try and do the 'cp': strace -o cp.log -e trace=file cp /path/to/file /path/to/destination good luck! 
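Alongside the strace output, a few quick checks usually narrow down which of the usual suspects (server, client or network) is at fault; 'master' below is just a placeholder for the NFS server's hostname:

rpcinfo -p master      # portmapper, mountd and nfsd should all be registered
showmount -e master    # is the filesystem really exported to this client?
nfsstat -c             # a climbing retrans count points at the network
grep nfs /proc/mounts  # confirm the rsize/wsize and hard/soft options in use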
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKCYO2KABBYQAh8RAqltAJ4/R91yD0KKVA6wB3+UDZxZcAOsFwCbBZn1 DeaCjkFO8bwGLhhSkxB20yE= =d7Gz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:27:35 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:27:35 -0700 (PDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: hi ya On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. tell them to break the drawing up into itty-bitty pieces and work on a real autocad drawing .. :-) - separate the item into separate pieces so it can be bent, welded, drilled, etc or get a 3Ghz cpu and load up 4GB or 8GB of memory and nope ... beowulf or any other cluster will not help autocad c ya alvin - part time autocad me ..but i cant draw a line .. :-) - easier to contract out the 1u chassis design "drawings" :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:47:32 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:47:32 +1000 Subject: PocketPC Cluster Message-ID: <200310090947.33601.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) IrDA for the networking, 11 compute + 1 management, slower than "a mainstream Pentium II-class desktop PC" (they don't specify what spec). http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Twelve Pocket PC devices have been joined in a cluster to perform distributed calculations - the devices share the load of a complex calculation. The concept was to compare the performance of several Pocket PC devices linked into a cluster with the performance of a typical Pentium II-class desktop computer. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKIUO2KABBYQAh8RAvJvAJoDNqZ/2m8cIqo02Hbbwzpm2DWeMQCeOltt 3LuUp1Kkoc4jnmwVNgoDoFI= =+abL -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Wed Oct 8 20:03:57 2003 From: mg_india at sancharnet.in (Manoj Gupta) Date: Thu, 09 Oct 2003 05:33:57 +0530 Subject: CAD Message-ID: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Hello, One of my clients has asked me to provide a solution for his AutoCAD work. 
The minimum file size on which he works is nearly of 400 MB and it takes 15-20 minutes to load on his single system. Can Beowulf be used to solve this problem and minimize the time required so as to improve productivity? Sawan Gupta || mg_india at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 20:23:28 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 20:23:28 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: > - get those drives w/ 8MB buffer disk cache what reason do you have to regard 8M as other than a useless marketing feature? I mean, the kenel has a cache that's 100x bigger, and a lot faster. > - slower rpm disks ... usually it tops out at 7200rpm unless your workload is dominated by tiny, random seeks, the RPM of the disk isn't going to be noticable. > - it supposedly can sustain 133MB/sec transfers it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE disks in raid0. interestingly, the chipset controller is normally not competing for the same bandwidth as the PCI, so even with entry-level hardware, it's not hard to break 133. > - if you use software raid, you can monitor the raid status this is the main and VERY GOOD reason to use sw raid. > - some say scsi disks are faster ... usually lower-latency, often not higher bandwidth. interestingly, ide disks usually fall off to about half peak bandwidth on inner tracks. scsi disks fall off too, but usually less so - they don't push capacity quite as hard. > - it supposedly can sustain 320MB/sec transfers that's silly, of course. outer tracks of current disks run at between 50 and 100 MB/s, so that's the max sustained. you can even argue that's not really 'sustained', since you'll eventually get to slower inner tracks. > independent of which raid system is built, you wil need 2 or 3 > more backup systems to backup your Terabyte sized raid systems backup is hard. you can get 160 or 200G tapes, but they're almost as expensive as IDE disks, not to mention the little matter of a tape drive that costs as much as a server. raid5 makes backup less about robustness than about archiving or rogue-rm-protection. I think the next step is primarily a software one - some means of managing storage, versioning, archiving, etc... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Wed Oct 8 21:04:03 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed, 8 Oct 2003 21:04:03 -0400 Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> References: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: <20031008210403.F28876@www2> AutoCAD versions since R13 only run on Windows, and AFAIK no version of AutoCAD has ever been shipped for Linux. Beowulf is a Linux- (or, taken more liberally than most people intend, Unix-) specific thing. Thus, unless I misunderstand, no. --Bob Drzyzgula On Thu, Oct 09, 2003 at 05:33:57AM +0530, Manoj Gupta wrote: > > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. 
> > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 21:45:08 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 21:45:08 -0400 (EDT) Subject: building a RAID systemo In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). I totally agree with everything Mark said and second this. Although 3-year ata (lower) or scsi (higher) disks would be just fine too, depending on how much you care to spend and how much it costs you if things go down. e.g. md raid under linux works marvelously well, and one can even create a kickstart file so that it makes your raid for you on a fully automated install, which is very cool. It is also dirt cheap. My home (switched 100 Mbps, 8-9 hosts/nodes depending on what is on) has a 150 GB RAID-5 server (3x80 GB 3-year ATA 7200 RPM disks) on a 2.2 GHz Celeron server with an extra ATA controller so there is only one disk per channel. It cost about $800 total to build inside a full tower case with extra fans including one with leds in front so that it glows blue. You couldn't get the CASE of a HW raid for that price, I don't think (although I admit that it won't do hot swap and dual power supplies). The total RAID/NFS load since 9/19 is: root 11 0.0 0.0 0 0 ? SW Sep19 0:00 [mdrecoveryd] root 21 0.0 0.0 0 0 ? SW Sep19 0:00 [raid1d] root 22 0.0 0.0 0 0 ? SW Sep19 0:02 [raid5d] root 23 0.0 0.0 0 0 ? SW Sep19 5:03 [raid5d] ... root 4928 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] root 4929 0.0 0.0 0 0 ? SW Sep19 2:57 [nfsd] root 4930 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4931 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4932 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4933 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4934 0.0 0.0 0 0 ? SW Sep19 2:56 [nfsd] root 4935 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] (or less than 30 minutes of total CPU). At 1440 min/day, for 18 days (conservatively) that is about 0.1% load, on average. This is a home network load, sure (which includes gaming and a fair bit of data access, but no, we're not talking GB per day moving over the lines). In a more data-intensive environment this would increase, but there is a lot of head room. The point is that a 2.2 GHz system has a LOT of horsepower. We used to run entire departments of twenty or thirty workstations using $10-20,000 Sun servers at maybe 5 MEGAHertz on 10 Mbps thinwire networks with fair to middling satisfaction. My $800 home server has several thousand times the raw speed, about a thousand times the memory, a thousand times the disk, AND it is RAID 5 disk at that. The network has only increased in speed by a factor of maybe 10-20 (allowing for switched vs hub). Mucho headroom indeed. BTW, our current department primary server is a 1 GHz PIII, although we're adding a second CPU shortly as load dictates. 
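As a concrete illustration of the kickstart trick mentioned above, the partitioning section of a Red Hat kickstart file can declare the md device directly. The disk names and sizes here are invented, and the exact option spelling varies a little between Red Hat releases, so check the kickstart documentation for the release you actually install:

# three ATA disks combined into one RAID-5 md device at install time
part raid.01 --size=76000 --ondisk=hda
part raid.02 --size=76000 --ondisk=hdb
part raid.03 --size=76000 --ondisk=hdc
raid /export --level=5 --device=md0 raid.01 raid.02 raid.03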
And if you are planning your server to handle something other than a small cluster or LAN where downtime isn't too "expensive" you may want to look at higher quality (rackmount) servers and disk arrays in enclosures that permit e.g. hot swap and that have redundant power. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 23:12:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 13:12:41 +1000 Subject: building a RAID system In-Reply-To: References: Message-ID: <200310091312.42544.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 10:23 am, Mark Hahn wrote: > raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... For those who haven't seen it, this is a very interesting way of doing snapshot style backups: http://www.mikerubel.org/computers/rsync_snapshots/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hNIpO2KABBYQAh8RAvXaAJ0ecv77jUJe3DWpsinqBFgs4W4JlQCfRz/z HfXF/JkFSszlvX10/JXjisM= =7lAy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 8 22:58:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 8 Oct 2003 22:58:17 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. The larger cache does provide some benefit. Disks now read and cache up to a whole track/cylinder at once, starting from when the head settles from a seek up to when the desired sector is read. You can't do that type of caching in the kernel. As disks become more dense, more memory is needed to save a cylinder's worth of data, so we should expect the cache size to increase. But you point is likely "disk cache is mostly legacy superstition". MS-Windows 98 and earlier had such horrible caching behavior that a few MB of on-disk cache could triple the performance. This was also why MS-Windows would run much faster under Linux+VMWare than on the raw hardware. > > - it supposedly can sustain 133MB/sec transfers Normal disks top out at 70MB/sec read, 50MB/sec write on the outer tracks. These numbers drop significantly on the inner tracks. You might get 10MB/sec better with 10K or 15K RPM SCSI drives, but it's certainly not linear with the speed. BTW, 2.5" laptop drives are _far_ worse. Typical for a modern fast drive is 20MB/sec read and 10MB/sec write. Older drivers were worse. > > - some say scsi disks are faster ... 
> > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. Look at the shape of the transfer performance curve -- the shape is sometimes the same as the similar IDE drive, but sometimes has a much different curve. Wider tracks mean faster seek settling but lower density. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:33:49 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:33:49 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: hi ya mark On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. for those squeezing the last 1MB/sec transfer out of their disks ... 8MB did seem to make a difference ( streaming video apps - encoding/decoding/xmit ) > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. usually a side affect of partitioning too > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. super easy to overflow the disks and pci .. depending on apps > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. yup > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. scsi capacity doesnt seem to be an issue for them ... they're falling behind by several generations ( scsi disks used to be the highest capacity drives .. not any more ) > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. yup ... those are just marketing numbers... all averages ... and bigg differences between inner tracks and outer tracks > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost to me ... backup of terabyte sized systems is trivial ... - just give me lots of software raid subsystems ( 2 backups for each "main" system ) - lot cheaper than tape drives and 1000x faster than tapes for live backups - will never touch a tape backup again ... 
too sloow and too unreliable no matter how clean the tape heads are ( too slow being the key problem for restoring ) c ya alvin > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 22:31:50 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 22:31:50 -0400 (EDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. Load from what into what? It is hard for me to see how a 400 MB file could take this long to load into memory over any modern channel, as this is less than 0.5 MB/sec. This is roughly the bandwidth one achieves throwing floppies across a room one at a time by hand. That is, I can't imagine how this is bandwidth limited, unless the client has primitive hardware. From a local disk (even a bad one) this should take ballpark of a few seconds to load into memory. From NFS order of a minute or three (in most configurations, less on faster networks). If the load is so slow because the program is crunching the file as it loads it (reading a bit, thinking a bit, reading a bit more) then nothing can speed this up unless AutoCAD has a parallel version of their program. > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? I don't know for sure (although somebody else on the list might). I doubt it, though, unless autocad has a parallel version that can use a linux cluster to speed things up. However, your first step in answering it for yourself is going to be doing measurements to determine what the bottleneck is. If it is I/O then invest in better I/O (perhaps a better network). So measure e.g. the network load if it is getting the file from a network file server. If the problem is that the file is coming from a winXX server with too little memory on an antique CPU and with creaky old disks on a 10 Mbps hub, well, FIRST replace the winxx with linux, the old server with a new server, the old disks with new disks, the 10 BT with 1000 BT. At that point you won't have a bandwidth problem, as the server should be able to deliver files at some tens of MB/sec pretty easily. If the problem persists, try to figure out what autocad is doing when it loads. rgb > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 02:00:33 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed, 8 Oct 2003 23:00:33 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. I found a comparison of 8MB vs 2MB drives in a raid, though it's windows based and not that great: http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 Seems like the 8MB didn't really make much of a difference. > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a 100GB LTO tapes can be had for $36, that's less than half the price of the cheapest 200 GB drives. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From maurice at harddata.com Thu Oct 9 00:58:27 2003 From: maurice at harddata.com (Maurice Hilarius) Date: Wed, 08 Oct 2003 22:58:27 -0600 Subject: building a RAID system In-Reply-To: <200310090112.h991CPb24907@NewBlue.scyld.com> Message-ID: <5.1.1.6.2.20031008225509.04259800@mail.harddata.com> Where you said: >I would like to know some advice about what kind of technology apply >into a RAID file server ( through NFS ) . We started choosing hardware >RAID to reduce cpu usage. > >We have two options , SCSI RAID and ATA RAID. The first would give the >best results but on the other hand becomes really expensive so we have >in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers I would suggest using the 3Ware (current models are 7506 ( parallel ATA) and 8506 ( Serial ATA)). Use mdamd to create software RAID devices. It will yield better performance, and is much more flexible. If you are building a large array, use multiple controllers to increase throughput. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 03:52:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 00:52:39 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: hi ya On Wed, 8 Oct 2003, Trent Piepho wrote: > On Wed, 8 Oct 2003, Mark Hahn wrote: > > > - get those drives w/ 8MB buffer disk cache > > > > what reason do you have to regard 8M as other than a useless > > marketing feature? I mean, the kenel has a cache that's 100x > > bigger, and a lot faster. 
> > I found a comparison of 8MB vs 2MB drives in a raid, though it's windows > based and not that great: > http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 i dont have much data between 2MB and 8MB ... just various people's feedback ... - releasable data i do have is at http://www.Linux-1U.net/Disks/Tests/ - testing for 2MB and 8MB should be done on the same system of the same sized disks and exact same partition, distro, patchlevel and "test programs to amplify the differences" - lots of disk writes and reads ... that overflow the memory so that disk access is forced ... > Seems like the 8MB didn't really make much of a difference. > > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems -- emphasizing .. "Terabyte" sized disk subsystems > > backup is hard. you can get 160 or 200G tapes, but they're almost > > as expensive as IDE disks, not to mention the little matter of a > > 100GB LTO tapes can be had for $36, that's less than half the price of the > cheapest 200 GB drives. we/i like to build systems that backup 1TB or 2TB per 1U server ... - tapes doesn't come close ... different ballpark - a rack of 1U servers is a minimum of 40TB - 80TB of data .. - and than to turn around and simulate a disk crash and restore from backups from bare metal or how fast to get a replacement system back online ( hot swap - live backups) - i think those 200GB tape drives is something to also add into the costs of backup media .. as are restore from tape considerations before deciding on tape vs disk backup media ( all depends on the purpose of the server and data ) - last i played with tape drives was those $3K - $4K exabyte tape drives ... nice and fast (writing) .. but very slow for restore and unreliable ... and time consuming and NOT automated - people costs the mosts for doing proper backups ... ( someone has to write the backup methodology ro swap the tapes etc ) fries ( a local pc store here ) had 160GB disks 8MB buffers for $80 after rebates ... otherwise general rule is $1 per GB of raw disk storage per disk fun stuff .. have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 05:25:18 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 02:25:18 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > as expensive as IDE disks, not to mention the little matter of a > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > cheapest 200 GB drives. > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > - tapes doesn't come close ... different ballpark How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, and the largest IDE drive I've seen is 250 GB. I've got two 4U rackmount systems sitting side by side on the same shelf. One is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive server with 200GB SATA drives and two 8 port 3ware cards. The tape library has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand new, while the ADIC is around a year old. 
If the tape library were bought today, it would have an LTO-2 drive with double the capacity and could store 4.8 TB. So tapes seem to come pretty close to me. It is also quite a bit more practical to change tapes with the library than to be swapping hard drives around. The library's built-in barcode reader keeps track of them all for me. I can type a command and have it spit out all the tapes that a certain set of backups are on. They fit nicely in a box in their plastic cases and if I drop one it will be ok. I can stick them on a shelf for five years and still expect to read them. And the tapes don't take up any rackspace or power or need any cooling. I've never had a tape go bad on me either, even though I've been through a lot more of them than IDE drives. Of course the tape library was expensive. A new LTO-2 model can be had for around $11,600 on pricewatch. The 16-bay IDE case, CPUs/MB/memory and 3ware controllers were much less. But the cost of the media is a lot less for tapes than for SATA hard drives. Especially if you get models with 3-year warranties. Once you buy enough drives/tapes you'll break even on a $/GB comparison. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 06:04:20 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 10:04:20 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: Greg, > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > MKL and the free ATLAS library run at a respectable % of peak. Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. Have you tried DPPSV or DPOSV on Itanium, for example? I would be interested in the percentage of peak that you achieve with MKL and ATLAS, for up to 10000x10000 matrices. ATLAS does not have a full LAPACK implementation. > 4) Put your performance whitepapers on your website, or it looks > fishy. Our white papers are not on the Web because they contain performance data, and particularly, performance data comparing against our competitors. It may expose us to libel legal issues. Putting the legitimacy of any legal issues aside, it is not good for any business to be engulfed in legal squabbles. We are in the process of clearing this with our legal department at the moment. As I have noted in my previous e-mail, anyone who wants to get hold of the white papers is welcome to send me an e-mail. > I looked and didn't see a single performance claim there. There is one on the front page! Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd.
E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 06:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 03:13:21 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Trent Piepho wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > > as expensive as IDE disks, not to mention the little matter of a > > > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > > cheapest 200 GB drives. > > > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > > - tapes doesn't come close ... different ballpark > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, > and the largest IDE drive I've seen is 250 GB. 8 drives ... 250GB or 300GB each .. > I've got two 4U rackmount systems sitting side by side on the same shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand > new, while the ADIC is around a year old. If the tape library were bought > today, it would have a LTO-2 drive with double the capacity and could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a bit more > practical changes tapes with the library than to be swapping hard drives nobody swaps disks around ... unless one is using those 5.25" drive bay thingies in which case ... thats a different ball game i/we claim that if the drives fail, something is wrong ... its not necessary for the disks to be removable > around. The libraries built in barcode reader keeps track of them all for me. > I can type a command and have it spit out all the tapes that a certain set of > backups are on. They fit nicely in a box in their plastic cases and if I drop > one it will be ok. I can stick them on a shelf for five years and still i prefer hands off backups and restore .... esp if the machine is not within your hands reach ... > expect to read them. And the tapes don't take up any rackspace or power or > need any cooling. I've never had a tape go bad on me either, even though I've > been though a lot more of them than IDE drives. > > Of course the tape library was expensive. A new LTO-2 model can be had for > around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware for $11.6K ... i can build two 2TB servers or more ... 8 * $400 --> $3200 in drives ... for 2.4TB each ... + $700 for misc cpu/mem/1u case and it'd be 2 live backups of the primary 2TB system or about 2-3 months of weekly full backups depending ondata > controllers were much less. But the cost of the media is a lot less for tapes > than for SATA hard drives. Especially if you get models with 3 year > warranties. Once you buy enough drives/tapes you'll break even on a $/GB > comparison. i dont want to be baby sitting tapes ... 
on a daily basis and cleaning its heads or assume that someone else did c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Thu Oct 9 06:38:54 2003 From: seth at hogg.org (Simon Hogg) Date: Thu, 09 Oct 2003 11:38:54 +0100 Subject: Intel compilers and libraries In-Reply-To: References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> At 10:04 09/10/03 +0000, C J Kenneth Tan -- Heuchera Technologies wrote: >Our white papers are not on the Web they contain performance data, and >particularly, performance data comparing against our competitors. It >may expose us to libel legal issues. Putting legitimacy of any legal >issues aside, it is not good for any business to be engulf in legal >squabbles. We are in the process of clearing this with our legal >department at the moment. > >As I have noted in my previous e-mail, anyone who wants to get a hold >of the white papers are welcome to please send me an e-mail. I would just like to comment that if you are releasing the white papers by email, what difference is that to putting it on the web? They are both still publishing. Although IANAL, I would doubt that these figures expose you legally, as long as they are correct and truthful in the figures you claim (and probabily the methodology would be pretty handy, too). Simon Hogg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 06:31:00 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 06:31:00 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F8538E4.9020400@lmco.com> > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > IDE bays, > > and the largest IDE drive I've seen is 250 GB. > > 8 drives ... 250GB or 300GB each .. > Cool. Do you have pictures? How do you get the other 4 drives out? I assume they're not accessible from the front so do you have to pull the unit out, pop the cover and replace the drive? > > I've got two 4U rackmount systems sitting side by side on the same > shelf. One > > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is > a 16 drive > > server with 200GB SATA drives and two 8 port 3ware cards. The tape > library > > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server > is brand > > new, while the ADIC is around a year old. If the tape library were > bought > > today, it would have a LTO-2 drive with double the capacity and > could store > > 4.8 TB. So tapes seem to come pretty close to me. It also quite a > bit more > > practical changes tapes with the library than to be swapping hard > drives > > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game > > i/we claim that if the drives fail, something is wrong ... its not > necessary for the disks to be removable > Are you saying that it's not necessary to have hot-swappable drives? (I'm just trying to undertand your point). 
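For Linux software RAID, at least, the hot-swap question is mostly about how long the array runs degraded, not whether it survives. A rough sketch of the non-hot-swap procedure, with hypothetical names (array /dev/md0, failing member /dev/hdg1):

  # mark the sick member failed and drop it from the array
  mdadm /dev/md0 --fail /dev/hdg1
  mdadm /dev/md0 --remove /dev/hdg1

  # shut down, swap the physical disk, boot, then re-add it;
  # the rebuild runs in the background and shows up in /proc/mdstat
  mdadm /dev/md0 --add /dev/hdg1
  cat /proc/mdstat

If a spare was configured when the array was created (--spare-devices=1), md starts rebuilding onto it the moment a member fails, so the dead disk only needs to come out at the next convenient shutdown.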
Does everyone remember this: http://www.tomshardware.com/storage/20030425/index.html My only problem with this approach is off-site storage of backups. Do you pull a huge number of drives and move them off-site? (I still love the idea of using inexpensive drives for backup instead of tape though). Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 07:07:26 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 11:07:26 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> Message-ID: Simon, > I would just like to comment that if you are releasing the white papers by > email, what difference is that to putting it on the web? They are both > still publishing. I am not a lawyer, so I cannot comment on the legal aspects of things. What if an e-mail and its attachments have a confidentiality clause attached? Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 07:26:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 07:26:56 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - it supposedly can sustain 320MB/sec transfers > > > > that's silly, of course. outer tracks of current disks run at > > between 50 and 100 MB/s, so that's the max sustained. you can even > > argue that's not really 'sustained', since you'll eventually get > > to slower inner tracks. > > yup ... those are just marketing numbers... all averages ... It probably refers to burst delivery out of its 8 MB cache. The actual sustained bps speed is a pure matter of N*2*\pi*\R*f/S, where N = number of heads, 2piRf is the linear/tangential speed of the platter at R the read radius, and S is the linear length per bit. This is an upper bound. Similarly average latency (seek time) is something like 1/2f, the time the platter requires to move half a rotation. > and bigg differences between inner tracks and outer tracks Well, proportional to R, at any rate. Given the physical geometry of the platters (which I get to look at when I rip open old drives to salvage their magnets) about a factor of two. > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > to me ... backup of terabyte sized systems is trivial ... 
> - just give me lots of software raid subsystems > ( 2 backups for each "main" system ) > > - lot cheaper than tape drives and 1000x faster than tapes > for live backups > > - will never touch a tape backup again ... too sloow > and too unreliable no matter how clean the tape heads are > ( too slow being the key problem for restoring ) C'mon, Alvin. Sometimes this is a workable solution, sometimes it just plain is not. What about archival storage? What about offsite storage? What about just plain moving certain data around (where networks of ANY sort might be held to be untrustworthy). What about due diligence if you were are corporate IT exec held responsible for protecting client data against loss where the data was worth real money (as in millions to billions) compared to the cost of archival media and mechanism? "never touch a tape backup again" is romantic and passionate, but not necessarily sane or good advice for the vast range of humans out there. To backup a terabyte scale system, one needs a good automated tape changer and a pile of tapes. These days, this will (as Mark noted) cost more than your original RAID, in all probability, although this depends on how gold-plated your RAID is and whether or not you install two of them and use one to backup the other. I certainly don't have a tape changer in my house as it would cost more than my server by a factor of two or three to set up. I backup key data by spreading it around on some of the massive amounts of leftover disk that accumulates in any LAN of systems in days where the smallest drives one can purchase are 40-60 GB but install images take at most a generous allotment of 5 GB including swap. In the physics department, though, we are in the midst of a perpetual backup crisis, because it IS so much more expensive than storage and our budget is limited. Our primary department servers are all RAID and total (IIRC) over a TB and growing. We do actually back up to disk several times a day so that most file restores for dropped files take at most a few seconds to retrieve (well, more honestly a few minutes of FTE labor between finding the file and putting it back in a user's home directory). However, we ALSO very definitely make tape backups using a couple of changers, keep offsite copies and long term archives, and give users tapes of special areas or data on request. The tape system is expensive, but a tiny fraction of the cost of the loss of data due to (say) a server room fire, or a monopole storm, or a lightning strike on the primary room feed that fries all the servers to toast. I should also point out that since we've been using the RAIDs we have experienced multidisk failures that required restoring from backup on more than one occasion. The book value probability for even one occasion is ludicrously low, but the book value assumes event independence and lies. Disks are often bought in batches, and batches of disk often fail (if they fail at all) en masse. Failures are often due to e.g. overheating or electrical problems, and these are often common to either all the disks in an enclosure or all the enclosures in a server room. I don't think a sysadmin is ever properly paranoid about data loss until they screw up and drop somebody's data for which they were responsible because of inadequate backups. 
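The disk-to-disk layer of such a scheme can be as simple as the rotating hard-link snapshots from the rsync page Chris pointed to earlier in the thread. A sketch only, with made-up paths (/home being protected, /backup on the backup server):

  # rotate old snapshots; unchanged files are shared via hard links
  rm -rf /backup/home.3
  mv /backup/home.2 /backup/home.3
  mv /backup/home.1 /backup/home.2
  cp -al /backup/home.0 /backup/home.1

  # refresh the current snapshot; rsync only copies what changed, and its
  # rename-into-place behaviour leaves the older hard-linked copies intact
  rsync -a --delete /home/ /backup/home.0/

Restoring last Friday's version of a file is then just a cp out of the right home.N directory -- the same sort of near-instant restore described above, whatever tool actually maintains the snapshots.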
Our campus OIT just dropped a big chunk of courseware developed for active courses this fall because they changed the storage system for the courseware without verifying their backup, experienced a crash during the copy over, and discovered that the backup was corrupt. That's real money, people's effort, down the drain. Pants AND suspenders. Superglue around the waistband, actually. Who wants to be caught with their pants down in this way? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:16:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > 4) Put your performance whitepapers on your website, or it looks > > fishy. > > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Putting legitimacy of any legal Expose you to libel suits? Say what? Only if you lie about your competitor's numbers (or "cook" them so that they aren't an accurate reflection of their capabilities, as is often done in the industry) does it expose you to libel charges or more likely to the ridicule of the potential consumers (who tend to be quite knowledgeable, like Greg). One essential element to win those crafty consumers over is to compare apples to apples, not apples to apples that have been picked green, bruised, left on the ground for a while in the company of some handy worms, and then picked up so you can say "look how big and shiny and red and worm-free our apple is and how green and tiny and worm-ridden our competitor's apple is". A wise consumer is going to eschew BOTH of your "display apples" (as your competitor will often have an equally shiny and red apple to parade about and curiously bruised and sour apples from YOUR orchard) and instead insist on wandering into the various orchards to pick REAL apples from your trees for their OWN comparison. What exactly prevents you from putting your own raw numbers up, without any listing of your competitor's numbers? You can claim anything you like for your own product and it isn't libel. False advertising, possibly, but not libel. Or put the numbers up with your competitor's numbers up "anonymized" as A, B, C. And nobody will sue you for beating ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody "owns" them to sue you or cares in the slightest if you beat them. The most that might happen is that if you manipulate(d) ATLAS numbers so they aren't what real humans get on real systems, people might laugh at you or more likely just ignore you thereafter. What makes you any LESS liable to libel if you distribute the white papers to (potential) customers individually? Libel is against the law no matter how, and to who, you distribute libelous material; it is against the law even if shrouded in NDA. It is against the law if you whisper it in your somebody's ears -- it is just harder to prove. 
Benchmark comparisons, by the way, are such a common marketing tool (and so easily bent to your own needs) that I honestly think that there is a tacit agreement among vendors not to challenge competitors' claims in court unless they are openly egregious, only to put up their own competing claims. After all, no sane company WOULD actually lie, right -- they would have a testbed system on which they could run the comparisons listed right there in court and everybody knows it. Whether the parameters, the compiler, the system architecture, the tests run etc. were carefully selected so your product wins is moot -- if it ain't a lie it ain't libel, and it is caveat emptor for the rest (and the rest is near universal practice -- show your best side, compare to their worst). > issues aside, it is not good for any business to be engulf in legal > squabbles. We are in the process of clearing this with our legal > department at the moment. > > As I have noted in my previous e-mail, anyone who wants to get a hold > of the white papers are welcome to please send me an e-mail. As if your distributing them on a person by person basis is somehow less libelous? Or so that you can ask me to sign an NDA so that your competitors never learn that you are libelling them? I rather think that an NDA that was written to protect illegal activity be it libel or drug dealing or IP theft would not stand up in court. Finally, product comparisons via publically available benchmarks of products that are openly for sale don't sound like trade secrets to me as I could easily duplicate the results at home (or not) and freely publish them. Your company's apparent desire to conceal this comes across remarkably poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a watch? Come right down this alley so I can show you my watches where none of the bulls can see" compared to an open storefront with your watches on display to anyone, consumer or competitor. This is simply my own viewpoint, of course. I've simply never heard of a company shrinking away from making the statement "we are better than our competitors and here's why" as early and often as they possibly could. AMD routinely claims to be faster than Intel and vice versa, each has numbers that "prove" it -- for certain tests that just happen to be the tests that they tout in their claims, which they can easily back up. For all the rest of us humans, our mileage may vary and we know it, and so we mistrust BOTH claims and test the performance of our OWN programs on both platforms to see who wins. I'm certain that the same will prove true for your own product. I don't care about your benchmarks except as a hook to "interest" me. Perhaps they will convince me to get you to loan me access to your libraries etc to link them into my own code to see if MY code speeds up relative to the way I have it linked now, or relative to linking with a variety of libraries and compilers. Then I can do a real price/performance comparison and decide if I'm better off buying your product (and buying fewer nodes) or using an open source solution that is free (and buying more nodes). Which depends on the scaling properties of MY application, costs, and so forth, and cannot be predicted on the basis of ANY paper benchmark. Finally, don't assume that this audience is naive about benchmarking or algorithms, or at all gullible about performance numbers and vendor claims. 
A lot of people on the list (such as Greg) almost certainly have far more experience with benchmarks than your development staff; some are likely involved in WRITING benchmarks. If you want to be taken seriously, put up a full suite of benchmarks, by all means, and also carefully indicate how those benchmarks were run as people will be interested in duplicating them and irritated if they are unable to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Oct 9 09:02:11 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 9 Oct 2003 09:02:11 -0400 Subject: building a RAID system - 8 drives Message-ID: <812B16724C38EE45A802B03DD01FD5472A3BF4@exchange.concen.com> 300GB Maxtor ATA133 5400RPM drives are the largest currently available. 250GB is the largest SATA currently. You can achieve 2TB in a 1U by using a drive sled that will hold two drives. The drives are mounted opposing each other and share a backplane. This is a proprietary solution. Or, if you have a chassis with 4 external trays and a few internal 3.5" bays it could be done. I personally don't believe cramming this many drives in a 1U is a good idea. Increased heat due to lack of airflow would have to decrease the lifespan of the drives. ---------------------------------------------------- |Joey P. Sims 800.995.4274 x 242 |Sales Manager 770.442.5896 - Fax |HPC/Storage Division jsims at csiopen.com |Concentric Systems,Inc. www.csilabs.net ---------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 07:02:57 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 07:02:57 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F854061.3040208@lmco.com> Alvin Oga wrote: > > On Thu, 9 Oct 2003, Jeff Layton wrote: > > > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > > IDE bays, > > > > and the largest IDE drive I've seen is 250 GB. > > > > > > 8 drives ... 250GB or 300GB each .. > > > > > > > Cool. Do you have pictures? How do you get the other 4 drives > > out? I assume they're not accessible from the front so do you > > have to pull the unit out, pop the cover and replace the drive? > > yup.. pull the cover off and pop out the drive the hard way vs > "hot swap ide tray" > > autocad generated *.jpg file > http://linux-1u.net/Dwg/jpg.sm/c2500.jpg > > ( newer version has the mb and ps swapped for better cpu cooling) > http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > > > i/we claim that if the drives fail, something is wrong ... its not > > > necessary for the disks to be removable > > > > > > > Are you saying that it's not necessary to have hot-swappable > > drives? (I'm just trying to undertand your point). > > if the drive is dying .... 
> - find out which brand/model# it is and avoid it > - find out if others are having similar problems > - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps > > i'm not convinced that hotswap ide works w/o special ide controllers > - pull the ide disk out while its powered up > - pull the ide disk out while you're writing a 2GB file to it > > - or insert the disk while the rest of the systme is up and > running > > if you have to power down to take the ide disk out, you might as > well do a clean shutdown and replace the disk the hard way with > a screw driver instead of nice ($50 expensive) drive bay handle > $ 50 can be an extra 80GB of disk space when a good sale > is occuring at the local fries stores > We've got several NAS boxes with hot-swappable IDE drives and without it we'd be toast. Granted the controller is specialized, coming from one vendor, but it allows us to have a fail-over drive with auto-rebuild in the background. Then we just pull the bad drive, put in a new one, and designate it as the new hot spare. Works great! It's saved our bacon a few times. I've wanted to test hot-swap with 3ware controllers, but have never done it. Has anyone tested the hotswap capability of the 3ware controllers/cases? Another comment. If you have to pull the node to replace the drive, then you have to bring down the filesystem which might not be the best thing to do. Hot-swapping allows the filesystem to keep functioning, albeit at a lower performance level. > > Does everyone remember this: > > > > http://www.tomshardware.com/storage/20030425/index.html > > > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it > We can't do backups across the wire to an offsite storage facility. So we have to do backups, pull the tapes, and store them off-site. I'm just not sure how this would work with disks instead of tapes. Oh, you can full and incremental backups to disk - most backup software doesn't care what the media is anyway - but I'm just not sure if you pull a set of disks and store them. How does off-site backup recovery work? Do you pop them in, mount them as read-only, and copy them to a live filesystem? However, despite all of these questions, at some point soon, disk will be the only way to get backups of LARGE filesystems in a reasonable amount of time. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:36:40 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:36:40 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F8538E4.9020400@lmco.com> Message-ID: On Thu, 9 Oct 2003, Jeff Layton wrote: > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > IDE bays, > > > and the largest IDE drive I've seen is 250 GB. > > > > 8 drives ... 250GB or 300GB each .. > > > > Cool. Do you have pictures? How do you get the other 4 drives > out? 
I assume they're not accessible from the front so do you > have to pull the unit out, pop the cover and replace the drive? yup.. pull the cover off and pop out the drive the hard way vs "hot swap ide tray" autocad generated *.jpg file http://linux-1u.net/Dwg/jpg.sm/c2500.jpg ( newer version has the mb and ps swapped for better cpu cooling) http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > i/we claim that if the drives fail, something is wrong ... its not > > necessary for the disks to be removable > > > > Are you saying that it's not necessary to have hot-swappable > drives? (I'm just trying to undertand your point). if the drive is dying .... - find out which brand/model# it is and avoid it - find out if others are having similar problems - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps i'm not convinced that hotswap ide works w/o special ide controllers - pull the ide disk out while its powered up - pull the ide disk out while you're writing a 2GB file to it - or insert the disk while the rest of the systme is up and running if you have to power down to take the ide disk out, you might as well do a clean shutdown and replace the disk the hard way with a screw driver instead of nice ($50 expensive) drive bay handle $ 50 can be an extra 80GB of disk space when a good sale is occuring at the local fries stores > Does everyone remember this: > > http://www.tomshardware.com/storage/20030425/index.html > > My only problem with this approach is off-site storage of > backups. Do you pull a huge number of drives and move them > off-site? (I still love the idea of using inexpensive drives for > backup instead of tape though). i suppose you can do "incremental" backups across the wire ... and "inode" based backups too ... - it'd be crazy to xfer the entire 1MB file if only 1 line changed in it c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Oct 9 08:24:20 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 9 Oct 2003 14:24:20 +0200 (CEST) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. Yes, but the kernel might be dumb at times, like when splitting large requests into small pieces to be fed to the block subsystem just to be reassembled again before being sent to the disk :-) Another issue is how this memory is used by the drive firmware. I've seen tests that show some Fujitsu SCSI disks (MAN or MAP series, IIRC) perform much better than competitors in multi-user situations (lots of different files accessed by different users, supposedly scattered on the disk) while the competitors were better at streaming media (one big file used by a single user, supposedly contiguously placed on disk). > unless your workload is dominated by tiny, random seeks, Or your file-system becomes full and thus fragmented. Been there, done that! I've had a big storage device changed from ext3 to XFS because ext3 at about 50% fragmentation was horribly slow; XFS allows live (without unmounting or mounting "ro") defragmentation. 
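For what it's worth, the fragmentation situation Bogdan describes is easy to measure, and the live XFS defragmentation is a single command from the XFS userspace tools. A sketch with hypothetical device and mount-point names:

  # report fragmentation; -r opens the device read-only, so this is
  # safe to run against a mounted filesystem
  xfs_db -r -c frag /dev/sdb1

  # reorganize files in place while the filesystem stays mounted and in use
  xfs_fsr /data

ext3 offers no comparable online defragmenter, hence the switch described above.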
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:42:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:42:46 -0400 (EDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > Pants AND suspenders. Superglue around the waistband, actually. Who > > wants to be caught with their pants down in this way? > > always got bit by tapes... somebody didnt change the tape on the 13th > a couple months ago ... and critical data is now found to be missing > - people do forget to change tapes ... or clean heads... > ( thats the part i dont like about tapes .. and is the most > ( common failure mode for tapes ... easily/trivially avoided by > ( disks-to-disk backups > > - people get sick .. people go on vacations .. people forget > > > - no (similar) problems since doing disk-to-disk backups > - and i usually have 3-6 months of full backups floating around > in compressed form All agreed. And tapes aren't that permanent a medium either -- they deteriorate on a timescale of years to decades, with data bleeding through the film, dropped bits due to cosmic ray strikes, depolymerization of the underlying tape itself. Even before the tape itself is unreadable, you are absolutely certain to be unable to find a working drive to read it with. I have a small pile of obsolete tapes in my office -- tapes made with drives that no longer "exist", and that is after dumping the most egregiously useless of them. Still, I'd argue that the best system for many environments is to use all three: RAID, real backup to (separate) disk, possibly a RAID as well, and tape for offsite and archival purposes. The first two layers protect you against the TIME required to handle users accidentally deleting files (the most common reason to access a backup) as retrieval is usually nearly instantaneous and not at all labor intensive. It also protects you agains the most common single-server failures that get past the protection of RAID itself (multidisk failures, blown controllers). The tape (with periodic offsite storage) protects you against server room fire, brownouts or spikes that cause immediate data corruption or disk loss on both original and backup servers, and tapes can be saved for years -- far longer than one typically can go back on a disk backup mechanism. Users not infrequently want to get at a file version they had LAST YEAR, especially if they don't use CVS. Finally, some research groups generate data that exceeds even TB-scale disk resources -- they constantly move data in and out of their space in GB-sized chunks. They often like to create their own tape library as a virtual extension of the active space. Tapes aren't only about backup. So you engineer according to what you can afford and what you need, making the usual compromises brought about by finite resources. BTW, one point that hasn't been made in the soft vs hard RAID argument is that with hard RAID you are subject to (proprietary) HARDWARE obsolescence, which typically is more difficult to control than software. 
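Part of what makes the software case recoverable is that the md superblock travels with the member disks, so the set can be reassembled on any Linux machine with enough controller ports. A sketch, again with hypothetical device names:

  # inspect the RAID superblock that a member disk carries with it
  mdadm --examine /dev/hde1

  # reassemble the old array on the replacement machine and mount it
  mdadm --assemble /dev/md0 /dev/hde1 /dev/hdg1 /dev/hdi1
  mount /dev/md0 /mnt/rescue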
You build a RAID, populate it, use it. After a few years, the RAID controller itself dies (but the disks are still good). Can you get another? One that can actually retrieve the data on your disks? There are no guarantees. Maybe the company that made your controller is still in business (or rather, still in the RAID business). Maybe they either still carry old models, or can do depot repair, or maybe new models can still handle the raid encoding they implemented with the old model. Maybe you can AFFORD a new model, or maybe it has all sorts of new features and costs 3x as much as the first one did (which may not have been cheap). Maybe it takes you weeks to find a replacement and restore access to your data. Soft RAID can have problems of its own (if the software for example evolves to where it is no longer backwards compatible) but it is a whole lot easier to cope with these problems and they are strictly under your control. You are very unlikely to have any "event" like the death of the RAID server that prevents you from retrieving what is on the disks (at a cost likely to be quite controllable and in a timely way) as long as the disks themselves are not corrupted. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Thu Oct 9 09:07:25 2003 From: michael.worsham at mci.com (Michael.Worsham) Date: Thu, 09 Oct 2003 09:07:25 -0400 Subject: CAD Message-ID: <000201c38e66$49b2aa40$94022aa6@Wcomnet.com> My wife works for a construction/architecture firm and handles AutoCad files like this all the time (some even larger at times, depending on the client). One thing we looked at first was what platform they were running the AutoCad on. Windows XP or 95/98 can't really handle Autocad as it is a highly intensive CPU application. We had a similar 'old' layout where CAD machines were more based as a word processing workstation than as a CAD station. Given the amount of work this firm produced in a single day, we went for a Dual Xeon P4 setup w/ 4 GB ram and 36 GB SCSI hard drives loaded with Windows 2000 Pro Workstation. When deciding the P4 hardware platform, look for boards that have PCI-X slots... esp for giganet NIC cards and if needed, Hardware RAID SCSI adapters. Refrain from using ATA, esp since CAD likes to really utilize the hard drives and ATA would most likely wear out faster. (Though some might look that using Xeon is overkill, lets just say there are many times it has come in handy when the customer shows up on-site unexpectedly and wants to see a progress report or has changes to be added. Pulling up the program and the data file in a couple of seconds rather than several minutes makes a beliver out of you in an instant.) If the file is being downloaded from a file server, using standard 10/100 via a cheap hub isn't going to cut it. Best to utilize something of a 10/100/1000 switch (ie. copper giganet) and 10/100/1000 NICs in each of the machines. Make sure the card is set for FULL-DUPLEX to fully utilize the bandwidth needed esp for downloading large files from the file server. Based on the file server specs, its is similar to that of the workstations however it is running Windows 2000 Advanced Server w/ Veritas Backup... 
can't be too careful for DR measures, esp with CAD files of this caliber. -- M Michael Worsham MCI/Intermedia Communications System Administrator & Applications Engineer Phone: 813-829-6845 Vnet: 838-6845 E-mail: michael.worsham at mci.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:47:55 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:47:55 -0700 (PDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - will never touch a tape backup again ... too sloow > > and too unreliable no matter how clean the tape heads are > > ( too slow being the key problem for restoring ) > > C'mon, Alvin. Sometimes this is a workable solution, sometimes it just > plain is not. What about archival storage? What about offsite storage? > What about just plain moving certain data around (where networks of ANY > sort might be held to be untrustworthy). What about due diligence if > you were are corporate IT exec held responsible for protecting client > data against loss where the data was worth real money (as in millions to > billions) compared to the cost of archival media and mechanism? "never > touch a tape backup again" is romantic and passionate, but not > necessarily sane or good advice for the vast range of humans out there. yup .. maybe an oversimplied statement ... tapes are my (distant) 2nd choice for backups of xx-Terabyte sized servers.. disk-to-disk being my first choice ( preferrably to 2 other similar sized machines ) ( it's obviously not across a network :-) i randomly restore from backups and do a diff w/ the current servers before it dies .. > Pants AND suspenders. Superglue around the waistband, actually. Who > wants to be caught with their pants down in this way? always got bit by tapes... somebody didnt change the tape on the 13th a couple months ago ... and critical data is now found to be missing - people do forget to change tapes ... or clean heads... ( thats the part i dont like about tapes .. and is the most ( common failure mode for tapes ... easily/trivially avoided by ( disks-to-disk backups - people get sick .. people go on vacations .. people forget - no (similar) problems since doing disk-to-disk backups - and i usually have 3-6 months of full backups floating around in compressed form c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Thu Oct 9 09:26:45 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Oct 2003 09:26:45 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F854061.3040208@lmco.com> Message-ID: On Thu, 9 Oct 2003 at 7:02am, Jeff Layton wrote > spare. Works great! It's saved our bacon a few times. I've > wanted to test hot-swap with 3ware controllers, but have > never done it. Has anyone tested the hotswap capability of > the 3ware controllers/cases? Yes, and it works just as advertised. To add my $.05 to the discussion, I'm a pretty big fan of the 3wares -- I currently have 5TB of formatted space (with about 2TB of data) on them. 
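(As I describe below, I layer a plain md stripe on top of the hardware arrays, and a dirt-simple cron check keeps an eye on that software layer. The schedule and mail address here are purely an illustrative sketch, not a recommendation:

  # crontab entry: warn if any md member has dropped out
  # (a "_" shows up in the [UU] status field of /proc/mdstat)
  0 * * * * grep _ /proc/mdstat && mail -s "md array degraded" root < /proc/mdstat

3DM covers the hardware layer itself, so this is only there to catch the stripe.)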
I have two servers with 2 cards and 16 drives in them, and one with 1 card and 8 drives. On the two board servers, I run the 3wares in hardware RAID mode (R5 with a hot spare), and then do a software stripe across the two hardware arrays. With the boards on separate PCI busses, this lets the stripe go faster than the 266MB/s that the boards are limited to (these are 7500 boards, which are 64/33). 3ware's 3DM also lets you monitor the status of your arrays (it's almost too verbose, actually), and do all sorts of online maintenance. Not having used mdadm much, I can't really compare the functionality of the two. A couple of nice features of 3DM is that it lets you schedule array verification and background disk scanning, which can find problems before they affect the array. I'm not sure what cases or backplane these systems use (I bought 'em from Silicon Mechanics, who I highly recommend), but the hot swap has always just worked. If anyone's interested, I have benchmarks (bonnie++ and tiobench) of one of the 2 board systems using pure software RAID as well as the setup above. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:09:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:09:56 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it http://rdiff-backup.stanford.edu/ The name says it all. I believe it is built on top of rsync -- at any rate it is distributed in an rpm named librsync. Awesome tool -- creates a mirror, then saves incremental compressed diffs. It is the way we can restore so quickly and yet maintain a decent archival/historical backup where a user CAN request file X from last friday (or even the version between the hours of midnight and noon on last friday). Efficient enough to run several times a day on the most active part of your space and not eat a hell of a lot of either disk or network BW. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 09:48:10 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 09:48:10 -0400 Subject: building a RAID system - yup In-Reply-To: References: Message-ID: <1065707290.4708.28.camel@protein.scalableinformatics.com> On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > users tapes of special areas or data on request. 
The tape system is > expensive, but a tiny fraction of the cost of the loss of data due to > (say) a server room fire, or a monopole storm, or a lightning strike on > the primary room feed that fries all the servers to toast. Monopole storm... (smile) I seem to remember (old bad and likely wrong memory) that Max Dresden had predicted one monopole per universe as a consequence of the standard model. Not my area of (former) expertise, so reality may vary from my memory ... [...] > I don't think a sysadmin is ever properly paranoid about data loss until > they screw up and drop somebody's data for which they were responsible > because of inadequate backups. Our campus OIT just dropped a big chunk I always ask my customers a simple question: What is the cost to you to recreate all the data you lost when your disk/tape dies? That is why I tend to recommend multiple redundant systems for backup. I also like to point out that you can build a single point of failure into any system, and the cost of recovering from that failure needs to be considered when designing systems to back up the possibly failing systems. If you back up all your systems over the network, and your network dies, are you in a bad way when you need to restore? What about if you back up everything to a single tape drive, and the drive dies (and you need your backup)? Single points of failure are critical to identify. It is equally critical to estimate the impact of each one. Most folks have a backup solution of some sort. Some of them are even reasonable, though few of them are able to withstand a single failure in a critical component. My old research group has a tape changer robot and drive from a well known manufacturer. Said well known manufacturer recently told them that since the unit was EOLed about 2 years ago, there would be no more fixes available for it. They (the research group) told me that they were having trouble with it... One tape drive, one point of failure. The tape drive company is happy because you now have to drop a chunk of change on their new units, or scour eBay for old ones. -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 08:30:45 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 12:30:45 GMT Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <20031009123045.7582.qmail@houston.wolf.com> Alvin Oga writes: > > On Thu, 9 Oct 2003, Trent Piepho wrote: > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game Not quite true. We use Rare drives (one box) to move up to a TB of data around w/o having to take the time to create tapes and then download them. That takes a lot of time, even w/ LTOs.
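Rough numbers, assuming current LTO drives stream somewhere around 15-35 MB/s native (estimates, not measurements):

  1 TB to tape at 20 MB/s   ~= 1,000,000 MB / 20 MB/s ~= 50,000 s ~= 14 hours per pass
                               (and you pay it twice: once to write, once to restore/download)
  1 TB disk-to-disk over gigabit at a realistic 40-50 MB/s ~= 6-7 hours
  1 TB on a drive you carry across the machine room ~= however long the local copy takes

Fudge the drive or wire rates however you like; the ordering doesn't change much.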
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rtomek at cis.com.pl Thu Oct 9 10:22:53 2003 From: rtomek at cis.com.pl (Tomasz Rola) Date: Thu, 9 Oct 2003 16:22:53 +0200 (CEST) Subject: PocketPC Cluster In-Reply-To: <200310090947.33601.csamuel@vpac.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003, Chris Samuel wrote: > Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) > > IrDA for the networking, 11 compute + 1 management, slower than "a mainstream > Pentium II-class desktop PC" (they don't specify what spec). > > http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Yes, it's nice of course. One can also build such cluster with Linux-based devices: http://www.handhelds.org/ I myself would like to see if the performance changes after switching to Linux. One thing that should be considered is cooling. On my iPAQ, when cpu load gets too high for too long, the joy button warms itself. This means, cpu is even more heated. The other issue is power consumption. If I understand what SBP did, they run the cluster on electricity from the wall, not from the battery. My own observetion suggests, that running high load on battery consumes about 2-3 times more power than things like reading html files. - From the performance side, I wonder how this compares to the following page: http://www.applieddata.net/design_Benchmark.asp which suggests StrongARM SA 1100 @200 is 3x faster than Pentium @166? I was interested myself, so I ran the quick test on my own iPAQ 3630 (SA 1110 @206) and on AMD-k6-2 @475. On iPAQ: - -bash-2.05b# `which time` -p python /tmp/erasieve.py --limit 1000 --quiet real 0.94 user 0.91 sys 0.04 On K6: => (1020 29): /usr/bin/time -p erasieve.py --limit 1000 --quiet real 0.51 user 0.49 sys 0.02 So, how can 12 PocketPCs be slower than 1 p2 (with no clock given at all, but if I remember they were about 500MHz at best)? If I haven't misunderstood something, they probably didn't tuned their experiment too well. BTW, most PDA cpus lack fpu. So, while such claster may be nice to ad-hoc password breaking, with nanoscale simulation it will be rather the opposite, I think. bye T. - -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home ** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_rola at bigfoot.com ** -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 5.0i for non-commercial use Charset: noconv iQA/AwUBP4VvRBETUsyL9vbiEQJfvwCeLU3/270BajC74e+r2HEKs27QoXgAn0fP C8FHl6mDchvmMBr04oWioqg0 =wFOr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:32:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:32:12 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > Robert, > > You covered some of the issues that we are addressing with our lawyers > right now. It's a process which, as knowledgeable as you are, I am > sure you can understand we have to go through. The comparison, sure, go through the process. 
Putting your own numbers up, no, I cannot see why you need lawyers to tell you you can do this. How can somebody sue you for putting up the results of your own good-faith tests of your own product? There wouldn't be a manufacturer in existence not bogged down in court if you could (successfully) sue Tide for claiming that it gets clothes cleaner and removes stains when the first time you wash a shirt with it the shirt remains dirty and stains don't come out, for example. Why, I myself would quit work and live on the proceeds of my many suits, if every product out there had to strictly live up to its claims. The most recourse the consumer has is to not buy Tide (or whatever other detergent offendeth thee, nothing against Tide but there are plenty of stains NO detergent removes except maybe xylene or fuming nitric acid based ones:-). Or, if they are really irritated -- it is a GRASS stain and the Tide ad on TV last night shows Tide succeeding against GRASS stains in particular -- they can take the box back to the store and likely get their money back. But sue Tide? Only in Ralph Nader's dreams... Caveat emptor is more than a latin phrase, it is a principle of law. You have to look at the horse's teeth yourself, or don't blame the vendor for claiming that the old nag they sold you was really a young and vibrant horse. To them perhaps it was -- it is a question of just what an old nag is (opinion) vs the age of the horse as indicated by its teeth (fact). Only if the claims are egregious (this here snake oil will cause hair to grow on your head, cure erectile dysfunction, and make you smell nice all for the reasonable price of a dollar a bottle) is there any likelihood of grievance that might be addressed. Surely your claims aren't egregious. Your product doesn't slice, dice, and even eat your meatloaf for you...does it?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 11:48:02 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 11:48:02 -0400 Subject: [Fwd: [Bioclusters] 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA] Message-ID: <1065714482.4713.73.camel@protein.scalableinformatics.com> -----Forwarded Message----- > ======================================================================= > MEETING ANNOUNCEMENT / CALL FOR PRESENTERS > ======================================================================= > BIOCLUSTERS 2004 Workshop > March 30, 2004 > Hynes Convention Center, Boston MA USA > ======================================================================= > > * Speakers Wanted - Please Distribute Where Appropriate * > > Organized by several members of the bioclusters at bioinformatics.org > mailing list, the Bioclusters 2004 Workshop is a networking and > educational forum for people involved in all aspects of cluster and > grid computing within the life sciences. 
> > The motivation for organizers of this event was the cancellation of the > O'Reilly Bioinformatics Technology Conference series and the general > lack of forums for researchers and professionals involved with the > applied use of high performance IT and distributed computing techniques > in the life sciences. > > The primary focus of the workshop will be technical presentations from > experienced IT professionals and scientific researchers discussing real > world systems, solutions, use-cases and best practices. > > This event is being held onsite at the Hynes Convention Center on the > first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World > Magazine is generously providing space and logistical support for the > meeting and workshop attendees will have access to the expo floor and > keynote addresses. Registration & fees will be finalized in short > order. > > Presentations will be broken down among a few general content areas: > > 1. Researcher, Application & End user Issues > 2. Builder, Scaling & Integration Issues > 3. Future Directions > > The organizing committee is actively soliciting presentation proposals > from members of the life science and technical computing communities. > Interested parties should contact the committee at bioclusters04 at open- > bio.org. > > > Bioclusters 2004 Workshop Committee Members > > J.W Bizzaro ? Bioinformatics Organization Inc. > James Cuff - MIT/Harvard Broad Institute > Chris Dwan - The University of Minnesota > Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. > Joe Landman ? Scalable Informatics LLC > > The committee can be reached at: bioclusters04 at open-bio.org > > > About the Bioclusters Mailing List Community > > The bioclusters at bioinformatics.org mailing list is a 600+ member forum > for users, builders and programmers of distributed systems used in life > science research and bioinformatics. For more information about the > list including the public archives and subscription information please > visit http://bioinformatics.org/mailman/listinfo/bioclusters > -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 10:35:16 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 10:35:16 -0400 Subject: building a RAID system - yup - superglue In-Reply-To: References: Message-ID: <3F857224.9040801@lmco.com> Robert G. Brown wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > Pants AND suspenders. Superglue around the waistband, actually. Who > > > wants to be caught with their pants down in this way? > > > > always got bit by tapes... somebody didnt change the tape on the 13th > > a couple months ago ... and critical data is now found to be missing > > - people do forget to change tapes ... or clean heads... > > ( thats the part i dont like about tapes .. and is the most > > ( common failure mode for tapes ... easily/trivially avoided by > > ( disks-to-disk backups > > > > - people get sick .. people go on vacations .. people forget > > > > > > - no (similar) problems since doing disk-to-disk backups > > - and i usually have 3-6 months of full backups floating around > > in compressed form > > All agreed. 
And tapes aren't that permanent a medium either -- they > deteriorate on a timescale of years to decades, with data bleeding > through the film, dropped bits due to cosmic ray strikes, > depolymerization of the underlying tape itself. Even before the tape > itself is unreadable, you are absolutely certain to be unable to find a > working drive to read it with. I have a small pile of obsolete tapes in > my office -- tapes made with drives that no longer "exist", and that is > after dumping the most egregiously useless of them. > > Still, I'd argue that the best system for many environments is to use > all three: RAID, real backup to (separate) disk, possibly a RAID as > well, and tape for offsite and archival purposes. > I can say with some authority that this is what we at Lockheed Aeronautics do. And rather than extend this email by quoting Bob below, we also have an HSM system that we use for data we may need in the next couple of years. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 07:59:54 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 07:59:54 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: I would also echo most of Mark's points aside from the 8 MB cache issue. I have seen some noticeable speed improvements using 2 MB vs 8 MB drives. I would also offer one other point. No matter whether you use SCSI or IDE drives, be absolutely certain that you keep the drives cool. The "internal" 3.5 bays in most cases are normally useless because they place several drives in almost direct contact. The drive(s) sandwiched in the middle have only their edges exposed to air and have to dissipate the bulk of their heat through the neighboring drives. I like mount the drives in 5.25 bays. This at least provides an air gap for some cooling. For large raid servers, I like to use the cheap fan coolers. They can be had for $5 - $8 each and include 2 or 3 small fans that fill in the 5.25 opening and the 5.25-to-3.5 mounting brackets. Of course, that makes for a lot of fan noise. We typically build 2 identical raid servers connected by a dedicated gigabit link to do nightly backups, both to protect from raid failure and user error. I would like to ask if anyone has investigated Benjamin LaHaise netmd application yet? http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf I think there was some discussion of it a few months ago, but I haven't seen anything lately. Thanks, Mike Prinkey Aeolus Research, Inc. On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. > > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. > > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. 
interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. > > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. > > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. > > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. > > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 09:34:56 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 13:34:56 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: References: Message-ID: Robert, You covered some of the issues that we are addressing with our lawyers right now. It's a process which, as knowledgeable as you are, I am sure you can understand we have to go through. Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- On Thu, 9 Oct 2003, Robert G. Brown wrote: > Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) > From: Robert G. Brown > To: C J Kenneth Tan -- Heuchera Technologies > Cc: Greg Lindahl , beowulf at beowulf.org > Subject: Re: Intel compilers and libraries > > On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > > > 4) Put your performance whitepapers on your website, or it looks > > > fishy. > > > > Our white papers are not on the Web they contain performance data, and > > particularly, performance data comparing against our competitors. It > > may expose us to libel legal issues. Putting legitimacy of any legal > > Expose you to libel suits? Say what? 
> > Only if you lie about your competitor's numbers (or "cook" them so that > they aren't an accurate reflection of their capabilities, as is often > done in the industry) does it expose you to libel charges or more likely > to the ridicule of the potential consumers (who tend to be quite > knowledgeable, like Greg). > > One essential element to win those crafty consumers over is to compare > apples to apples, not apples to apples that have been picked green, > bruised, left on the ground for a while in the company of some handy > worms, and then picked up so you can say "look how big and shiny and red > and worm-free our apple is and how green and tiny and worm-ridden our > competitor's apple is". A wise consumer is going to eschew BOTH of your > "display apples" (as your competitor will often have an equally shiny > and red apple to parade about and curiously bruised and sour apples from > YOUR orchard) and instead insist on wandering into the various orchards > to pick REAL apples from your trees for their OWN comparison. > > What exactly prevents you from putting your own raw numbers up, without > any listing of your competitor's numbers? You can claim anything you > like for your own product and it isn't libel. False advertising, > possibly, but not libel. Or put the numbers up with your competitor's > numbers up "anonymized" as A, B, C. And nobody will sue you for beating > ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody > "owns" them to sue you or cares in the slightest if you beat them. The > most that might happen is that if you manipulate(d) ATLAS numbers so > they aren't what real humans get on real systems, people might laugh at > you or more likely just ignore you thereafter. > > What makes you any LESS liable to libel if you distribute the white > papers to (potential) customers individually? Libel is against the law > no matter how, and to who, you distribute libelous material; it is > against the law even if shrouded in NDA. It is against the law if you > whisper it in your somebody's ears -- it is just harder to prove. > Benchmark comparisons, by the way, are such a common marketing tool (and > so easily bent to your own needs) that I honestly think that there is a > tacit agreement among vendors not to challenge competitors' claims in > court unless they are openly egregious, only to put up their own > competing claims. After all, no sane company WOULD actually lie, right > -- they would have a testbed system on which they could run the > comparisons listed right there in court and everybody knows it. Whether > the parameters, the compiler, the system architecture, the tests run > etc. were carefully selected so your product wins is moot -- if it ain't > a lie it ain't libel, and it is caveat emptor for the rest (and the rest > is near universal practice -- show your best side, compare to their > worst). > > > issues aside, it is not good for any business to be engulf in legal > > squabbles. We are in the process of clearing this with our legal > > department at the moment. > > > > As I have noted in my previous e-mail, anyone who wants to get a hold > > of the white papers are welcome to please send me an e-mail. > > As if your distributing them on a person by person basis is somehow less > libelous? Or so that you can ask me to sign an NDA so that your > competitors never learn that you are libelling them? I rather think > that an NDA that was written to protect illegal activity be it libel or > drug dealing or IP theft would not stand up in court. 
Finally, product > comparisons via publically available benchmarks of products that are > openly for sale don't sound like trade secrets to me as I could easily > duplicate the results at home (or not) and freely publish them. > > Your company's apparent desire to conceal this comes across remarkably > poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a > watch? Come right down this alley so I can show you my watches where > none of the bulls can see" compared to an open storefront with your > watches on display to anyone, consumer or competitor. This is simply my > own viewpoint, of course. I've simply never heard of a company > shrinking away from making the statement "we are better than our > competitors and here's why" as early and often as they possibly could. > AMD routinely claims to be faster than Intel and vice versa, each has > numbers that "prove" it -- for certain tests that just happen to be the > tests that they tout in their claims, which they can easily back up. > For all the rest of us humans, our mileage may vary and we know it, and > so we mistrust BOTH claims and test the performance of our OWN programs > on both platforms to see who wins. > > I'm certain that the same will prove true for your own product. I don't > care about your benchmarks except as a hook to "interest" me. Perhaps > they will convince me to get you to loan me access to your libraries etc > to link them into my own code to see if MY code speeds up relative to > the way I have it linked now, or relative to linking with a variety of > libraries and compilers. Then I can do a real price/performance > comparison and decide if I'm better off buying your product (and buying > fewer nodes) or using an open source solution that is free (and buying > more nodes). Which depends on the scaling properties of MY application, > costs, and so forth, and cannot be predicted on the basis of ANY paper > benchmark. > > Finally, don't assume that this audience is naive about benchmarking or > algorithms, or at all gullible about performance numbers and vendor > claims. A lot of people on the list (such as Greg) almost certainly > have far more experience with benchmarks than your development staff; > some are likely involved in WRITING benchmarks. If you want to be taken > seriously, put up a full suite of benchmarks, by all means, and also > carefully indicate how those benchmarks were run as people will be > interested in duplicating them and irritated if they are unable to. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Oct 9 10:57:21 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 09 Oct 2003 09:57:21 -0500 Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> References: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <3F857751.4090009@tamu.edu> I've recently built a 2TB (well, a little less really) ATA RAID using a pair of HighPoint 374 controlers and 10 250-GB Maxtor 8 MB cache drives (plus a 60 GB drive for the system). It's running as 2 1TB arrays, because of disparate applications, right now. 
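(The only acceptance test I bother with on a freshly built array is a big sequential write/read with dd -- crude, and the mount point and sizes below are just an example, but it catches gross misconfiguration before any real data lands on it:

  cd /raid1
  time dd if=/dev/zero of=ddtest bs=1M count=4096   # ~4 GB write
  time dd if=ddtest of=/dev/null bs=1M              # read it back
  rm ddtest

Use a file a couple of times bigger than RAM or you're mostly timing the page cache.)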
Initially, the drivers for RH9 were not available so we started with RH7.3 and all the updates; they're there now and and allow cross-card arrays. Down the pike we might re-install and span the controllers. I've also recently done a 2-drive striped array supporting a meteorology data application with a lot of data acquisition and database work. It's mounted to a number of other systems via NFS. Uses a Promise Technologies TX2000 and a pair of 80 GB Maxtors. Both RAID systems have worked very well. I suspect the next one I build will incorporate Serial ATA instead of parallel. I doubt I'll build another SCSI RAID for my applications. Gerry Creager Texas Mesonet Texas A&M University Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID. The first would give the > best results but on the other hand becomes really expensive so we have > in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers > > Any one of these has its strong and weak points, after seeing various > benchmarks/comparisons/reviews these are the only candidates that > deserve our attention. > > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and > harddisk, all home directories will be stored under the main server, > main workload (compilation and edition) would be done on the local > machines tough, server only takes care of file sharing. > > Also parallel MPI executions will be done between the clients. > > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:39:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:39:48 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: <1065707290.4708.28.camel@protein.scalableinformatics.com> Message-ID: On Thu, 9 Oct 2003, Joseph Landman wrote: > On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > > > users tapes of special areas or data on request. The tape system is > > expensive, but a tiny fraction of the cost of the loss of data due to > > (say) a server room fire, or a monopole storm, or a lightning strike on > > the primary room feed that fries all the servers to toast. > > Monopole storm... (smile) I seem to remember (old bad and likely wrong > memory) that Max Dresden had predicted one monopole per universe as a > consequence of the standard model. Not my area of (former) expertise, > so reality may vary from my memory ... Hell, there are more than that in California alone. So far monopoles have been discovered there at least twice; once on superconducting niobium balls in a Milliken experiement (but they went away when the balls were washed and never returned, go figure) and once in a superconduction flux trap although the events MIGHT have been caused by somebody flicking a light switch down the hall...:-) Seriously, this is theory vs experiment, and as a theorist I firmly defer to experiment. 
Until we find an (isolated) monopole, they are just a very attractive, compelling even, extension of Maxwell's equations and related field theories that (as a "defect") help us understand why certain quanties are quantized, or add a certain symmetry to the theory that is otherwise broken. However, it does amuse me to think of hard disks as being "experiments" like the flux loop experiment to measure the existence of monopoles. It would be interesting to determine a "signature" of disk penetration by a cosmic ray monopole and scan a small mountain of crashed disks for the signature, if such a signature is in any way unique. Such a mountain represents a lot more event phase space than a single loop or set of loops in a California laboratory. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Oct 9 12:08:20 2003 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 9 Oct 2003 09:08:20 -0700 (PDT) Subject: Raid Deffinitions Message-ID: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Discussing a client setup the other day a cohort and I came to a different opinion on what each raid level does. Is there a guide/standard to define how it should work. Also do any vendors stray from the beaten path and add there own levels? ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Oct 9 14:24:22 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 9 Oct 2003 11:24:22 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031009182422.GB1865@greglaptop.internal.keyresearch.com> On Thu, Oct 09, 2003 at 10:04:20AM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Welcome to the Internet. In the US, that's not an issue, so we're used to being able to get our performance data without having to ask a human. BTW, in the US, your lawyers would recommend that your "Up to 32X faster" claim would need a "results not typical" disclaimer. > > I looked and didn't see a single performance claim there. > > There is one on the front page! Sorry, I should have said "didn't see a single credible performance claim there". Bogus-looking claims do not help you sell to the HPC market, either in the US or Europe. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Oct 9 11:42:53 2003 From: dag at sonsorol.org (chris dagdigian) Date: Thu, 09 Oct 2003 11:42:53 -0400 Subject: 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA Message-ID: <3F8581FD.3080404@sonsorol.org> ======================================================================= MEETING ANNOUNCEMENT / CALL FOR PRESENTERS ======================================================================= BIOCLUSTERS 2004 Workshop March 30, 2004 Hynes Convention Center, Boston MA USA ======================================================================= * Speakers Wanted - Please Distribute Where Appropriate * Organized by several members of the bioclusters at bioinformatics.org mailing list, the Bioclusters 2004 Workshop is a networking and educational forum for people involved in all aspects of cluster and grid computing within the life sciences. The motivation for organizers of this event was the cancellation of the O'Reilly Bioinformatics Technology Conference series and the general lack of forums for researchers and professionals involved with the applied use of high performance IT and distributed computing techniques in the life sciences. The primary focus of the workshop will be technical presentations from experienced IT professionals and scientific researchers discussing real world systems, solutions, use-cases and best practices. This event is being held onsite at the Hynes Convention Center on the first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World Magazine is generously providing space and logistical support for the meeting and workshop attendees will have access to the expo floor and keynote addresses. Registration & fees will be finalized in short order. Presentations will be broken down among a few general content areas: 1. Researcher, Application & End user Issues 2. Builder, Scaling & Integration Issues 3. Future Directions The organizing committee is actively soliciting presentation proposals from members of the life science and technical computing communities. Interested parties should contact the committee at bioclusters04 at open- bio.org. Bioclusters 2004 Workshop Committee Members J.W Bizzaro ? Bioinformatics Organization Inc. James Cuff - MIT/Harvard Broad Institute Chris Dwan - The University of Minnesota Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. Joe Landman ? Scalable Informatics LLC The committee can be reached at: bioclusters04 at open-bio.org About the Bioclusters Mailing List Community The bioclusters at bioinformatics.org mailing list is a 600+ member forum for users, builders and programmers of distributed systems used in life science research and bioinformatics. For more information about the list including the public archives and subscription information please visit http://bioinformatics.org/mailman/listinfo/bioclusters _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 14:40:02 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 11:40:02 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. 
Brown wrote: > It probably refers to burst delivery out of its 8 MB cache. The actual > sustained bps speed is a pure matter of N*2*\pi*R*f/S, where N = number of heads, R = read radius, f = rotation frequency, and S is the linear length per bit. This is an upper bound. A hard drive only reads from one head at a time. It's not possible to align every head with each other to such a degree that every track in a cylinder is readable at once. If you look at a given family of drives, each different sized drive is the same basic hardware with more discs/heads. For instance, Seagate's Cheetah 15K.3 family (http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah15k.3.pdf) has the exact same internal transfer rate (609-891 megabits/sec) for the 18 GB model with 2 heads, the 36 GB with 4 heads, and the 73 GB with 8. > Similarly average latency (seek time) is something like 1/2f, > the time the platter requires to move half a rotation. The average latency is indeed 1/2 the rotational period. For a 7200 RPM drive it is 4.16 ms, for a 15k RPM drive it's 2 ms. Seek time is something completely different: it's how long it takes the head to move from one track to another, and it does not include the latency. You might see track-to-track, full stroke, and average seek times in a datasheet. > I should also point out that since we've been using the RAIDs we have > experienced multidisk failures that required restoring from backup on > more than one occasion. The book value probability for even one I've had one multidisk failure in a RAID5 system. It was after moving into a new building: one array had three out of six disks fail to spin up. Of course I had anticipated this, and made a backup, to tape, just before the move. None of the tapes were damaged in transit. I've had several single drive failures. I've never seen anyone with a significant number of drive-years of experience say they've never seen a drive fail. And no manufacturer has a failure rate anywhere near 0%. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 13:47:43 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 13:47:43 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > > > On Thu, 9 Oct 2003, Trent Piepho wrote: > > > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > Not quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. Jim Gray just recommends moving the whole computer: http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 JG It's a very convenient way of distributing data. DP Are you sending them a whole PC? JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks--all for about $3,000.
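The arithmetic is hard to argue with: seven 300-GB disks is roughly 2 TB, and overnight shipping is about 24 hours, so the box "transfers" data at roughly

  2,000,000 MB / 86,400 s ~= 23 MB/s ~= 185 Mbit/s sustained

which beats any WAN link most of us can afford, and the $3,000 includes the storage at the far end. (The numbers are obviously rough, but an order of magnitude either way doesn't change the conclusion.)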
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Thu Oct 9 14:30:46 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Thu, 09 Oct 2003 11:30:46 -0700 Subject: building a RAID system In-Reply-To: Message from Daniel Fernandez of "Wed, 08 Oct 2003 21:46:59 +0200." <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <200310091830.h99IUkNr014912@pookie.nersc.gov> Daniel, We have around 50 3ware boxes with a total formated space of around 50 TB. We run all of these in HW raid mode. I would avoid using software raid if you plan to have more than a dozen or so clients. Our experience is that while software raid works great, it scales poorly. This was very noticeable when the server processors were PIII class. It may be less of an issue with newer processors, but I would still recommend HW raid if the card supports it. Also, we like the 3ware cards because they have been supported by linux for ages now. Some of the other cards have been a little dicey. With our newest systems we've seen aggregate performance for a single server of around 70 MB/s and they appear to scale quite well (handle over 50 clients). This last batch of systems have 12 250 GB drives, a 12 port 3ware card, dual Xeon, on-board gigE and cost less than $7k. Also, the 3ware systems hot swap very well. We make use of it all the time. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:20:25 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:20:25 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > Discussing a client setup the other day a cohort and I came to a different > opinion on what each raid level does. Is there a guide/standard to define how > it should work. Also do any vendors stray from the beaten path and add there > own levels? http://www.1U-Raid5.net/Differences - definitions, and pretty pictures too c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gsheppar at gpc.edu Thu Oct 9 15:43:57 2003 From: gsheppar at gpc.edu (Gene Sheppard) Date: Thu, 09 Oct 2003 15:43:57 -0400 Subject: Inquiry small system S/W In-Reply-To: Message-ID: We here are Georgia Perimeter College are planning on putting together a 5 or 6 node Beowulf system. My question: Is there any software for a system like this? What applications have been tested on a small system? If there are none, what is the smallest system out there? Thank you for your help. 
GEne ============================================== Gene Sheppard Georgia Perimeter College Computer Science 1000 University Center Lane Lawrenceville, GA 30043 678-407-5243 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 9 17:04:30 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 9 Oct 2003 14:04:30 -0700 Subject: building a RAID system - 8 drives In-Reply-To: References: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: <20031009210430.GD11051@maybe.org> On Thu, Oct 09, 2003 at 01:47:43PM -0400, Michael T. Prinkey elucidated: > > > > No quite true. We use Rare drives (one box) to move up to a TB of data > > around w/o having to take the time to create tapes and then download them. > > That takes a lot of time, even w/ LTOs. > > Jim Grey just recommends moving the whole computer: > > http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 > > > JG It's a very convenient way of distributing data. > > DP Are you sending them a whole PC? > > JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, > and seven 300-GB disks--all for about $3,000. > Kind of reminds me of a favorite fortune cookie quotes: "Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" -- Andrew S. Tannenbaum Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Thu Oct 9 14:50:17 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Thu, 09 Oct 2003 20:50:17 +0200 Subject: building a RAID system In-Reply-To: References: Message-ID: <1065725416.1136.59.camel@qeldroma.cttc.org> Hi again, Thanks for the advice, also it has started an interesting thread. On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > I would like to know some advice about what kind of technology apply > > into a RAID file server ( through NFS ) . We started choosing hardware > > RAID to reduce cpu usage. > > that's unfortunate, since the main way HW raid saves CPU usage is > by running slower ;) > I cannot get the point here, the dedicated processor should take all transfer commands and offload the CPU why it would run slower ? In some tests a raid system for a single workstation ( no networking ) it's a bit useless (slower) unless you want to transfer really big files. In a networked environment there could be a massive number of I/O commands so should be critical. > seriously, CPU usage is NOT a problem with any normal HW raid, > simply because a modern CPU and memory system is *so* much better > suited to performing raid5 opterations than the piddly little > controller in a HW raid card. the master/fileserver for my > cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and > it can *easily* saturate its gigabit connection. after all, ram > runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! > Agreed, our server would not be doing anything more than managing NFS so, there is power to spare, where talking about an Athlon XP2600+ processor. But, a really good Parallel ATA 100/133 controller is needed, and 4 channels at least... 4 HDs in 2 master/slave channels reduces drastically performance ? any controller recommended ? 
But must be noted that HW RAID offers better response time. HW raid offers hotswap capability and offload our work instead of maintaining a SW raid solution ...we'll see ;) > concern for PCI congestion is a much more serious issue. > We're limited at 32 bit PCI, we cannot get around this unless spend on a highly priced PCI 64 mainboard. > finally, why do you care at all? are you fileserving through > a fast (>300 MB/s) network like quadrics/myrinet/IB? most people > limp along at a measly gigabit, which even a two-ide-disk raid0 > can saturate... > > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > jeez, since your limited to 10 MB/s, you could do raid5 on a 486 > and still saturate the net. seriously, CPU consumption is NOT an issue > at 10 MB/s. There would not be noticeable difference between SW/HW mode here. The clients would be doing write bursts of 2-5Mb per second so there must not be any problem. > > machines tough, server only takes care of file sharing. > > so excess cycles on the fileserver will be wasted unless used. > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? > > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). > > regards, mark hahn. > In fact we have two choices: - Use an spare existing ( relatively obsolete ) computer and couple it with a HW RAID card. - Spend on a fast CPU computer and a good but cheap Parallel ATA controller. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 17:17:34 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 17:17:34 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some > tests a raid system for a single workstation ( no networking ) it's a > bit useless (slower) unless you want to transfer really big files. In a > networked environment there could be a massive number of I/O commands so > should be critical. Key word: "should" Benchmark results: "often does not" Your best bet is to try both and run your own benchmarks and do your own cost/benefit analysis. When you say things like "better response time" one is fairly naturally driven to ask "does the difference matter", for example. Given that we run over 100 workstations from a SW RAID with nearly instantaneous (entirely satisfactory) performance, you'd have to really be hammering it to perceive a difference. > In fact we have two choices: > > - Use an spare existing ( relatively obsolete ) computer and couple it > with a HW RAID card. > > - Spend on a fast CPU computer and a good but cheap Parallel ATA > controller. Or a cheap computer + PATA or SATA controller. 
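If you do the try-both-and-benchmark comparison suggested above, something as simple as the following, run identically against each candidate configuration, will usually tell you most of what you need to know (the directory, size and user are placeholders, and the test size should be at least twice the RAM in the box):

  bonnie++ -d /mnt/raidtest -s 4096 -u nobody

plus a timed copy of a representative chunk of your actual users' data, which is the only benchmark that really counts.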
Even a cheap computer has 2+ GHz CPUs and hundreds of MB of RAM these days. Spend more on what you put the disks in, power, cooling. If it is an old/obsolete computer, will it have enough power, enough cooling? Regardless, the disk cost itself will dominate your costs. rgb > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:06:44 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:06:44 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: hi ya angel On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. yes.. guess it makes sense to move disks around for moving tb of data like floppy-net or sneaker-net - done that ( moving disks around ) myself once in a while for a quickie fix c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 16:35:04 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 13:35:04 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > > I would like to know some advice about what kind of technology apply > > > into a RAID file server ( through NFS ) . We started choosing hardware > > > RAID to reduce cpu usage. > > > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some Easy, said dedicated processor and memory is quite a bit slower than the main CPU and memory. If you look at thoughput in MB/sec, the latest linux software RAID is usually much faster than hardware raid implimentations. Usually CPU usage is (stupidly) reported as just a % used during a benchmark. If you transfer fewer megabytes in second, obviously the number of CPU cycles used in that second go down as well. If CPU usage is correctly reported in units of % per MB/sec, then you get a real measure of hardware efficiency. > needed, and 4 channels at least... 
4 HDs in 2 master/slave channels > reduces drastically performance > ? any controller recommended ? It seems that most good 4-12 channel (NOT drive, channel!) IDE cards ARE hardware raid controllers. Lots of people use the 3ware RAID cards in JBOD mode with software raid, because their isn't a cheaper non-hardware raid card comparable to something like the 3ware 7508-8 or 7508-12. I know about cheaper 2 and 4 channel non-raid cards, but they're 32/33 PCI and not comparable to the 3ware. > > concern for PCI congestion is a much more serious issue. > > > We're limited at 32 bit PCI, we cannot get around this unless spend on a > highly priced PCI 64 mainboard. AMD 760MPX and Intel E7501 motherboards have high speed 64/66 PCI and PCI-X for the E7501. They're not that expensive really. An additional $100-$200 at most over a single PCI 32/33 motherboard. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Thu Oct 9 18:02:34 2003 From: rokrau at yahoo.com (Roland Krause) Date: Thu, 9 Oct 2003 15:02:34 -0700 (PDT) Subject: Experience with Omni anyone? Message-ID: <20031009220234.64852.qmail@web40010.mail.yahoo.com> Folks, I came across the Omni OpenMP compiler lately and I was wondering whether anyone here has used it and what the experience was. I.o.w., is it "industrial strength"? I know of and use Portland and Intel compilers but I am also curious. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at lathama.com Thu Oct 9 17:52:22 2003 From: lathama at lathama.com (Andrew Latham) Date: Thu, 9 Oct 2003 14:52:22 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: Message-ID: <20031009215222.68022.qmail@web60307.mail.yahoo.com> thanks. I know that all the different raid levels are here for a reason and raid5 is great but what are the benefits of the rest? --- Mark Hahn wrote: > > Discussing a client setup the other day a cohort and I came to a different > > opinion on what each raid level does. Is there a guide/standard to define > how > > it should work. Also do any vendors stray from the beaten path and add > there > > own levels? > > sure they do. IMO the only important levels are: > > raid0 - striping > raid1 - mirroring > raid5 - rotating parity-based array > > vendors who make a big deal of obvious extensions like raid 10 > (mirrored stripes or vice versa) are immediately hung up on by me... > ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Oct 9 16:24:36 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 9 Oct 2003 15:24:36 -0500 (CDT) Subject: Inquiry small system S/W In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? 
> What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne System or application software? For system software, any of the beowulf kits will work. http://warewulf-cluster.org/ http://www.scyld.com/ http://oscar.sourceforge.net/ http://rocks.npaci.edu/ http://clic.mandrakesoft.com/index-en.html and others. Most applications will run just fine on 5 or 6 nodes. To start with, i'd get HPL and PMB running to ensure everything is working fine. Then you can look at other applications to see what you might actually be able to benefit from. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 9 16:44:28 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 9 Oct 2003 16:44:28 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. Brown wrote: > Hell, there are more than that in California alone. So far monopoles Forgot to mention the California "megapoll" which just occurred on Tuesday. Sorry, I could not help myself. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Oct 9 18:08:51 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 9 Oct 2003 15:08:51 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009215222.68022.qmail@web60307.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > thanks. > > I know that all the different raid levels are here for a reason and raid5 is > great but what are the benefits of the rest? 0 is fast (interleaved chunks) but provides no redundancy. 1 is a a 1 + 1 mirror... can be faster on reads but is generally slower on writes depending on your controller/implementation... 0 + 1 or 1 + 0 striped mirror or mirrored stripe. less space efficient than raid 5 but faster in general. can survive multiple disk failures so long as both disks containing the same information don't fail at once. > --- Mark Hahn wrote: > > > Discussing a client setup the other day a cohort and I came to a different > > > opinion on what each raid level does. Is there a guide/standard to define > > how > > > it should work. Also do any vendors stray from the beaten path and add > > there > > > own levels? > > > > sure they do. IMO the only important levels are: > > > > raid0 - striping > > raid1 - mirroring > > raid5 - rotating parity-based array > > > > vendors who make a big deal of obvious extensions like raid 10 > > (mirrored stripes or vice versa) are immediately hung up on by me... > > > > > ===== > Andrew Latham > > Penguin loving, moralist agnostic. 
> > LathamA.com - (lay-th-ham-eh) > lathama at lathama.com - lathama at yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:19:16 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:19:16 GMT Subject: building a RAID system - 8 drives - drive-net In-Reply-To: References: Message-ID: <20031009231916.21008.qmail@houston.wolf.com> Alvin Oga writes: > > hi ya angel > > On Thu, 9 Oct 2003, Angel Rivera wrote: > >> Alvin Oga writes: >> >> > nobody swaps disks around ... unless one is using those 5.25" drive bay >> > thingies in which case ... thats a different ball game >> >> No quite true. We use Rare drives (one box) to move up to a TB of data >> around w/o having to take the time to create tapes and then download them. >> That takes a lot of time, even w/ LTOs. > > yes.. guess it makes sense to move disks around for moving tb of data > like floppy-net or sneaker-net > - done that ( moving disks around ) myself once in a while > for a quickie fix When you have that much data, it is easier and faster to load 8 drives into a box than tons of tapes. take out the old drives and place the new ones in, mount it, export it and voila-it is on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 19:36:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 16:36:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031009231916.21008.qmail@houston.wolf.com> Message-ID: hi ya On Thu, 9 Oct 2003, Angel Rivera wrote: .. > > yes.. guess it makes sense to move disks around for moving tb of data > > like floppy-net or sneaker-net > > - done that ( moving disks around ) myself once in a while > > for a quickie fix > > When you have that much data, it is easier and faster to load 8 drives into > a box than tons of tapes. take out the old drives and place the new ones > in, mount it, export it and voila-it is on-line. yes and a "bunch of disks" (raid5) survives the loss of one dropped disk and is relatively secure from prying eyes .... - ceo gets one disk - cfo gets one disk - hr gets one disk - eng gets one disk - sys admin gets one disk ( combine all[-1] disks together to recreate the (raid5) TB data ) - a single (raid5) disk by itself is basically worthless tape backups are insecure ... - lose a tape ( bad tape, lost tape ) and and all its data is lost - anybody can read the entire contents of the full backup ( one could tar up one disk per tape, instead of tar'ing the ( whole raid5 subsystem, to provide the ( same functionality as a raid5 offsite disk backup c ya alvin and hopefully .. the old disks are not MFM drives.. 
or ata-133 in a new sata system :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:51:05 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:51:05 GMT Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031009235105.23420.qmail@houston.wolf.com> Alvin Oga writes: >> yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... Well, let's see. We can backup the data to tapes or to disks-disks are faster. From the time the data is on the disk, 1/2-1.0 hours to get to us, a few minutes to install them and voila you are on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 21:31:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 21:31:13 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... > - ceo gets one disk > - cfo gets one disk > - hr gets one disk > - eng gets one disk > - sys admin gets one disk > ( combine all[-1] disks together to recreate the (raid5) TB data ) > > - a single (raid5) disk by itself is basically worthless Secure from prying eyes, maybe (as in casually secure). "Secure" as in your secret plans for world domination or the details of your flourishing cocaine business are safe from the feds, not at all, unless the information is encrypted. Each disk has about one fourth of the information. English is about 3:1 compressible (really more; this is using simple symbolic compression). A good cryptanalyst could probably recover "most" of what is on the disks from any one disk, depending on what kind of data is there. Numbers, possibly not, but written communications, quite possibly. Especially if it falls in the hands of somebody who really wants it and has LOTS of good cryptanalysts. > tape backups are insecure ... > - lose a tape ( bad tape, lost tape ) and and all its data is lost > - anybody can read the entire contents of the full backup Unless it is encrypted. Without strong encryption there is no data-level security. With it there is. Maybe. Depending on what is "strong" to you and what is strong to, say, the NSA, whether your systems and network is secure, depending on whether you have dual isolation power inside a faraday cage with dobermans at the door. However, there can be as much or as little physical security for the tape as you care to put there. Tape in a locked safe, tape in an armored car. Disks are far more fragile than tapes -- drop a disk one meter onto the ground and chances are quite good that it is toast and will at best cost hundreds of dollars and a trip to specialized facilities to remount and mostly recover. Drop a tape one meter onto the ground and chance are quite good that it is perfectly fine, and even if it isn't (because e.g. the case cracked) ordinary humans can generally remount the tape in a new case without needing a clean room and special tools. 
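To put the "what does one RAID-5 member actually hold" question above on concrete footing: under a rotating-parity layout the data lands on each member in chunk-sized, interleaved pieces rather than as one contiguous quarter of the text. A minimal sketch (four members and 64 KB chunks are assumptions for illustration; real md layouts differ in the rotation details but not in the overall effect):

    # Sketch of how a rotating-parity RAID-5 layout scatters data across
    # member disks.  The 4-disk, 64 KB-chunk geometry is an assumption.
    NDISKS = 4
    CHUNK = 64 * 1024

    def chunk_location(logical_chunk):
        """Map a logical data chunk number to (member_disk, stripe)."""
        stripe = logical_chunk // (NDISKS - 1)          # N-1 data chunks per stripe
        parity_disk = (NDISKS - 1) - (stripe % NDISKS)  # parity slot rotates each stripe
        slot = logical_chunk % (NDISKS - 1)             # position among the data slots
        disk = slot if slot < parity_disk else slot + 1 # skip over the parity slot
        return disk, stripe

    # Which pieces of a 1 MB file land on member disk 0?
    file_chunks = (1024 * 1024) // CHUNK
    on_disk0 = [c for c in range(file_chunks) if chunk_location(c)[0] == 0]
    print("disk 0 holds data chunks:", on_disk0)

So a single member ends up with roughly a quarter of the bytes, but as scattered chunk-sized fragments plus a rotating share of parity -- the nuance a later reply in this thread raises when it points out that RAID-5 distributes data in chunks of around 4k-128k.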
Tapes are cheap -- you can afford to send almost three tapes compared to one disk. I get the feeling that you just don't like tapes, Alvin...;-) rgb > > ( one could tar up one disk per tape, instead of tar'ing the > ( whole raid5 subsystem, to provide the > ( same functionality as a raid5 offsite disk backup > > c ya > alvin > > and hopefully .. the old disks are not MFM drives.. > or ata-133 in a new sata system :-) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Fri Oct 10 01:20:06 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Fri, 10 Oct 2003 07:20:06 +0200 Subject: Inquiry small system S/W In-Reply-To: References: Message-ID: <20031010072006.54dfd8a4.smuelas@mecanica.upm.es> I have put together an 8-node Beowulf cluster to my greatest satisfaction, with good results. You don't need anything special; if it is a Beowulf it must be Linux. If you use, for example, Red Hat 9, which is what I do, you have everything you need in the standard 3-CD distribution, which you can download at no cost. Apart from that, and in my particular case, I use Fortran 90 and the compiler from Intel, also free for non-commercial use. Perhaps the only special hardware to buy is a simple 8-port switch for your Ethernet connections. Then, what is really important is to learn to make your software really able to use the cluster. So, some time to study MPI or similar, and work, work, work... :-) Before being an 8-node cluster, mine was 4 nodes, then 6 nodes, and at 8 I stopped. But there is no difference in the work to do; just the possibilities and the speed increase. Good luck!! On Thu, 09 Oct 2003 15:43:57 -0400 Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? > What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne > > ============================================== > Gene Sheppard > Georgia Perimeter College > Computer Science > 1000 University Center Lane > Lawrenceville, GA 30043 > 678-407-5243 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S.
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Fri Oct 10 01:43:57 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 9 Oct 2003 22:43:57 -0700 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010054357.GB13480@sphere.math.ucdavis.edu> On the hardware vs software RAID thread: a friend needed a few TB and bought a high-end RAID card (several $k), multiple channels, an enclosure, and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. He needed the capacity and a minimum of 50MB/sec sequential write performance (on large sequential writes). He didn't get it. Call #1 to Dell resulted in "well, it's your fault, it's our top of the line, it should be plenty fast", bleah, bleah, bleah. Call #2 led to an escalation to someone with more of a clue: tune parameter X, tune Y, try a different raid setup, swap out X, etc. After more testing didn't help, call #3 was escalated again and someone fairly clued answered. The conversation went along the lines of: what? yeah, it's dead slow. Yeah, most people only care about the reliability. Oh, performance? We use linux + software raid on all the similar hardware we use internally at Dell. So the expensive controller was returned, and 39160's were used in its place (dual channel U160) and performance went up by a factor of 4 or so. In my personal benchmarking on a 2 year old machine with 15 drives I managed 200-320 MB/sec sustained (large sequential read or write), depending on filesystem and stripe size. I've not witnessed any "scaling problems"; I've been quite impressed with linux software raid under all conditions and have had it run significantly faster than several expensive raid cards I've tried over the years. Surviving hotswap, over 500 day uptimes, and substantial performance advantages seem to be common. Anyone have numbers comparing hardware and software raid using bonnie++ for random access, or maybe postmark (NetApp's disk benchmark)? Failures so far: * 3ware 6800 (awful, evil, slow, unreliable, terrible tech support) * quad channel scsi card from Digital/StorageWorks, rather slow, then started crashing * More recently (last 6 months) the top of the line Dell raid card (PERC?) * A few random others One alternative solution I figured I'd mention: the Apple 2.5 TB array for $10-$11k isn't a bad solution for a mostly turnkey, hotswap, redundant-power-supply setup with a warranty. Dual 2 Gigabit Fibre Channel links do make it easier to scale to 10's of TBs than some other solutions. I managed 70 MB/sec read/write to a 1/2 Xraid (on a single FC). Of course there are cheaper solutions. Oh, I also wanted to mention one gotcha for the DIY methods. I've had, I think, 4 machines now with 8-15 disks and dual 400 watt power supplies or 3x225 watt (n+1) that boot just fine for 6 months, but then start complaining at boot due to too-high power consumption. This is of course especially bad with EIDEs since they all spin up at boot (SCSI can usually be spun up one at a time). I suspect a slight decrease in lubrication and/or degradation in the power supplies, which were possibly running above 100%, to be the cause.
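For the random-access numbers being asked about, bonnie++ or postmark are the right tools, but the core of such a test is small enough to sketch. A bare-bones random-read check (the path, 4 KB read size and iteration count are arbitrary assumptions; the target file must be much larger than RAM or the page cache is all that gets measured):

    # Bare-bones random-read check in the spirit of the bonnie++/postmark
    # request above: seek to random offsets in a large existing file and
    # time small reads.  Reuse the file from the sequential sketch earlier,
    # made much larger than RAM, to avoid measuring the page cache.
    import os, random, time

    PATH = "/tmp/rawtest.dat"   # assumption: a big file already on the array
    READ = 4096
    COUNT = 2000

    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size
    t0 = time.time()
    for _ in range(COUNT):
        os.lseek(fd, random.randrange(0, size - READ), os.SEEK_SET)
        os.read(fd, READ)
    elapsed = time.time() - t0
    os.close(fd)
    print("%.0f random %d-byte reads/second" % (COUNT / elapsed, READ))

Seek-bound random I/O is typically where RAID implementations and layouts differ far more than in the large sequential transfers quoted above, which is presumably why the bonnie++/postmark comparison is being requested.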
In any case great thread, I've yet to see a performance or functionality benefit from hardware raid. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 03:24:15 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 09:24:15 +0200 Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> References: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: <20031010072415.GI17432@unthought.net> On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > Hi again, ... Others have already answered your other questions, I'll try to take one that went unanswered (as far as I can see). ... > > But must be noted that HW RAID offers better response time. In a HW RAID setup you *add* an extra layer: the dedicated CPU on the RAID card. Remember, this CPU also runs software - calling it 'hardware RAID' in itself is misleading, it could just as well be called 'offloaded SW RAID'. The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor. However, you added a layer with your offloaded-RAID. You added one extra CPU in the 'chain of command' - and an inferior CPU at that. That layer means latency even in the most expensive cards you can imagine (and bottleneck in cheap cards). No matter how you look at it, as long as the RAID code in the kernel is fairly simple and efficient (which it was, last I looked), then the extra layers needed to run the PCI commands thru the CPU and then to the actual IDE/SCSI controller *will* incur latency. And unless you pick a good controller, it may even be your bottleneck. Honestly I don't know how much latency is added - it's been years since I toyed with offload-RAID last ;) I don't mean to be handwaving and spreading FUD - I'm just trying to say that the people who advocate SW RAID here are not necessarily smoking crack - there are very good reasons why SW RAID will outperform HW RAID in many scenarios. > > HW raid offers hotswap capability and offload our work instead of > maintaining a SW raid solution ...we'll see ;) That, is probably the best reason I know of for choosing hardware RAID. And depending on who you will have administering your system, it can be a very important difference. There are certainly scenarios where you will be willing to trade a lot of performance for a blinking LED marking the failed disk - I am not kidding. Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 02:58:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 08:58:37 +0200 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010065837.GH17432@unthought.net> On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: ... > Each disk has about one fourth of the information. English is about 3:1 > compressible (really more; this is using simple symbolic compression). > A good cryptanalyst could probably recover "most" of what is on the > disks from any one disk, depending on what kind of data is there. You overlook the fact that data on a RAID-5 is distributed in 'chunks' of sizes around 4k-128k (depending...) So you would get the entire first 'Introduction to evil empire plans', but the entire 'Subverting existing banana government' chapter may be on one of the disks that you are missing. > Numbers, possibly not, but written communications, quite possibly. > Especially if it falls in the hands of somebody who really wants it and > has LOTS of good cryptanalysts. You'd probably need historians and psychologists rather than cryptographers - but of course the point remains the same. Just nit-picking here. > > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. I'm just thinking of distributing two tapes for each disk - one with 200G of random numbers, the other with 200G of data XOR'ed with the data from the first tape. Enter the one-time pad - unbreakable encryption (unless you get a hold of both tapes of course). You'd need to make sure you have good random numbers - as an extra measure of safety one should probably wear a tinfoil hat while working with the tapes, just in case... ;) Of course, if any tape is lost, everything is lost. But one bad KB on either tape will only result in one bad KB total. > > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. No no no no no! Think big! Think: cobalt bomb in own backyard - threaten anyone who steals your data, that you'll make the planet inhabitable for a few hundred decades unless they hand back your tapes. ;) (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) ... > I get the feeling that you just don't like tapes, Alvin...;-) Where did you get that idea? ;) Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Fri Oct 10 13:34:41 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Fri, 10 Oct 2003 10:34:41 -0700 Subject: building a RAID system References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> Message-ID: <3F86EDB1.6264A405@attglobal.net> You write: "The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor." Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the practical limit of SW RAID? Paul Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > Hi again, > ... > > Others have already answered your other questions, I'll try to take one > that went unanswered (as far as I can see). > > ... > > > > But must be noted that HW RAID offers better response time. > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > RAID card. Remember, this CPU also runs software - calling it > 'hardware RAID' in itself is misleading, it could just as well be called > 'offloaded SW RAID'. > > The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor. > > However, you added a layer with your offloaded-RAID. You added one extra > CPU in the 'chain of command' - and an inferior CPU at that. That layer > means latency even in the most expensive cards you can imagine (and > bottleneck in cheap cards). No matter how you look at it, as long as > the RAID code in the kernel is fairly simple and efficient (which it > was, last I looked), then the extra layers needed to run the PCI > commands thru the CPU and then to the actual IDE/SCSI controller *will* > incur latency. And unless you pick a good controller, it may even be > your bottleneck. > > Honestly I don't know how much latency is added - it's been years since > I toyed with offload-RAID last ;) > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > that the people who advocate SW RAID here are not necessarily smoking > crack - there are very good reasons why SW RAID will outperform HW RAID > in many scenarios. > > > > > HW raid offers hotswap capability and offload our work instead of > > maintaining a SW raid solution ...we'll see ;) > > That, is probably the best reason I know of for choosing hardware RAID. > And depending on who you will have administering your system, it can be > a very important difference. > > There are certainly scenarios where you will be willing to trade a lot > of performance for a blinking LED marking the failed disk - I am not > kidding. > > Cheers, > > -- > ................................................................ > : jakob at unthought.net : And I see the elder races, : > :.........................: putrid forms of man : > : Jakob ?stergaard : See him rise and claim the earth, : > : OZ9ABN : his downfall is at hand. 
: > :.........................:............{Konkhra}...............: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Oct 10 07:12:48 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 10 Oct 2003 04:12:48 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. just trying to protect the tapes ( backups ) against the casual "oops look what i found" and they go and look at the HR records or the salary records or employee reviews etc..etc.. not trying to protect the tapes against the [cr/h]ackers ( different ball game ) and even not protecting against the spies of nsa/kgb etc either ( whole new ballgame for those types of backup issues ) > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. dont forget to lock the car/safe too :-) and log who goes in and out of the "safe" area :-) > I get the feeling that you just don't like tapes, Alvin...;-) not my first choice for backups .. even offsite backups... but if "management" takes out the $$$ to do tape backups... so it shall be done ... ideally, everything works ... but unfortunately, tapes are highly prone to people's "oops i forgot to change it yesterday" or the weekly catridge have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 07:56:39 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 13:56:39 +0200 Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> <3F86EDB1.6264A405@attglobal.net> Message-ID: <20031010115639.GN17432@unthought.net> On Fri, Oct 10, 2003 at 10:34:41AM -0700, pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? In this forum, I run small storage only. Around 150G for the most busy server that I have. 
Linux has problems with >2TB devices as far as I know, so that sort of puts an upper limit to whatever you can do with SW/HW RAID there. In between, it's just one order of magnitude :) More seriously - the SW RAID code is extremely simple, and it performs two different tasks: *) Reconstruction - which has time complexity T(n) for n bytes of data *) Read/write - which has time complexity T(1) for n bytes of data In other words - the more data you have, the longer a resync is going to take - HW or SW makes no difference (except for a factor, which tends to be rediculously large on cheap HW RAID cards but acceptable on more expensive ones). Reads and writes are not affected by the amount of data, in the SW RAID layer (and hopefully not in the HW RAID layer either). The scalability limits you will run into are: *) Number of disks you can attach to your box (HW RAID may hide this from you and may thus buy you some scalability there) *) Filesystem limits/performance problems. HW/SW RAID makes no difference *) Device size limits. HW/SW RAID makes no difference *) Reconstruction time after unclean shutdown - SW performs much better than crap/cheap HW solutions, but I don't know about the expensive ones. There are others on this list with much larger servers and less antique hardware - guys, speak up - where does it begin to hurt? :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 10 07:59:22 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 10 Oct 2003 04:59:22 -0700 (PDT) Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> Message-ID: On Fri, 10 Oct 2003 pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? size-wise software raid (I'm talking specifically about linux here) scales far better than most hardware raid controllers (san subsystems are another kettle of fish entirely), among other reasons because you can spread the disks out between multiple controllers. > Paul > > Jakob Oestergaard wrote: > > > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > > Hi again, > > ... > > > > Others have already answered your other questions, I'll try to take one > > that went unanswered (as far as I can see). > > > > ... > > > > > > But must be noted that HW RAID offers better response time. > > > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > > RAID card. Remember, this CPU also runs software - calling it > > 'hardware RAID' in itself is misleading, it could just as well be called > > 'offloaded SW RAID'. 
> > > > The problem with offloading is, that while it made great sense in the > > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > > load on your typical N GHz processor. > > > > However, you added a layer with your offloaded-RAID. You added one extra > > CPU in the 'chain of command' - and an inferior CPU at that. That layer > > means latency even in the most expensive cards you can imagine (and > > bottleneck in cheap cards). No matter how you look at it, as long as > > the RAID code in the kernel is fairly simple and efficient (which it > > was, last I looked), then the extra layers needed to run the PCI > > commands thru the CPU and then to the actual IDE/SCSI controller *will* > > incur latency. And unless you pick a good controller, it may even be > > your bottleneck. > > > > Honestly I don't know how much latency is added - it's been years since > > I toyed with offload-RAID last ;) > > > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > > that the people who advocate SW RAID here are not necessarily smoking > > crack - there are very good reasons why SW RAID will outperform HW RAID > > in many scenarios. > > > > > > > > HW raid offers hotswap capability and offload our work instead of > > > maintaining a SW raid solution ...we'll see ;) > > > > That, is probably the best reason I know of for choosing hardware RAID. > > And depending on who you will have administering your system, it can be > > a very important difference. > > > > There are certainly scenarios where you will be willing to trade a lot > > of performance for a blinking LED marking the failed disk - I am not > > kidding. > > > > Cheers, > > > > -- > > ................................................................ > > : jakob at unthought.net : And I see the elder races, : > > :.........................: putrid forms of man : > > : Jakob ?stergaard : See him rise and claim the earth, : > > : OZ9ABN : his downfall is at hand. : > > :.........................:............{Konkhra}...............: > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:35:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:35:35 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: On Fri, 10 Oct 2003, Alvin Oga wrote: > > hi ya robert > > On Thu, 9 Oct 2003, Robert G. Brown wrote: > > > > tape backups are insecure ... > > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > > - anybody can read the entire contents of the full backup > > > > Unless it is encrypted. Without strong encryption there is no > > data-level security. With it there is. Maybe. 
Depending on what is > > "strong" to you and what is strong to, say, the NSA, whether your > > systems and network is secure, depending on whether you have dual > > isolation power inside a faraday cage with dobermans at the door. > > just trying to protect the tapes ( backups ) against the casual > "oops look what i found" and they go and look at the HR records > or the salary records or employee reviews etc..etc.. > > not trying to protect the tapes against the [cr/h]ackers ( different > ball game ) and even not protecting against the spies of nsa/kgb etc > either ( whole new ballgame for those types of backup issues ) Hmmm, this is morphing offtopic, but data security is a sufficiently universal problem that I'll chance one more round. Pardon me while I light up my crack pipe here...:-7... so I can babble properly. Ah, ya, NOW I'm awake...:-) The point is that you cannot do this with precisely HR records or salary records or employee reviews. If anybody gets hold of the data (casually or not) be it on tape or disk or the network while it is in transit then you are liable to the extent that you failed to take adequate measures to ensure the data's security. By the numbers: 1) Tapes even more than disk are most unlikely to be viewed casually. Disks have street value, tapes (really) don't. There is a high entry investment required before one can even view the contents of an e.g. LTO tape, plus a fair degree of expertise. A disk can be pulled from a box and remounted in any system by a pimple-faced kid with a screwdriver and an attitude. The net can be snooped by anyone, with a surprisingly low entry level of expertise (or rather a high level expertise encapsulated in openly distributed rootkits and exploits so anybody can do it). 2) All three are clearly vulnerable to someone (e.g. a private investigator, an insurance company, a competitor, an identity thief, the government) seeking to snoop and violate the privacy of the individuals who have entrusted their data to you. HR records contain SSNs, bank numbers (to facilitate direct deposit), names addresses, health records, employment records, CVs and/or transcripts, disciplinary records: they are basically everything you never wanted the world to know in one compact and efficient package. Federal and state laws regulate the handling of this data in quite rigorous ways. 3) An IT officer who was responsible for holding sensitive data secure according to law and who failed to employ reasonable measures for maintaining it secure and who subsequently had it stolen (violating his trust) would be publically eviscerated. Career ruined, bankrupted by suits, tormented by guilt, possibly even put in jail, driven to suicide kind of stuff in the worst case. The company that employed that officer would be right behind -- suits, clean sweep firings of the entire management team in the chain of responsibility, plunging stock prices, public recriminations and humiliation. EVEN IF reasonable measures were employed there would likely be trouble and recrimination, but careers might survive, damages would be limited, jail might be avoided, and one wouldn't feel so irresponsibly guilty. 4) Strong encryption of the data to protect it in transit is an obvious, inexpensive, knee-jerk sort of reasonable measure (again, independent of the means of transport presuming only that the data passes out of your fortress keep where you keep the cobalt bomb and dobermans and make all of your staff wear tinfoil caps while looking at the data). 
It might even be mandated by law for certain forms of data -- the federal government just passed a sweeping right to privacy measure for health data, for example, that may well have highly explicit provisions for data transport and security. 5) Therefore... only someone with a death wish would send sensitive, valuable data for which they are responsible for security, through any transport layer not under their direct control and deemed secure of its own right, between secure sites, without encrypting it first (and otherwise complying with relevant federal and state laws, if any apply to the case at hand). Properly paranoid ITvolken would likely consider ALL transport layers including their own internal LAN not to be secure and would use ssh/ssl/vpn bidirectional encryption of all network traffic period. If it weren't for the fact that there is less motivation to encrypt the data on the physically secured actual server disks (so the only means of access are through dobermans and locked doors or by cracking the servers from outside, in which case you've already lost the game) one would extend the data encryption to the database itself, and I'm sure that there are sites that don't trust even their own staff or the moral character of their dobermans that so do. I don't want to THINK about what one has to endure to obtain access to e.g. NSA or certain military datasites -- probably body cavity searches in and out, shaved heads and paper suits, and metal detectors, that sort of thing...:-) > > However, there can be as much or as little physical security for the > > tape as you care to put there. Tape in a locked safe, tape in an > > armored car. > > dont forget to lock the car/safe too :-) > and log who goes in and out of the "safe" area :-) Ya, precisely. It is only partly a joke, you see. If my Duke HR or my medical records turn up on the street, with somebody purporting to be me cleaning out my bank account and maxing my visa, with my applications for health insurance denied because they've learned about my heavy drinking problem and all the crack that I smoke (I don't know where Jakob got the idea that I don't sit here fuming away all day:-) and the consequent liver failure and bouts of wild-eyed babbling (like this one, strangely enough:-), my plans for a fusion generator that you can build in your garage turning up being patented by Exxon and so forth Duke had DAMN WELL better be able to show my attorney and a court logs of who had access to this data, proofs that it was never left lying around in cars (locked or unlocked), proofs that it was transmitted in encrypted form, etc. Otherwise I'm detonating the cobalt bomb in my backyard and Duke will be a radioactive wasteland for a few kiloyears...(it is only a couple of miles away). This is the kind of thing that gives IT security officers ulcers. Duke's current SO is actually a former engineering school beowulfer (and good friend of mine) whose voice is scattered through the list archives (Chris Cramer). As a former 'wulfer (and EE), he is damn smart and computer-expert (and handsome and witty, just like everybody else on this list:-). However, he sweats bullets because Duke is a huge organization with lots of architectures scattered all over campus -- Windows here (any flavor), Macs there, Suns, Linux boxen, there are likely godforsaken nooks on campus that still have IBM mainframes and VAXes. Sensitive data is routinely served across the campus backbone and beyond (e.g. 
I can see my advisees' current transcripts where I sit at this very moment). Even with SSL, this data is vulnerable in fragments to any successful exploit on any client that belongs to any partially privileged person and that runs a vulnerable operating system. Hmmm, you say -- wasn't there recently an RPC exploit on a certain very common OS that permitted crackers to put anything they wanted including snoops on all cracked clients (not to mention a steady stream of lesser but equally troublesome invasions of the viral sort)? Didn't this cost institutions all over the world thousands of FTE hours to put right before somebody actually used it to steal access to valuable data? Why yes, I believe that there was! I believe it did! However, as one who got slammed (blush) a year ago on an unfortunately unpatched linux box and who has seen countless exploits succeed against all flavors of networked OS over many years, I avoid feeling too cocky about it. Nevertheless, Chris just keeps suckin' down the prozac and phillips cocktails dealing with crap like this and knowing that it is his butt on line should a malevolent attack succeed in compromising Duke's mountains of sensitive data (gulp) being served by minions whose primary systems expertise was developed back when knowing cobol was a part of the job description (gulp) running on servers with, um "interesting" base architectures (gulp)... > > I get the feeling that you just don't like tapes, Alvin...;-) > > not my first choice for backups .. even offsite backups... > > but if "management" takes out the $$$ to do tape backups... so it shall be > done ... > ideally, everything works ... but unfortunately, tapes > are highly prone to people's "oops i forgot to change it > yesterday" or the weekly catridge They are indeed (as the example I gave of a recent small-scale disaster at Duke clearly shows). A site run by a wise IT human would use a pretty rigorous protocol to regulate the process so that even if you have e.g. student labor doing the tape changes there is strict accountability and people checking the people checking the people who do the job, and so that tapes are randomly pulled every month and checked to be sure that the data is actually getting on the tapes in retrievable form. You can bet that Duke has such a process in place now, if they didn't before, although Universities tend to be a loose amalgamation of quasi-independent fiefdoms that accept control and adopt security measures for the common good and hire competent systems administrators and develop shared protocols for ensuring data integrity about as often and as easily as one would expect. (Sound of Chris in the background crunching another mylantin and washing it down with P&P:-) So in place or not, the risk remains. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:34:25 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:34:25 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010065837.GH17432@unthought.net> Message-ID: On Fri, 10 Oct 2003, Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: > ... 
> > Each disk has about one fourth of the information. English is about 3:1 > > compressible (really more; this is using simple symbolic compression). > > A good cryptanalyst could probably recover "most" of what is on the > > disks from any one disk, depending on what kind of data is there. > > You overlook the fact that data on a RAID-5 is distributed in 'chunks' > of sizes around 4k-128k (depending...) Overlook, hell. I'm using my usual strategy of feigning knowledge with the complete faith that my true state of ignorance will be publically displayed to the entire internet. This humiliation, in turn, will eventually cause such mental anguish that I'll be able to claim mental disability and retire to tending potted plants on a disability check for the rest of my life... You probably noticed that I used the same strategy quite recently regarding things like factors of N in disk read speed estimates, certain components in disk latency, and oh, too many other things to mention. Pardon me if I babble on a bit this morning, but my lawy... erm, "psychiatrist" insists that I need fairly clear evidence of disability to get away with this. I personally find that smoking crack cocaine induces a pleasant tendency to babble nonsense. And there is no place to babble for the record like the beowulf list archives, I always say...:-) > So you would get the entire first 'Introduction to evil empire plans', > but the entire 'Subverting existing banana government' chapter may be on > one of the disks that you are missing. ... > I'm just thinking of distributing two tapes for each disk - one with > 200G of random numbers, the other with 200G of data XOR'ed with the data > from the first tape. Or just one tape, xor'd with 200G worth of random numbers generated from a cryptographically strong generator via a relatively short key that you can (as you note) send or carry separately and which is smaller, easier to secure, and less susceptible to degradation or loss than a second tape. It's cheaper that way, and even if you use two tapes people are going to try cracking the master tape by trying to guess the key+algorithm you almost certainly used to generate it (see below), so the xor is no stronger than the key+algorithm combination.;-) > Enter the one-time pad - unbreakable encryption (unless you get a hold > of both tapes of course). Or determine the method and key you used for (oxymoronically) generating 200 Gigarands (which is NOT going to be a hardware generator, I don't think, unless you are a very patient person or build/buy a quantum generator or the like -- entropy based things like /dev/random are too slow, and even quantum generators I've looked into are barely fast enough:-). > You'd need to make sure you have good random numbers - as an extra Ah, that's the rub. "Good random numbers" isn't quite an oxymoron. Why, there is even a government standard measure for cryptographic strength in the US (which many/most generators fail, by the way). Entropy based generators tend to be very slow -- order of 10-100 kbps depending on the source of entropy, last I looked. Quantum generators IIRC that rely on e.g. single photon transmission events at half-silvered mirrors have to run at light intensities where single photon events are discernible (rare, that is) and STILL have to wait for an autocorrelation time or ten before opening a window for the next event because even quantum events like this have an associated correlation time due to the existence of extended correlated states in the radiating system. 
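The "single tape XOR'd with a keystream generated from a relatively short key" idea above is compact enough to sketch. A minimal illustration (SHA-256 over key+counter stands in for the "cryptographically strong generator"; this shows the principle only, is not a vetted cipher, and real data should go through a proper encryption tool):

    # Illustration of the "one tape, XOR'd with a generated keystream" idea:
    # expand a short secret into an arbitrarily long keystream and XOR it
    # with the backup stream.  Principle only -- not a vetted cipher.
    import hashlib

    def keystream(key, nbytes):
        out = bytearray()
        counter = 0
        while len(out) < nbytes:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(out[:nbytes])

    def xor_with_key(data, key):
        ks = keystream(key, len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    secret = b"short key carried separately"      # assumption: the short key
    plain = b"HR records, salary data, ..."       # stand-in for the tape stream
    cipher = xor_with_key(plain, secret)
    assert xor_with_key(cipher, secret) == plain  # the same operation decrypts

Applying the same XOR a second time with the same key recovers the plaintext, so only the short key has to be protected and carried separately -- and, as noted above, the whole scheme is then only as strong as that key and generator combination.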
Photon emission from a single atom itself is antibunched, for example, as after an emission the system requires time for the single radiating atom to regain a degree of excitation sufficient to enable re-emission. I believe that they can achieve more like 1 Mbps of randomness or at least unpredictability. As you'd need 1.6x10^12 bits to encode your tape, you'd have to wait around 1.6x10^6 seconds to generate the key. That is, hmmm, between two and three weeks, or twenty to thirty weeks with an entropy generator, unless you used a beowulf of entropy generators to shorten the time:-). Not exactly in the category of "generate a one-time pad while I go have a cup of coffee". Using a truly oxymoronic but much faster (and cryptographically strong) random number generator, e.g. the mt19937 from the GSL one can generate a respectable ballpark of 16 MBps (note B, not b) of random bytes and be done in a mere four hours. Alas, mt19937 is seeded from a long int and the seed probably doesn't have enough bits to be secure against a brute force attack, so one would likely have to fall back on one of the actual algorithms that permit the use of long keys (1024 bits or even more).

> No no no no no! Think big!
>
> Think: cobalt bomb in own backyard - threaten anyone who steals your
> data, that you'll make the planet uninhabitable for a few hundred
> decades unless they hand back your tapes. ;)
>
> (I'm drafting up 'Introduction to evil empire plans' soon by the way ;)

Hmm, I'll have to mail you some of my lithium pills, Jakob. Your own prescription obviously ran out...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Fri Oct 10 11:28:03 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Fri, 10 Oct 2003 09:28:03 -0600 Subject: Intel compilers and libraries In-Reply-To: ; from cjtan@optimanumerics.com on Thu, Oct 09, 2003 at 10:04:20AM +0000 References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031010092803.A5136@lnxi.com> On Thu, Oct 09 2003 at 04:04, C J Kenneth Tan -- Heuchera Technologies wrote:
> Greg,
>
> > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's
> > MKL and the free ATLAS library run at a respectable % of peak.
>
> Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV,
> xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI.
>
> Have you tried DPPSV or DPOSV on Itanium, for example? I would be
> interested in the percentage of peak that you achieve with MKL and
> ATLAS, for up to 10000x10000 matrices.
>
> ATLAS does not have full LAPACK implementation.
This gets ATLAS to provide its faster LAPACK routines to a full LAPACK library: http://math-atlas.sourceforge.net/errata.html#completelp Mike -- Mike Snitzer msnitzer at lnxi.com Linux Networx http://www.lnxi.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Fri Oct 10 13:55:43 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Fri, 10 Oct 2003 19:55:43 +0200 Subject: PVM errors at startup Message-ID: <3F86F29F.8A37AC5B@hmg.inpg.fr> Hi I'm new on this list so, just 2 lines about me: A small linux beowulf cluster (10 nodes) for computational fluid dynamics in the south-east of France (National Polytechnic Institute of Grenoble). I've just updated my cluster (from AMD1500+/ Eth100BT to P4 2.8G + Gigabit ethernet) and I've updated my system to Red-Hat 7.3, Kernel 2.4.20-20-7. The current version of pvm is pvm-3.4.4-2 from the RedHat 7.3. The previous system was RH7.1. Since this update I'm unable to start PVM from one node to another (with the add command). The console hangs for several tens of seconds then says OK. The pvmd3 is started on the remote node but the conf command does not show the additional node and I get these errors in the /tmp/pvml.xx file:

[t80040000] 10/10 15:58:31 craya.hmg.inpg.fr (xxx.xxx.xxx.xxx:32772) LINUX 3.4.4
[t80040000] 10/10 15:58:31 ready Fri Oct 10 15:58:31 2003
[t80040000] 10/10 16:01:46 netoutput() timed out sending to craya02 after 14, 190.000000
[t80040000] 10/10 16:01:46 hd_dump() ref 1 t 0x80000 n "craya02" a "" ar "LINUX" dsig 0x408841
[t80040000] 10/10 16:01:46 lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000] 10/10 16:01:46 sa 192.168.81.2:32770 mtu 4080 f 0x0 e 0 txq 1
[t80040000] 10/10 16:01:46 tx 2 rx 1 rtt 1.000000 id "(null)"

rsh and rexec are working (from master to nodes, from nodes to master and from nodes to nodes). The transfer speed is near 600 Mbits/s on the network (binary ftp to /dev/null). The variables are set:

PVM_ARCH=LINUX
PVM_RSH=/usr/bin/rsh
PVM_DPATH=/usr/local/pvm3/lib/LINUX/pvmd3
PVM_ROOT=/usr/local/pvm3

I've tried so many things during the last 3 days:
- trying to compile and install pvm3.4.4.tgz from the source files
- uninstalling iptables, ipchains and iplock
- removing /etc/security (to test this with root authority)
- adding .rhosts and hosts.equiv files
- on the master eth0 is 100Mbits toward the internet and eth1 is GB towards the nodes; I've tried the opposite config: eth0 becomes GB and eth1 100BT.

Always the same problem! The cluster is down and I do not know where to look for a solution now.... If someone could help me solve this problem... Thanks for your help Patrick --
===============================================================
| Equipe M.O.S.T.       | http://most.hmg.inpg.fr           |
| Patrick BEGOU         |       ------------                |
| LEGI                  | mailto:Patrick.Begou at hmg.inpg.fr |
| BP 53 X               | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX  | Fax 04 76 82 52 71                |
===============================================================
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Oct 10 21:53:34 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 10 Oct 2003 21:53:34 -0400 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010054357.GB13480@sphere.math.ucdavis.edu> References: <20031010054357.GB13480@sphere.math.ucdavis.edu> Message-ID: <1065837212.18644.0.camel@QUIGLEY.LINIAC.UPENN.EDU> On Fri, 2003-10-10 at 01:43, Bill Broadley wrote:
> On the hardware vs software RAID thread. A friend needed a few TB and
> bought a high end raid card (several $k), multiple channels, enclosure,
> and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood.
>
> He needed the capacity and a minimum of 50MB/sec sequential write
> performance (on large sequential writes). He didn't get it. Call #1 to
> dell resulted in well it's your fault, it's our top of the line, it should
> be plenty fast, bleah, bleah, bleah. Call #2 led to an escalation to
> someone with more of a clue, tune parameter X, tune Y, try a different
> raid setup, swap out X, etc. After more testing without helping, call #3
> was escalated again and someone fairly clued answered. The conversation went
> along the lines of what, yeah, it's dead slow. Yeah most people only
> care about the reliability. Oh performance? We use linux + software
> raid on all the similar hardware we use internally at Dell.
>
> So the expensive controller was returned, and 39160's were used in its
> place (dual channel U160) and performance went up by a factor of 4 or
> so.

Can you give more concrete pointers to the hardware that they ended up using? -- specifically the enclosure. Thanks! Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Fri Oct 10 13:55:14 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Fri, 10 Oct 2003 17:55:14 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031010092803.A5136@lnxi.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031010092803.A5136@lnxi.com> Message-ID: Mike,
> > Have you tried DPPSV or DPOSV on Itanium, for example? I would be
> > interested in the percentage of peak that you achieve with MKL and
> > ATLAS, for up to 10000x10000 matrices.
> >
> > ATLAS does not have full LAPACK implementation.
>
> This gets ATLAS to provide its faster LAPACK routines to a full LAPACK
> library:
> http://math-atlas.sourceforge.net/errata.html#completelp

Inserting the LU factorization code from ATLAS to publicly available LAPACK will only get you faster LU code in the rest of the publicly available LAPACK library. You will not gain from QR factorization code, Cholesky factorization code, etc.. Ken ----------------------------------------------------------------------- C.
J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Oct 11 13:01:17 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 11 Oct 2003 13:01:17 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID:
> Inserting the LU factorization code from ATLAS to publicly available
> LAPACK will only get you faster LU code in the rest of the publicly
> available LAPACK library. You will not gain from QR factorization
> code, Cholesky factorization code, etc..

oh, sure, but LU is the only important one because of top500 ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Oct 11 16:16:12 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 11 Oct 2003 15:16:12 -0500 Subject: Help in rsh In-Reply-To: ; from diego_naruto@hotmail.com on Sat, Oct 11, 2003 at 07:14:13PM +0000 References: Message-ID: <20031011151612.A22568@mikee.ath.cx> On Sat, 11 Oct 2003, diego lisboa wrote:
> Hi,
> I'm having problems with a cluster that I've mounted here, it's a small
> cluster with 3 machines, I have already installed NIS and NFS and it's
> working very well, with red hat 9.0. When I use the LAM 6.5.8 rpm it works
> beautifully, but I need to install XMPI, which needs LAM6.5.8.tar.gz (compiled)
> and with trilliun, when I install on the master it works, but on the slaves
> I have a problem with rsh, and hboot doesn't find "squema LAM" or something
> like that. Somebody can help me?
> Thanks

Try something more simple first. What happens when you do

$ rsh -l USER HOST uptime

does that work? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diego_naruto at hotmail.com Sat Oct 11 15:14:13 2003 From: diego_naruto at hotmail.com (diego lisboa) Date: Sat, 11 Oct 2003 19:14:13 +0000 Subject: Help in rsh Message-ID: Hi, I'm having problems with a cluster that I've mounted here, it's a small cluster with 3 machines, I have already installed NIS and NFS and it's working very well, with red hat 9.0. When I use the LAM 6.5.8 rpm it works beautifully, but I need to install XMPI, which needs LAM6.5.8.tar.gz (compiled) and with trilliun, when I install on the master it works, but on the slaves I have a problem with rsh, and hboot doesn't find "squema LAM" or something like that.
Somebody can help me? Thanks _________________________________________________________________ MSN Hotmail, the largest webmail service in Brazil. http://www.hotmail.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Oct 11 19:10:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 11 Oct 2003 16:10:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Fri, 10 Oct 2003, Robert G. Brown wrote:
> > not trying to protect the tapes against the [cr/h]ackers ( different
> > ball game ) and even not protecting against the spies of nsa/kgb etc
> > either ( whole new ballgame for those types of backup issues )
>
> Hmmm, this is morphing offtopic, but data security is a sufficiently
> universal problem that I'll chance one more round. Pardon me while I
> light up my crack pipe here...:-7... so I can babble properly. Ah, ya,
> NOW I'm awake...:-)

humm .. gimme some of that :-)

> The point is that you cannot do this with precisely HR records or salary
> records or employee reviews. If anybody gets hold of the data (casually
> or not) be it on tape or disk or the network while it is in transit then
> you are liable to the extent that you failed to take adequate measures
> to ensure the data's security.

By the numbers: security of clusters vs security of normal compute environments and normal users from home and/or w/ laptops requires varying degrees of security policies
- from looking at the various incoming sven virus (MS update virus stuff)
- about 75% of the incoming junk is coming from (mis-managed) clusters

80% of the security issues will be due to internal folks and not the outsiders.. and i'd hate to be the one responsible for security on a university network where there are tons of bright young and ambitious kids looking for a "trophy"

my security rules, assume the hacker is sitting in the firewall .. w/ root passwds .. now protect your data is my model ...
- if they have a keyboard sniffer installed .. game over ..
  ( there'd be no need to guess what the pass phrase was )

c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From victor_ms at brturbo.com Sun Oct 12 11:27:23 2003 From: victor_ms at brturbo.com (Victor Lima) Date: Sun, 12 Oct 2003 12:27:23 -0300 Subject: Benchmarks Message-ID: <3F8972DB.6080802@brturbo.com> Hi All. I'm new on the list. Well, I have a small linux cluster with 18 P4 2.8 GHz with FastEthernet 100Mbits. I need some benchmark software for latency, throughput on Ethernet, etc. Regards. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Sun Oct 12 19:10:07 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Sun, 12 Oct 2003 19:10:07 -0400 Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> References: <3F8972DB.6080802@brturbo.com> Message-ID: <3F89DF4F.1070500@bellsouth.net> I'm surprised no one has jumped on this yet. There are several packages for testing basic network performance from one node to another.
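(For a very rough first look before installing anything, a minimal TCP ping-pong along the following lines gives a round-trip latency number; the port, message size and repetition count are arbitrary choices for illustration, and most error checking is trimmed for brevity.)

  /* pingpong.c -- crude TCP round-trip timer; the packages below do this
   * properly, this is only a sanity-check sketch.
   *   node1$ ./pingpong -s          # server
   *   node2$ ./pingpong node1       # client, prints average round trip
   */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/time.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <netdb.h>

  #define PORT 5678
  #define MSG  4        /* bytes per message */
  #define REPS 10000

  static void xfer(int fd, int client)
  {
      char buf[MSG] = {0};
      struct timeval t0, t1;
      int i, one = 1;

      setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
      gettimeofday(&t0, NULL);
      for (i = 0; i < REPS; i++) {
          /* tiny messages arrive in a single read in practice */
          if (client) {
              if (write(fd, buf, MSG) != MSG || read(fd, buf, MSG) <= 0) break;
          } else {
              if (read(fd, buf, MSG) <= 0 || write(fd, buf, MSG) != MSG) break;
          }
      }
      gettimeofday(&t1, NULL);
      if (client) {
          double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
          printf("%d round trips, avg %.1f usec each\n", REPS, us / REPS);
      }
  }

  int main(int argc, char **argv)
  {
      struct sockaddr_in sa;
      int fd = socket(AF_INET, SOCK_STREAM, 0);

      memset(&sa, 0, sizeof sa);
      sa.sin_family = AF_INET;
      sa.sin_port = htons(PORT);
      if (argc > 1 && strcmp(argv[1], "-s") == 0) {      /* server */
          int one = 1, conn;
          setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
          sa.sin_addr.s_addr = htonl(INADDR_ANY);
          bind(fd, (struct sockaddr *)&sa, sizeof sa);
          listen(fd, 1);
          conn = accept(fd, NULL, NULL);
          xfer(conn, 0);
          close(conn);
      } else if (argc > 1) {                             /* client */
          struct hostent *h = gethostbyname(argv[1]);
          if (!h) { fprintf(stderr, "unknown host %s\n", argv[1]); return 1; }
          memcpy(&sa.sin_addr, h->h_addr, h->h_length);
          connect(fd, (struct sockaddr *)&sa, sizeof sa);
          xfer(fd, 1);
      } else {
          fprintf(stderr, "usage: %s -s | %s <serverhost>\n", argv[0], argv[0]);
          return 1;
      }
      close(fd);
      return 0;
  }
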
My personal favorite is netpipe: http://www.scl.ameslab.gov/netpipe/ The other one is netperf: http://www.netperf.org/netperf/NetperfPage.html The web pages are pretty good about explaining things. Good Luck! Jeff
> Hi All.
> I'm new on the list.
> Well, I have a small linux cluster with 18 P4 2.8 GHz with
> FastEthernet 100Mbits.
> I need some benchmark software for latency, throughput on Ethernet,
> etc.
> Regards.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Oct 13 03:34:33 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 13 Oct 2003 09:34:33 +0200 (CEST) Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> Message-ID: On Sun, 12 Oct 2003, Victor Lima wrote:
> Hi All.
> I'm new on the list.
> Well, I have a small linux cluster with 18 P4 2.8 GHz with FastEthernet
> 100Mbits.
> I need some benchmark software for latency, throughput on Ethernet, etc.

Have a look at Pallas http://www.pallas.com/e/products/pmb/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 13 09:38:36 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Mon, 13 Oct 2003 15:38:36 +0200 Subject: Intel and GNU C++ compilers Message-ID: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Hello: I just wanna thank everybody for the responses to my last question about the Intel compiler, I tried both 'gcc' and 'icc', and got the following results for one of our work files containing 10^6 steps of calculation:

**************************
*** gcc version 2.95.4 ***
**************************
flags                                      bin-size   elapsed-time
-----                                      --------   ------------
none                                       9.5 KB     311 sec
"-O3"                                      8.7 KB     192 sec
"-O3 -ffast-math"                          8.7 KB     165 sec
********************************************

***********************
*** icc version 7.1 ***
***********************
flags                                      bin-size   elapsed-time
-----                                      --------   ------------
none                                       597 KB     100 sec
"-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"     563 KB     89 sec
****************************************************************

the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' and SSE2 extensions would improve 'gcc' results. I am running on a Dual Xeon 2.4 GHz machine, with 2 GB of RAM. I use Debian Woody with a 2.4.22 kernel compiled by myself. HyperThreading is disabled at the BIOS level. The tests were run on one processor only. Thanks, Jose M. Perez. Madrid, Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 13 12:04:47 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 13 Oct 2003 12:04:47 -0400 (EDT) Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID:
> *** gcc version 2.95.4 ***

that's god-awful ancient.
> none                  9.5 KB  311 sec
> "-O3"                 8.7 KB  192 sec
> "-O3 -ffast-math"     8.7 KB  165 sec

-fomit-frame-pointer usually helps, sometimes noticeably, since x86 is so short of registers. -O3 is often not better than -O2 or -Os, mainly because of interactions between unrolling, Intel's microscopic L1's, and the difficulty of scheduling onto a tiny reg set... I'd be surprised if 3.3 or 3.4 (pre-release) didn't perform noticeably better.

> flags                                      bin-size   elapsed-time
> -----                                      --------   ------------
> none                                       597 KB     100 sec
> "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"     563 KB     89 sec

isn't -tpp7 redundant if you have -xW?

> the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions
> respectively, I guess that using a newer 'gcc', capable of '-march=pentium4'
> and SSE2 extensions would improve 'gcc' results.

yes. '-march=pentium4 -mfpmath=sse' seems to do it. gcc doesn't have an auto-vectorizer yet, unfortunately. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From indigoneptune at yahoo.com Mon Oct 13 13:37:47 2003 From: indigoneptune at yahoo.com (stanley george) Date: Mon, 13 Oct 2003 10:37:47 -0700 (PDT) Subject: benchmarks for performance Message-ID: <20031013173747.37343.qmail@web14912.mail.yahoo.com> Hi, I have a cluster of 8 P-III machines running redhat 8. I am trying to measure combined performance in MFLOPS. I have tried using linpackd and 1000d. It gives me an error with the 'Make.inc' file while compiling. How do I get rid of this? What other benchmarking software could I use? Thank you very much Stanley George __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Oct 13 12:22:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 13 Oct 2003 18:22:17 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <200310131822.17717.joachim@ccrl-nece.de> José M. Pérez Sánchez:
> I just wanna thank everybody for the responses to my last question about
> the Intel compiler, I tried both 'gcc' and 'icc', and got the following results
> for one of our work files containing 10^6 steps of calculation:

José, thanks for the information, but you really should (also) use the latest gcc (3.3x) for such a comparison. It will be interesting to see how it performs relative to the latest icc on the one hand, and to the old gcc on the other hand. And some information on the application (or libraries used) would be helpful, too. Like: is it memory-bound or compute-bound, etc..
Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 13 15:26:55 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 13 Oct 2003 12:26:55 -0700 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031013192655.GC16033@greglaptop.internal.keyresearch.com> On Mon, Oct 13, 2003 at 12:04:47PM -0400, Mark Hahn wrote:
> -fomit-frame-pointer usually helps, sometimes noticeably,
> since x86 is so short of registers.

Actually it's a lot more of a tossup than it used to be: having a frame pointer means you have another 256 bytes accessible via a single-byte offset, and the SSE registers help relieve the register pressure problem. On the Opteron, which has more of both general purpose and SSE registers, the frame pointer is often a win. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Mon Oct 13 21:21:34 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Mon, 13 Oct 2003 18:21:34 -0700 (PDT) Subject: The Canadian Internetworked Scientific Supercomputer Message-ID: <20031014012134.21517.qmail@web11403.mail.yahoo.com> Just found an interesting paper written by Paul Lu (the author of PBSWeb): http://hpcs2003.ccs.usherbrooke.ca/papers/Lu.pdf CISS homepage: http://www.cs.ualberta.ca/~ciss/ Rayson __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:19:11 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 16:19:11 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <200310131822.17717.joachim@ccrl-nece.de> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> <200310131822.17717.joachim@ccrl-nece.de> Message-ID: <20031014141911.GA995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 06:22:17PM +0200, Joachim Worringen wrote:
> thanks for the information, but you really should (also) use the latest gcc
> (3.3x) for such a comparison. It will be interesting to see how it performs
> relative to the latest icc on the one hand, and to the old gcc on the other
> hand.
>
> And some information on the application (or libraries used) would be helpful,
> too. Like: is it memory-bound or compute-bound, etc..
>
> Joachim

I installed gcc-3.3.2 from the debian testing distribution, here is the full report including gcc-3.3.2:

**************************
*** gcc version 2.95.4 ***
**************************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        9.5 KB     311 sec
"-O3"                                       8.7 KB     192 sec
"-O3 -ffast-math"                           8.7 KB     165 sec
********************************************

*************************
*** gcc version 3.3.2 ***
*************************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        9.1 KB     245 sec
"-O3"                                       8.8 KB     161 sec
"-O2"                                       8.7 KB     157 sec
"-O2 -ffast-math -fomit-frame-pointer"      8.5 KB     127 sec
"-O2 -ffast-math"                           8.5 KB     125 sec
"-O2 -ffast-math -march=pentium4"           8.5 KB     120 sec
"-O2 -ffast-math -march=pentium4 -msse2"    8.5 KB     120 sec
"-O3 -ffast-math -march=pentium4 -msse2"    8.5 KB     120 sec
********************************************

***********************
*** icc version 7.1 ***
***********************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        597 KB     100 sec
"-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"      563 KB     89 sec
****************************************************************

For this test, we actually wrote a version of the program with many parameters hardcoded, so that we make it as compute-bound as possible; we aimed at evaluating how the different compilers took advantage of the Xeon processors. I will repeat the tests with the full version, which includes more memory usage, maybe about 80 MB per process, but it will finally depend on how big we make the files we use to split the calculations. The main calculation is the phase of a particle, we use an implementation of the MersenneTwister algorithm: http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html and have to compute sqrt(-2*log(x)/x) and sin(C*x/y) (x and y are not position, they correspond to other variables in the program), C is a constant hardcoded in the code like sin(9.7438473847*x/y). I measured how long it took to compute sqrt(-2*log(x)/x), and it was about 412 processor cycles (I used rdtscll()). I will submit other results as soon as I get them, probably using another computing algorithm which runs quite a bit faster. Regards, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:32:19 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 16:32:19 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031014143219.GB995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 04:40:31PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote:
> Jose,
>
> Can we benchmark our OptimaNumerics Linear Algebra Library with you on
> the same machine?
>
> Thank you very much!
>
>
> Best wishes,
> Kenneth Tan
> -----------------------------------------------------------------------
> C. J. Kenneth Tan, Ph.D.
> Heuchera Technologies Ltd.
> E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838
> Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015
> -----------------------------------------------------------------------

Hi Kenneth: Thank you very much for your message, unfortunately we have a pretty tight schedule here, and lots of different things to do.
Right now I cannot spend time benchmarking your library on my system, and we cannot provide access to anyone from outside. On the other hand I don't know if the calculations I am running at this moment can exploit your libraries. Thanks again and best regards, Jose M. Perez Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.fitzmaurice at ngc.com Mon Oct 13 10:52:18 2003 From: michael.fitzmaurice at ngc.com (Fitzmaurice, Michael) Date: Mon, 13 Oct 2003 07:52:18 -0700 Subject: Beowulf Users Group meeting Message-ID: <03E95480F0B2D042A7598115FB3F5D9D49F3E4@XCGVA009> Please join us at the Baltimore-Washington Beowulf Users Group meeting this Tuesday the 14th at 2:45 at the Northrop Grumman building on 7575 Colshire Drive; McLean, VA 22102. For more details please go to

Who should attend?
- Sales, marketing and Business Development people
- Pre-sales engineers
- High Performance Computing professionals
- IT generalists
- Data Center Managers
- Program and Project Managers

Beowulf cluster installations are one of the fastest growing areas within the IT market. Beowulf clusters are replacing old, slower SMP systems for half the cost and with twice the performance. Beowulf clusters will grow even faster with the introduction of easier-to-use parallel programming tools. Engineered Intelligence is leading the revolution in breakthrough parallel programming tools for the HPC market. So now applications on older SMP machines can be easily moved to cost-effective COTS Intel or AMD based servers, which have been clustered to improve performance and reduce costs. Come hear from the folks at Engineered Intelligence about how your projects can use C x C to make your applications ready to use Beowulf clusters today. This will be one of our best topics regarding the Beowulf cluster market. There is no cost for the briefing and you do not need to be a BWBUG member. As always there will be great door prizes and free parking. If you cannot make it to the meeting, pass the word to a colleague or business associate. T. Michael Fitzmaurice, Jr. Coordinator of the BWBUG 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 12:08:50 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 18:08:50 +0200 Subject: Pentium4 vs Xeon Message-ID: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Hi: We are going to buy a second machine! :-) It will be a diskless dual processor node. We are thinking about buying the same configuration: Xeon 2.4 GHz, 533 MHz FSB, but since Xeons and the motherboards supporting them are so expensive, we have been thinking about dual normal Pentium4 instead. We don't currently have any comparable P4 processor to run some tests, and after looking at the Intel docs, the only difference we see between Xeon and P4 is the Xeon having more cache. Does anyone have any idea about the relative performance of these processors, and what about the price/performance ratio? Is it worth paying more for the Xeon?
The other point I wanna ask about is the "host bus speed" reported by the kernel at boot time, it reports 133 MHz, and our memories are supposed to run at 266 MHz; is it normal, is it just the double rate thing? Thanks in advance, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Tue Oct 14 12:53:54 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Tue, 14 Oct 2003 18:53:54 +0200 Subject: PVM errors at startup References: <3F86F29F.8A37AC5B@hmg.inpg.fr> Message-ID: <3F8C2A22.83F751A3@hmg.inpg.fr> This email is just to close the thread with the solution. The problem was not related to any PVM misconfiguration but to the ethernet driver. Looking at the ethernet communications between 2 nodes with tcpdump has shown that pvmd was started using tcp communications BUT that the pvmds were trying to talk to each other with the UDP protocol (it is also detailed in the PVM doc) and this was the problem. The UDP communication was unsuccessful between the nodes. Details: The nodes are P4 2.8 with Asustek P4P800 motherboard and on-board 3C940 (gigabit) controller. I was using the 3c2000 driver (from the cdrom). Kernel is 2.4.20-20.7bigmem from RedHat 7.3. rsh, rexec and rcp are working fine but this driver seems not to work with the UDP protocol??? The solution was to download the sk68lin driver (v6.18) and run the shell script to patch the kernel sources for the current kernel. Then correct the module.conf file and set up the gigabit interface. Now PVM is working fine between the two first nodes and the measured throughput is the same as with the 3c2000 Asustek driver. I should now set up the other nodes! I would like to thank Prof. Kenneth R. Koehler and Dr. James Arthur Kohl for their great help in checking the full PVM configuration and leading me towards a network driver problem. Patrick --
===============================================================
| Equipe M.O.S.T.       | http://most.hmg.inpg.fr           |
| Patrick BEGOU         |       ------------                |
| LEGI                  | mailto:Patrick.Begou at hmg.inpg.fr |
| BP 53 X               | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX  | Fax 04 76 82 52 71                |
===============================================================
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Tue Oct 14 13:38:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Tue, 14 Oct 2003 11:38:35 -0600 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <3F8C349B.5040302@lanl.gov> José M. Pérez Sánchez wrote:
> [...] we have been thinking about dual normal Pentium4 [...]

SMP operation and larger caches appear to be threshold features in Xeons. Old Pentium III could be used in duals, but Intel's marketing has changed. Normal Pentium4 is *not* dual processor enabled: http://www.intel.com/products/desktop/processors/pentium4/index.htm?iid=ipp_browse+dsktopprocess_p4p& http://www.intel.com/products/server/processors/server/xeon/index.htm?iid=ipp_browse+srvrprocess_xeon512& If you really want a fast dual CPU machine from Intel, you'll probably have to pay for a Xeon...
Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Tue Oct 14 13:40:46 2003 From: djholm at fnal.gov (Don Holmgren) Date: Tue, 14 Oct 2003 12:40:46 -0500 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: On Tue, 14 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Hi: > > We are going to buy a second machine! :-) It will be a diskless dual > processor node. We are thinking about buying the same configuration: > Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting > them are so expensive, we have been thinking about dual normal Pentium4 > instead. We don't have now any P4 comparable processor to run some > tests, and after looking at the Intel docs, the only difference we see > between Xeon and P4 is Xeon having more cache. Does anyone has any > idea about the relative performance of these processors, what about the > price/performance ratio? Is it worth paying for more Xeon? > > The other point I wanna ask about is the "host bus speed" reported by > the kernel at boot time, it reports 133Mhz, and our memories are > supposed to run at 266Mhz, is it normal, is it just the double rate > thing? > > Thanks in advance, > > Jose M. Perez. The major difference between P4 and Xeon is that P4's are available with up to 800 MHz FSB, and Xeon's with up to 533 MHz FSB. If your code is sensitive to memory bandwidth, a P4 can be a big win. Otherwise they are essentially equivalent. P4 and standard Xeon both have 512K L2 caches. Xeon's with larger L2 caches are available, but if I'm not mistaken there's a big price difference. Pricewise (YMMV), cheap desktop P4's can be had very roughly for half the price of a comparable dual Xeon. You may very well prefer to admin half the number of boxes and so would prefer the Xeon. If you are using an expensive interconnect, you may also come out ahead with the dual processor boxes, buying only half of the PCI adapters and half the switch ports. Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit PCI. That can be a big bottleneck if your cluster application is sensitive to I/O bandwidth. Early in 2004, if the rumours are true, there will be a P4 chipset supporting 66MHz/64bit PCI-X. And in late 2004, PCI Express should be available on both P4 and Xeon motherboards, providing a big increase in I/O bandwidth if one has a network which can take advantage. Xeon's and P4's do four transfers per clock - so, a 533MHz FSB is really a 133MHz clock doing 4 transfers per cycle. The kernel on my 800 MHz FSB P4 reports a 200 MHz host bus speed. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Tue Oct 14 17:14:26 2003 From: rodmur at maybe.org (Dale Harris) Date: Tue, 14 Oct 2003 14:14:26 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <200310011613.46297.lepalom@upc.es> Message-ID: <20031014211426.GI8116@maybe.org> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > 54.2 You know... 
one problem I see with this, assuming this information is going to pass across the net (or did I miss something), is that instead of passing something like four bytes (ie "54.2"), you are going to be passing 56 bytes (just counting the cpu_temp line). So the XML blows up a little bit of data 14 times. I can't see this being a particularly efficient way of using a network. Sure, it looks pretty, but seems like a waste of bandwidth. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Oct 14 18:13:53 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 14 Oct 2003 18:13:53 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID:
> >
> >
> > 54.2
>
> You know... one problem I see with this, assuming this information is
> going to pass across the net (or did I miss something), is that instead
> of passing something like four bytes (ie "54.2"), you are going to be
> passing 56 bytes (just counting the cpu_temp line). So the XML blows up
> a little bit of data 14 times. I can't see this being a particularly
> efficient way of using a network. Sure, it looks pretty, but seems like
> a waste of bandwidth.

I'm sure some would claim that 56 bytes is not measurable overhead, especially considering the size of tcp/eth/etc headers. but it's damn ugly, to be sure. this sort of thing has been discussed several times on the linux-kernel list as well - formatting of /proc entries. it's clear that some form of human-readability is a good thing. what's not clear is that it has to be so exceptionally verbose. think of it this way: lmsensors output for a machine is a record whose type will not change (very fast, if you insist!). so why should all the metadata about the record format, units, etc be sent each time? suppose you could fetch the fully verbose record once, and then on subsequent queries, just get '54.2 56.7 40.1 3650 4150 5.0 3.3 12.0 -12.0'. the only thing you've lost is same-packet-self-description (and, incidentally, insensitivity to reordering of elements...) there *is* actually a very mind-bending binarification procedure for xml. it seems totally cracked to me, though, since afaict, it completely tosses the self-description aspect, which is almost the main point of xml... of course, the whole xml thing is a massive fraud, since it does nothing at all towards actual interoperability - there must already be thousands of different xml schemas for "SKU", each better than the last, and therefore mutually incompatible... does ASN.1 improve on this situation at all? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 14 20:45:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 14 Oct 2003 20:45:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: On Tue, 14 Oct 2003, Dale Harris wrote:
> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated:
> >
> >
> > 54.2
>
> You know... one problem I see with this, assuming this information is
> going to pass across the net (or did I miss something).
Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. Ah, an open invitation to waste a little more:-) Permit me to rant (the following can be freely skipped by the rant-averse:-). Note that this is not a flame, merely an impassioned assertion of an admittedly personal religious viewpoint. Like similar rants concerning the virtues of C vs C++ vs Fortran vs Java or Python vs Perl, it is intended to amuse or possible educate, but doubtless won't change many human minds. This is an interesting question and one I kicked around a long time when designing xmlsysd. Of course it is also a very longstanding issue -- as old as computers or just about. Binary formats (with need for endian etc translation) are obviously the most efficient but are impossible to read casually and difficult to maintain or modify. Compressed binary (or binary that only uses e.g. one bit where one bit will do) the most impossible and most difficult. Back in the old days, memory and bandwidth on all computers was a precious and rare thing. ALL programs tended to use one bit where one bit was enough. Entire formats with headers and metadata and all were created where every bit was parsimoniously allocated out of a limited pool. Naturally, those allocations proved to be inadequate in the long run so that only a few years ago lilo would complain if the boot partition had more than 1023 divisions because once upon a time somebody decided that 10 bits was all this particular field was ever going to get. In order to parse such a binary stream, it is almost essential to use a single library to both format and write the stream and to read and parse it, and to maintain both ends at the same time. Accessing the data ONLY occurs through the library calls. This is a PITA. Cosmically. Seriously. Yes, there are many computer subsystems that do just this, but they are nightmarish to use even via the library (which from a practical point of view becomes an API, a language definition of its own, with its own objects and tools for creating them and extracting them, and the need to be FULLY DOCUMENTED at each step as one goes along) and require someone with a high level of devotion and skill to keep them roughly bugfree. For example, if you write your code for single CPU systems, it becomes a major problem to add support for duals, and then becomes a major problem again to add support for N-CPU SMPs. Debugging becomes a multistep problem -- is the problem in the unit that assembles and provides the data, the encoding library, the decoding library (both of which are one-offs, written/maintained just for the base application) or is it in the client application seeking access to the data? Fortunately, in the old days, nearly all programming was done by professional programmers working for a wage for giant (or not so giant) companies. Binary interfaces were ideal -- they became Intellectual Property >>because<< they were opaque and required a special library whose source was hidden to access the actual binary, which might be entirely undocumented (except via its API library calls). BECAUSE they were so bloomin' hidden an difficult/expensive to modify, software evolved very, very slowly, breaking like all hell every time e.g. MS Word went from revision 1 to 2 to 3 to... 
because of broken binary incompatibility. ASCII, OTOH, has the advantage of being (in principle) easy to read. However, it is easy to make it as obscure and difficult to read as binary. Examples abound, but let's pull one from /proc, since the entire /proc interface is designed around the premise that ascii is good relative to binary (although that seems to be the sole thing that the many designers of different subsystems agree on). When parsing the basic status data of an application, one can work through:

rgb at lilith|T:105>cat /proc/1214/stat
1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0

(which, as you can see, contains the information on the pine application within which I am currently working on my laptop). What? You find that hard to read? Surely it is obvious that the first field is the PID, the second the application name (inside parens, introducing a second, fairly arbitrary delimiter to parse), the runtime status (which is actually NOT a single character, it can vary) and then... ooo, my. Time to check out man proc, kernel source (/usr/src/linux/fs/proc/array.c) and maybe the procps sources. One does better with:

rgb at lilith|T:106>cat /proc/1214/status
Name:   pine
State:  S (sleeping)
Tgid:   1214
Pid:    1214
PPid:   1205
TracerPid:  0
Uid:    1337  1337  1337  1337
Gid:    1337  1337  1337  1337
FDSize: 32
Groups: 1337 0
VmSize:    11752 kB
VmLck:         0 kB
VmRSS:      5652 kB
VmData:     2496 kB
VmStk:        52 kB
VmExe:      2804 kB
VmLib:      3708 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000008001003
SigCgt: 0000000040016c5c
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

This is an almost human readable view of MUCH of the same data that is in /proc/stat. Of course there is the little ASCII encoded hexadecimal garbage at the bottom that could make strong coders weep (again, without a fairly explicit guide into what every byte or even BIT in this array does, as one sort of expects that there are binary masked values stuck in here). In this case man proc doesn't help -- because this is supposedly "human readable" they don't provide a reference there. Still, some of the stuff that is output by ps aux is clearly there in a fairly easily parseable form. Mind you, there are still mysteries. What are the four UID entries? What is the resolution on the memory, and are kB x1000 or x1024? What about the rest of the data in /proc/stat (as there are a lot more fields there). What about the contents of /proc/PID/statm? (Or heavens preserve us, /proc/PID/maps)? Finally, what about other things in /proc, e.g.:

rgb at lilith|T:119>cat /proc/stat
cpu  3498 0 2122 239197
cpu0 3498 0 2122 239197
page 128909 55007
swap 1 0
intr 279199 244817 13604 0 3427 6 0 4 4 1 3 2 2 1436 0 15893 0
disk_io: (3,0):(15946,11130,257194,4816,109992)
ctxt 335774
btime 1066170139
processes 1261

Again, ASCII yes, but now (count them) there are whitespace, :, (, and ',' separators, and one piece of data (the CPU's index) is a part of a field value (cpu0) so that the entire string "cpu" becomes a sort of separator (but only in one of the lines). An impressive ratio of separators used to field labels.
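(To make the parsing chore concrete: even the "almost human readable" status file above needs a little hand-rolled parser. A minimal sketch -- it just splits each line on the first colon and picks out a couple of fields, with no claim to cover every corner case:)

  /* statusparse.c -- tiny parser for the "Key:  value" lines of
   * /proc/<pid>/status (here /proc/self/status for simplicity). */
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      FILE *fp = fopen("/proc/self/status", "r");
      char line[256];

      if (!fp) { perror("fopen"); return 1; }
      while (fgets(line, sizeof line, fp)) {
          char *colon = strchr(line, ':');
          char *value;

          if (!colon) continue;            /* not a "Key: value" line */
          *colon = '\0';                   /* line now holds just the key */
          value = colon + 1;
          while (*value == ' ' || *value == '\t') value++;
          value[strcspn(value, "\n")] = '\0';

          /* pick out a couple of fields, ignore the rest */
          if (!strcmp(line, "Name") || !strcmp(line, "State") || !strcmp(line, "VmRSS"))
              printf("%s = %s\n", line, value);
      }
      fclose(fp);
      return 0;
  }
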
If this isn't enough for you, consider /proc/net/dev, which has two separators (: and ws) but is in COLUMNS, /proc/bus/pci/devices (which I still haven't figured out) and yes, the aforementioned sensors interface in /proc. I offer all of the above as evidence of a fairly evil (did you ever notice how evil, live, vile, veil and elvi are all anagrams of one another he asks in a mindless parenthetical insertion to see if you're still awake:-) middle ground between a true binary interface accessible only through library calls (which can actually be fairly clean, if one creates objects/structs with enough mojo to hold the requisite data types so that one can then create a relatively simple set of methods for accessing them) and xml. XML is the opposite end of the binary spectrum. It asserts as its primary design principle that the objects/structs with the right kind of mojo share certain features -- precisely those that constitute the rigorous design requirements of XML (nesting, attributes, values, etc). There is a fairly obvious mapping between a C struct, a C++ object, and an XMLified table. It also asserts implicitly that whether or not the object tags are chosen to be human readable (nobody insists that the tags encapsulating CPU temperature readings be named -- they could have been just ) there MUST be some sort of dictionary created at the same time as the XML implementation. If (very) human readable tags are chosen they are nearly self-documenting, but whole layers of DTD and CSS and so forth treatment of XML compliant markup are predicated upon a clear definition of the tag rules and hierarchy. Oh, and by its very design XML is highly scalable and extensible. Just as one can easily enough add fields into a struct without breaking code that uses existing fields, one can often add tags into an XML document description without breaking existing tags or tag processing code (compare with adding a field anywhere into /proc/stat -- ooo, disaster). This isn't always the case in either case -- sometimes one converts a field in a struct into a struct in its own right, for example, which can do violence to both the struct and an XML realization of it. Still, often one can and when one can't it is usually because you've had a serious insight into the "right" way to structure your data and before the encoding was just plain wrong in some deep way. This happens, but generally only fairly early in the design and implementation process. Note that XML need not be inefficient in transit. BECAUSE it is so highly structured, it compresses very efficiently. Library calls exist to squeeze out insignificant whitespace, for example (ignored by the parser anyway). I haven't checked recently to see whether compression is making its way into the library, but either way one can certainly compress/decompress and/or encrypt/decrypt the assembled XML messages before/after transmission, if CPU is cheaper to you than network or security is an issue. I think that it then comes down to the following. XML may or may not be perfect, but it does form the basis for a highly consistent representation of data structures that is NOT OPAQUE and is EASILY CREATED AND EASILY PARSED with STANDARD TOOLS AND LIBRARIES. When designing an XMLish "language" for your data, you can make the same kind of choices that you face in any program. Do you document your code or not? Do you use lots of variable names like egrp1 or do you write out something roughly human readable like extra_group_1? 
Do you write your loops so that they correspond to the actual formulae or basic algorithm (and let the compiler do as well as it can with them) or do you block them out to be cache-friendly, insert inline assembler, and so forth to make them much faster but impossible to read or remember even yourself six months after you write them? Some choices make the code run fast and short but hard to maintain. Other choices make it run slower but be more readable and easier to maintain. In the long run, I think most programmers eventually come to a sort of state of natural economy in most of these decisions; one that expresses their personal style, the requirements of their job, the requirements of the task, and a reflection of their experience(s) coding. It is a cost/benefit problem, after all (as is so much in computing). You have to ask how much it costs you to do something X way instead of Y way, and what the payoff/benefits are, in the long run. For myself only, years of experience have convinced me that as far as things like /proc or task/hardware monitoring are concerned, the bandwidth vs ease of development and maintenance question comes down solidly in favor of ease of development and maintenance. Huge amounts of human time are wasted writing parsers and extracting de facto data dictionaries from raw source (the only place where they apparently reside). Tools that are built to collect data from a more or less arbitrary interface have to be almost completely rewritten when that interface changes signficantly (or break horribly in the meantime). So the cost is this human time (programmers'), more human time (the time and productivity lost by people who lack the many tools a better interface would doubtless spawn), and the human time and productivity lost due to the bugs the more complex and opaque and multilayered interface generates. The benefit is that you save (as you note) anywhere from a factor of 3-4 to 10 or more in the total volume of data delivered by the interface. Data organization and human readability come at a price. But what is the REAL cost of this extra data? Data on computers is typically manipulated in pages of memory, and a page is what, 4096 bytes? Data movement (especially of contiguous data) is also very rapid on modern computers -- you are talking about saving a very tiny fraction of a second indeed when you reduce the message from 54 bytes to 4 bytes. Even on the network, on a 100BT connection one is empirically limited by LATENCY on messages less than about 1000 bytes in length. So if you ask how long it takes to send a 4 byte packet or a 54 byte packet (either one of which is TCP encapsulated inside a header that is longer than the data) the answer is that they take exactly the same amount of time (within a few tens of nanoseconds). If the data in question is truly a data stream -- a more or less continuous flow of data going through a channel that represents a true bottleneck, then one should probably use a true binary representation to send the data (as e.g. PVM or MPI generally do), handling endian translation and data integrity and all that. 
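(For what the struct-to-XML mapping mentioned above looks like in practice, a minimal sketch follows; the struct fields, tag names and values are invented for illustration and are not the actual xmlsysd or lm_sensors format.)

  /* sensorxml.c -- sketch of emitting a C struct as an xmlish record. */
  #include <stdio.h>

  struct sensors_reading {        /* hypothetical per-node reading */
      double cpu_temp_c;
      double mb_temp_c;
      int    fan_rpm;
  };

  /* Format the struct as a small self-describing record in buf. */
  static int emit_xml(const struct sensors_reading *s, char *buf, int len)
  {
      return snprintf(buf, (size_t) len,
                      "<sensors>\n"
                      "  <cpu_temp units=\"C\">%.1f</cpu_temp>\n"
                      "  <mb_temp units=\"C\">%.1f</mb_temp>\n"
                      "  <fan units=\"rpm\">%d</fan>\n"
                      "</sensors>\n",
                      s->cpu_temp_c, s->mb_temp_c, s->fan_rpm);
  }

  int main(void)
  {
      struct sensors_reading r = { 54.2, 40.1, 3650 };
      char buf[256];

      emit_xml(&r, buf, sizeof buf);
      fputs(buf, stdout);   /* a client can parse this with any XML library */
      return 0;
  }
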
If the data in question is a relatively short (no matter how it is wrapped and encoded) and intermittant source -- as most things like a sensors interface, the proc interface(s) in general, the configuration file of your choice, and most net/web services are, arguably -- then working hard to compress or minimally encapsulate the data in an opaque form is hard to justify in terms of the time (if any) that it saves, especially on networks, CPUs, memory that are ever FASTER. If it doesn't introduce any human-noticeable delay, and the overall load on the system(s) in question remain unmeasurably low (as was generally the case with e.g. the top command ten Moore's Law years or more ago) then why bother? I think (again noting that this is my own humble opinion:-) that there is no point. /proc should be completely rewritten, probably by being ghosted in e.g. /xmlproc as it is ported a little at a time, to a single, consistent, well documented xmlish format. procps should similarly be rewritten in parallel with this process, as should the other tools that extract data from /proc and process it for human or software consumption. Perhaps experimentation will determine that there are a FEW places in /proc where the extra overhead of parsing xml isn't acceptable for SOME applications -- /proc/pid/stat for example. In those few cases it may be worthwhile to make the ghosting permanent -- to provide an xmlish view AND a binary or minimal ASCII view, as is done now, badly, with /proc/pid/stat and /proc/pid/status. This is especially true, BTW, in open source software, where a major component of the labor that creates and maintains both low level/back end service software and high level/front end client software is unpaid, volunteer, part time, and of a wide range of skill and experience. Here the benefits of having a documented, rigorously organized, straightforwardly parsed API layer between tools are the greatest. Finally, to give the rotting horse one last kick, xmlified documents (deviating slightly from API's per se) are ideal for archival storage purposes. Microsoft is being scrutinized now by many agencies concerned about the risks associated from having 90% of our vital services provided by an operating system that has proven in practice to be appallingly vulnerable. Their problem has barely begun. The REAL expense associated with using Microsoft-based documents is going to prove in the long run to be the expense of de-archiving old proprietary-binary-format documents long after the tools that created them have gone away. This is a problem worthy of a rant all by itself (and I've written one or two in other venues) but it hasn't quite reached maturity as it requires enough years of document accumulation and toplevel drift in the binary "standard" before it jumps out and slaps you in the face with six and seven figure expenses. XMLish documents (especially when accompanied by a suitable DTD and/or data dictionary) simply cannot cost that much to convert because their formats are intrinsically open. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Tue Oct 14 22:02:28 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Wed, 15 Oct 2003 03:02:28 +0100 Subject: Condor Problem Message-ID: Does anyone have any condor experience? im trying to submit a job which is a Borland C++ console application.. the application writes a final output to the screen... but this is not being saved to the output file i specified in the jobs configuration. When i use a simple batch file and echo some text to the screen and submit that as a job it works fine and the echoed text is in the output file. Is there a problem with condor? or is there a problem with c++ or stdout? any help would be greatly appreciated. Thanks in advance... Chris Miles, NeuralGrid, Paisley University, Scotland _________________________________________________________________ Express yourself with cool emoticons - download MSN Messenger today! http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kohlja at ornl.gov Tue Oct 14 15:08:54 2003 From: kohlja at ornl.gov (James Kohl) Date: Tue, 14 Oct 2003 15:08:54 -0400 Subject: PVM errors at startup In-Reply-To: <3F8C2A22.83F751A3@hmg.inpg.fr> References: <3F86F29F.8A37AC5B@hmg.inpg.fr> <3F8C2A22.83F751A3@hmg.inpg.fr> Message-ID: <20031014190854.GA31004@neo.csm.ornl.gov> Hey Patrick, Glad you found the problem. This is usually manifested when the networking config is off slightly, or when internal/external networks are confused, but it sounds like you had a much more interesting problem...! :-) Yes, PVM uses rsh/ssh/TCP to start a remote PVM daemon (pvmd) but then the daemons themselves use UDP to talk and route PVM messages. FYI, any PVM tasks that use the "PvmRouteDirect" will use direct TCP sockets. Again, glad you figured it out! (And you're most welcome! :) All the Best, Jim On Tue, Oct 14, 2003 at 06:53:54PM +0200, Patrick Begou wrote: > This email just to close the thread with the solution. > The problem was not related to any PVM misconfiguration but to the > ethernet driver. Looking at the ethernet communications between 2 nodes > with tcpdump has shown that pvmd was started using tcp communications > BUT that pvmd were trying to talk each other with UDP protocol (it is > also detailed in the PVM doc) and this was the problem. The UDP > communications was unsuccessfull between the nodes. > Details: > The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 > (gigabit) controler. I was using the 3c2000 driver (from the cdrom). > Kernel is 2.4.20-20.7bigmem from RedHat 7.3. > rsh, rexec and rcp are working fine but this driver seems not to work > with UDP protocol??? > The solution was to download the sk68lin driver (v6.18) and run the > shell script to patch the kernel sources for the current kernel. Then > correct the module.conf file and set up the gigabit interface. Now PVM > is working fine between the two first nodes and the measured throughput > is the same as with 3c2000 asustek driver. I should now setup the other > nodes! > I would like to thanks Pr. Kenneth R. 
Koehler and Dr James Arthur Kohl > for their great help in checking the full PVM configuration and leading > me towards a network driver problem. > Patrick (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(: James Arthur "Jeeembo" Kohl, Ph.D. "Da Blooos Brathas?! They Oak Ridge National Laboratory still owe you money, Fool!" kohlja at ornl.gov http://www.csm.ornl.gov/~kohl/ Long Live Curtis Blues!!! :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Oct 15 04:49:26 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 15 Oct 2003 10:49:26 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Robert G. Brown wrote: > On Tue, 14 Oct 2003, Dale Harris wrote: > > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > > > > > 54.2 > > > > > > > > You know... one problem I see with this, assuming this information is > > going to pass across the net (or did I miss something). Is that instead > > of passing something like four bytes (ie "54.2"), you are going to be > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > a little bit of data 14 times. I can't see this being particularly > > efficient way of using a network. Sure, it looks pretty, but seems like > > a waste of bandwidth. > > Ah, an open invitation to waste a little more:-) Isn't it a bit cynical to write a 20 KByte e-mail on the topic of saving 56 Bytes? ;-) SCNR, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Wed Oct 15 11:43:09 2003 From: djholm at fnal.gov (Don Holmgren) Date: Wed, 15 Oct 2003 10:43:09 -0500 Subject: Some application performance results on a dual G5 Message-ID: For those who might be interested, I've posted some lattice QCD application performance results on a 2.0 GHz dual G5 PowerMac. See http://lqcd.fnal.gov/benchmarks/G5/ As expected from the specifications, strong memory bandwidth, reasonable scaling, and good floating point performance. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 15 09:46:45 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 15 Oct 2003 09:46:45 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Wed, 15 Oct 2003, Felix Rauch wrote: > > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > > a little bit of data 14 times. I can't see this being particularly > > > efficient way of using a network. Sure, it looks pretty, but seems like > > > a waste of bandwidth. > > > > Ah, an open invitation to waste a little more:-) > > Isn't it a bit cynical to write a 20 KByte e-mail on the topic of > saving 56 Bytes? 
;-) Cynical? No, not really. Stupid? Probably. If only I could get SOMEBODY to pay me ten measely cents a word for my rants... Alas this is not to be. So the alternative is to see if I can extort ten cents from everybody on the list NOT to write 20K rants like this. Sort of like National Lampoon's famous "Buy this magazine or we'll shoot this dog" issue...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 15 14:16:06 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 15 Oct 2003 11:16:06 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Wed Oct 15 21:22:01 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Thu, 16 Oct 2003 02:22:01 +0100 Subject: Condor Problem Message-ID: Hi, thanks for the reply Using all this instead of condor/globus? The only thing was I need to do this on windows. What i want to do is setup a Grid but also need a cluster to run jobs on Chris >From: Andrew Wang >To: Chris Miles >CC: beowulf at beowulf.org >Subject: Re: Condor Problem >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) >MIME-Version: 1.0 >Received: from mc11-f10.hotmail.com ([65.54.167.17]) by >mc11-s20.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:13:50 -0700 >Received: from web16812.mail.tpe.yahoo.com ([202.1.236.152]) by >mc11-f10.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:11:09 -0700 >Received: from [65.49.83.96] by web16812.mail.tpe.yahoo.com via HTTP; Thu, >16 Oct 2003 09:11:03 CST >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq >Message-ID: <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> >Return-Path: andrewxwang at yahoo.com.tw >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 (UTC) >FILETIME=[62388A50:01C39382] > >If all you need is a batch system, I would suggest SGE >and Scalable PBS, which have more users and better >support. > >Both of them are free and opensource, so you can try >both and see which one you like better! > >SGE: http://gridengine.sunsource.net >SPBS: http://www.supercluster.org/projects/pbs/ > >Andrew. > >----------------------------------------------------------------- >?C???? Yahoo!?_?? >?????C???B?????????B?R?A???????A???b?H?????? 
>http://tw.promo.yahoo.com/mail_premium/stationery.html _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 21:11:03 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) Subject: Condor Problem Message-ID: <20031016011103.41833.qmail@web16812.mail.tpe.yahoo.com> If all you need is a batch system, I would suggest SGE and Scalable PBS, which have more users and better support. Both of them are free and opensource, so you can try both and see which one you like better! SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/pbs/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Oct 15 21:37:36 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 15 Oct 2003 18:37:36 -0700 Subject: Pentium4 vs Xeon In-Reply-To: References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <5.2.0.9.2.20031015183031.03c57ce8@216.82.101.6> There are single-Xeon boards using the Serverworks GC series of chipsets with 64-bit PCI, but they're just as expensive as a budget dual Xeon board (Tyan S2723 or Supermicro X5DPA-GG)... In the $280 to $310 per board price range. Seems rather silly, as the "Prestonia" Socket-604 Xeon CPUs are nothing but a P4 repackaged. There's also this board: http://www.tyan.com/products/html/trinitygcsl.html Which uses a single P4 @ 533MHz FSB, with the same Serverworks chipset. Supermicro X5-SS* series (scroll down): http://www.supermicro.com/Product_page/product-mS.htm >Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit >PCI. That can be a big bottleneck if your cluster application is >sensitive to I/O bandwidth. Early in 2004, if the rumours are true, >there will be a P4 chipset supporting 66MHz/64bit PCI-X. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 22:15:56 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 10:15:56 +0800 (CST) Subject: Condor Problem In-Reply-To: Message-ID: <20031016021556.52225.qmail@web16812.mail.tpe.yahoo.com> Unluckily, SGE has very limited Windows support. PBSPro, which supports MS-Windows (the free versions do not), does offer free licenses to .edu sites. BTW, may be there are more people with condor knowledge from the condor mailing list can answer your questions. http://www.cs.wisc.edu/~lists/archive/condor-users/ Andrew. --- Chris Miles ????> Hi, thanks for the reply > > Using all this instead of condor/globus? > > The only thing was I need to do this on windows. 
> > What i want to do is setup a Grid but also need a > cluster to run > jobs on > > Chris > > >From: Andrew Wang > >To: Chris Miles > >CC: beowulf at beowulf.org > >Subject: Re: Condor Problem > >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) > >MIME-Version: 1.0 > >Received: from mc11-f10.hotmail.com > ([65.54.167.17]) by > >mc11-s20.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:13:50 -0700 > >Received: from web16812.mail.tpe.yahoo.com > ([202.1.236.152]) by > >mc11-f10.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:11:09 -0700 > >Received: from [65.49.83.96] by > web16812.mail.tpe.yahoo.com via HTTP; Thu, > >16 Oct 2003 09:11:03 CST > >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq > >Message-ID: > <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> > >Return-Path: andrewxwang at yahoo.com.tw > >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 > (UTC) > >FILETIME=[62388A50:01C39382] > > > >If all you need is a batch system, I would suggest > SGE > >and Scalable PBS, which have more users and better > >support. > > > >Both of them are free and opensource, so you can > try > >both and see which one you like better! > > > >SGE: http://gridengine.sunsource.net > >SPBS: http://www.supercluster.org/projects/pbs/ > > > >Andrew. > > > >----------------------------------------------------------------- > >??? Yahoo!?? > >?????????????????????? > >http://tw.promo.yahoo.com/mail_premium/stationery.html > > _________________________________________________________________ > Stay in touch with absent friends - get MSN > Messenger > http://www.msn.co.uk/messenger > ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Thu Oct 16 04:47:12 2003 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Thu, 16 Oct 2003 09:47:12 +0100 Subject: XML for formatting (Re: Environment monitoring) Message-ID: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> [Hmm, and will the rants be longer or shorter after he's bought the mental lubricant?] I'm in support of the original rant, however, having had to reverse-engineer several data formats in the past. Most recently a set of molecular-orbital output data. Very frustrating trying to count through data fields and convince myself that we have mapped it correctly. Anecdote from a different field (weather models) that's related - for a while, a weather model used calibration data a bit wrong - sea temperature and sea surface wind speed were swapped. All because someone had to look at a data dump and guess which column was which. So, sure, XML is very wordy, but the time saving (when trying to decipher the data) and potential for avoiding big mistakes more than makes up for it (IMO). Graham Graham Mullier Chemoinformatics Team Leader, Chemistry Design Group, Syngenta, Bracknell, RG42 6EY, UK. direct line: +44 (0) 1344 414163 mailto:Graham.Mullier at syngenta.com -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: 15 October 2003 19:16 Cc: beowulf at beowulf.org Subject: Re: XML for formatting (Re: Environment monitoring) On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. 
Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Thu Oct 16 08:12:36 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Thu, 16 Oct 2003 14:12:36 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <20031014211426.GI8116@maybe.org> Message-ID: <20031016121236.GE8711@unthought.net> On Tue, Oct 14, 2003 at 08:45:12PM -0400, Robert G. Brown wrote: ... > rgb at lilith|T:105>cat /proc/1214/stat > 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 > 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 > 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 While this has nothing to do with your (fine as always ;) rant, I just need to add a comment (which has everything to do with /proc stupidities): > (which, as you can see, contains the information on the pine application > within which I am currently working on my laptop). > > What? You find that hard to read? Imagine I had a process with the (admittedly unlikely but entirely possible) name 'pine) S 1205 (' Your stat output would read: 1214 (pine) S 1205 () S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 Parsing the ASCII-art in /proc/mdstat is at least as fun ;) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 16 08:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 16 Oct 2003 08:08:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> Message-ID: On Wed, 15 Oct 2003, Greg Lindahl wrote: > On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > > > So the alternative is to see if I can extort > > ten cents from everybody on the list NOT to write 20K rants like this. > > Do you accept pay-pal? Do you promise to spend all the money buying > yourself beer? I do accept pay-pal, by strange chance and will cheerfully delete one word out of a 20Kword base for every dime received (and to make it clear to the list that I've done so, naturally I'll post the diff with the original as well as the modified rant:-). 
I can't promise to spend ALL of the money buying beer, because my liver is old and has already tolerated much abuse over many years and I want it to last a few more decades, but I'll certainly lift a glass t'alla yer health from time to time...:-) On the other hand, given my experiences with people sending me free money via pay-pal up to this point, it would probably be safe to promise to spend it "all" on beer. Even my aged liver can tolerate beer by the thimbleful...if I didn't end up a de facto teetotaller.;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 12:02:18 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 12:02:18 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: On Thu, 16 Oct 2003 graham.mullier at syngenta.com wrote: > [Hmm, and will the rants be longer or shorter after he's bought the mental > lubricant?] Buy the right amount and they will be eloquent enough that you won't mind, or too much and they will be short and slurred ;-o > I'm in support of the original rant, however, having had to reverse-engineer > several data formats in the past. Most recently a set of molecular-orbital > output data. Very frustrating trying to count through data fields and > convince myself that we have mapped it correctly. What you want is not XML, but a data format description language. When I first read about XML, that what I believed it was. I was expecting that file optionally described the data format as a prologue, and then had a sequence of efficently packed data structures. But the XML designers created the evil twin of that idea. The header is a schema of parser rules, and each data element had verbose syntax that conveyes little semantic information. A XML file - is difficult for humans to read, yet is even larger than human-oriented output - requires both syntax and rule checking after human editing, yet is complex for machines to parse. - is intended for large data sets, where the negative impacts are multiplied - encourages "cdata" shortcuts that bypass the few supposed advantages. > Anecdote from a different field (weather models) that's related - for a > while, a weather model used calibration data a bit wrong - sea temperature > and sea surface wind speed were swapped. All because someone had to look at > a data dump and guess which column was which. Versus looking at an XML output and guessing what "load_one" means? I see very little difference: repeating a low-content label once for each data element doesn't convey more information. The only XML adds here is avoiding miscounting fields for undocumented data structures. What we really want in both the weather code case and when reporting cluster statistics is a data format description language. That description includes the format of the packed fields, and should include what the fields mean and their units, which is what we are missing in both cases. With such an approach we can efficiently assemble, transmit and deconstruct packed data while having automatic tools to check its validity. 
And general-purpose tools can even combine a description and a compact data set to produce XML. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 16 16:12:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 16 Oct 2003 13:12:13 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: <20031016201213.GV8116@maybe.org> On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > What you want is not XML, but a data format description language. > I think the S-expression guys would say that they have one. And it is what supermon uses, FWIW. http://sexpr.sourceforge.net/ http://supermon.sourceforge.net/ (supermon pages are currently unavailable.) Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Oct 16 16:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Thu, 16 Oct 2003 16:52:03 -0400 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> <20031016201213.GV8116@maybe.org> Message-ID: <1066337523.11093.20.camel@roughneck.liniac.upenn.edu> On Thu, 2003-10-16 at 16:12, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > > > > I think the S-expression guys would say that they have one. And it is > what supermon uses, FWIW. > > > http://sexpr.sourceforge.net/ > > http://supermon.sourceforge.net/ We use supermon as the data gathering mechanism for Clubmask, and I really like it. You can mask to get just certain values, and it is _really_ fast. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 14 23:31:55 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 14 Oct 2003 22:31:55 -0500 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <1066188715.3200.120.camel@terra> On Tue, 2003-10-14 at 19:45, Robert G. Brown wrote: ... > > > ... > > > > rgb As someone who has done programming environment tools most of his reasonably long professional life, I must say you have hit the nail on the head. I have rooted through more than my share of shitty binary formats in my day, and I can honestly say that I go home happier as a result of dealing with an XML trace file in my current project. I was happily working away dealing with only XML, but then it happened. The demons of my past reared their ugly heads when I decided that it would be a good thing to get some ELF information outta some files. Being the industrious guy I am, I went and got ELF docs from Dave Anderson's stash. Did that help?
Nope, not really, as it was mangled 64-bit focused ELF. Was it documented? Nope, not really. You could look at the elfdump code to see what that does, so in a backwards way, it was documented. The alternative was to ferret out the format by bugging enough compiler geeks until they gave up the secret handshake. The alternative that I eventually took was to go lay down until the desire to have the ELF information went away. ;-) -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Thu Oct 16 17:36:25 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Thu, 16 Oct 2003 16:36:25 -0500 Subject: OT: same commands to multiple servers? Message-ID: <20031016163625.C11181@mikee.ath.cx> I now have control over many AIX servers and I know there are some programs that allow you (once configured) to send the same command to multiple nodes/servers, but do these commands exist within the AIX environment? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bryce at jfet.net Thu Oct 16 16:15:08 2003 From: bryce at jfet.net (Bryce Bockman) Date: Thu, 16 Oct 2003 16:15:08 -0400 (EDT) Subject: A Petaflop machine in 20 racks? Message-ID: Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 17:54:31 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 17:54:31 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> Message-ID: On Thu, 16 Oct 2003, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > I think the S-expression guys would say that they have one. And it is > supermon uses, FWIW. No, S-expressions are an ancient concept, developed back in the early days of computing. They were needed in Lisp to linearize tree structures so that they could be saved to, uhmm, paper tape or clay tablets. Sexprs are oriented toward "structured" data. In this context "structured" means "Lisp-like linked lists" rather than "a series of 'C' structs". More directly related concepts are XDR, part of SunRPC MPI packed data Object brokers all of which are trying to solve similar problem. But, except for a few of the "object broker" systems, they don't have the metadata language to translate between domains. 
For instance, you can't take MPI packed data and automatically convert it to (useful) XML, pass it to an object broker system, or call a non-MPI remote procedure -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Oct 16 19:21:58 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 16 Oct 2003 16:21:58 -0700 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F2816.9030606@cert.ucr.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > No idea if it would work on AIX, but you could try out pconsole: http://www.heiho.net/pconsole/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Mark at MarkAndrewSmith.co.uk Thu Oct 16 19:36:23 2003 From: Mark at MarkAndrewSmith.co.uk (Mark Andrew Smith) Date: Fri, 17 Oct 2003 00:36:23 +0100 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Comment: As each generation of this chip gets more powerful, in an exponential way, then clusters of these chips could be used to break encryption algorithms via brute force approaches. If this became anywhere near an outside chance of a possibility of succeeding, or even threat of, then I would expect Governments to carefully consider export requirements and restrictions, or even in the extreme, classify it as a military armament similar to early RSA 128bit software encryption ciphers. However it could be the dawn of a new architecture for us all..... Kindest regards, Mark Andrew Smith Tel: (01942)722518 Mob: (07866)070122 http://www.MarkAndrewSmith.co.uk/ -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Bryce Bockman Sent: 16 October 2003 21:15 To: beowulf at beowulf.org Subject: A Petaflop machine in 20 racks? Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). 
Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Oct 16 19:46:19 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu, 16 Oct 2003 16:46:19 -0700 Subject: A Petaflop machine in 20 racks? References: Message-ID: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Browsing through ClearSpeed's fairly "content thin" website, one turns up the following: http://www.clearspeed.com/downloads/overview_cs301.pdf The CS302 has an array of 64 processors and 256Kbytes of memory in the array + 128 Kbytes SRAM on chip. That's 4 Kbytes/processor (much like a cache).. It doesn't say how many bits wide each processor is, though.. 51.2 Gbyte/sec bandwidth is quoted.. that's 800 Mbyte/sec per processor, which is a reasonable sort of rate. 10 microsecond 1K complex FFTs are reasonably fast, but without knowing how many bits, it's hard to say whether it's outstanding. It also doesn't say whether the architecture is, for instance, SIMD. It could well be a systolic array, which would be very well suited to cranking out FFTs or other similar things, but probably not so hot for general purpose crunching. For all their vaunted patent and IP portfolio, they have only one patent listed in the USPTO database under their own name, and that's some sort of DRAM. ----- Original Message ----- From: "Bryce Bockman" To: Sent: Thursday, October 16, 2003 1:15 PM Subject: A Petaflop machine in 20 racks? > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 16 21:23:57 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 16 Oct 2003 21:23:57 -0400 (EDT) Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Looking at the standard "we have the solution to everyones computing needs press release" a few things are clear: "... multi-threaded array processor ..." which is further verified later in the press release: "... where the CS301 is acting as a co-processor, dynamic libraries offload an application's inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU's limited mathematical capability..." It seems to be a low power array processor which may be of some real value to some people. The real issue is can they keep pace in terms of cost and performance with the commodity CPU market. And what about code portability. 
Quite a few people have spent quite a lot of time porting and tweaking codes for architectures that seemed to have a rather short lived history. Of course, there is no hardware yet. Doug On Thu, 16 Oct 2003, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 03:48:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 09:48:17 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> References: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Message-ID: <200310170948.17224.joachim@ccrl-nece.de> Jim Lux: > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. Exactly. Such coprocessor-boards (typically DSP-based, which also achieve some GFlop/s) already exist for a long time, but obviously are not suited to change "the way we see computing" (place your marketing slogan here). One reason is the lack of portability for code making use of such hardware, but I think if the performance for a wider range of applications would effectively come anywhere close to the peak performance, this problem would be overcome by the premise of getting teraflop-performance for some 10k of $. Thus, the problem probably is that typical applications do not achieve the promised performance. All memory-bound applications will get stuck on the PCI-bus, by both, memory access latency and bandwidth. High sustained performance for real problems can, in the general case, only be achieved in a balanced system. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 04:23:46 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 10:23:46 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <200310171023.46865.joachim@ccrl-nece.de> Donald Becker: > More directly related concepts are > XDR, part of SunRPC > MPI packed data Hmm, as you note below, they both do not describe the data they handle, just transform in into a uniform representation. > Object brokers > all of which are trying to solve similar problem. But, except for a few > of the "object broker" systems, they don't have the metadata language to > translate between domains. 
For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure You might want to check HDF5, or for a simpler yet widely used approach, NetCDF. They are self-describing file formats. But as you can send everything via the net the same way you access it in a file, this should be useful. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cap at nsc.liu.se Fri Oct 17 04:40:56 2003 From: cap at nsc.liu.se (Peter Kjellstroem) Date: Fri, 17 Oct 2003 10:40:56 +0200 (CEST) Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> Message-ID: There is something called dsh (distributed shell) part of some IBM package. The guys at llnl has done further work in this direction with pdsh which I belive runs fine on AIX. pdsh can be found at: http://www.llnl.gov/linux/pdsh/ /Peter On Thu, 16 Oct 2003, Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? > > Mike -- ------------------------------------------------------------ Peter Kjellstroem | National Supercomputer Centre | Sweden | http://www.nsc.liu.se _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 17 04:35:48 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 17 Oct 2003 10:35:48 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310170835.h9H8ZmY02530@dali.crs4.it> I have not read carefully descriptions of the Opteron architecture until a few minutes ago. I was not able to find a picture of the layout in silicon at the AMD site, I found a picture at Tom's Hardware. http://www.tomshardware.com/cpu/20030422/opteron-04.html The page before shows that 50 percent of the silicon is cache. Of what is not cache, it seems that the floating point unit occupies about 1/6 or 1/7th of the area, moreover, the authors Frank Voelkel, Thomas Pabst, Bert Toepelt, and Mirko Doelle describe the Opteron as having three floating point units, FADD, FMUL and FMISC. Just counting FADD and FMUL and considering the entire area of the Opteron, using 2 GHz for the frequency, that would be about 12 FP units times 2 GHz, 24 GFLOPS. So it is doable. I do not know the depth of the pipeline, but it is likely it is deep. How do you keep the pipeline full? PCI is around 0.032 Giga floating point words per second? The entire memory subsystem needs to be changed drastically. Moreover, whereas integer units might be used to solve problems that are logically complex, floating point problems are typically ones that use a large amount of data, more than what can fit into cache. 
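The same arithmetic can be written down explicitly. The sketch below takes the 12 FP units at 2 GHz estimated above and compares them with classic 32-bit/33 MHz PCI (roughly 132 Mbyte/s, a nominal figure assumed here), counting single-precision words:

#include <stdio.h>

int main(void)
{
    const double fp_units   = 12.0;        /* estimate from the post above */
    const double clock_hz   = 2.0e9;
    const double pci_Bps    = 33.0e6 * 4;  /* assumed 32-bit bus at 33 MHz, bytes/s */
    const double word_bytes = 4.0;         /* single-precision word */

    double peak_flops = fp_units * clock_hz;   /* ~24 GFLOPS */
    double pci_words  = pci_Bps / word_bytes;  /* ~0.033 Gwords/s */

    printf("peak:  %.1f GFLOPS\n", peak_flops / 1e9);
    printf("PCI:   %.3f Gwords/s\n", pci_words / 1e9);
    printf("ratio: %.0f flops per word delivered over PCI\n",
           peak_flops / pci_words);
    return 0;
}

Several hundred floating point operations per operand delivered over the bus is the gap the memory subsystem would have to close, which is the point being made above.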
But you-all knew that already, Alan Scheinin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:43:07 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:43:07 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Thu, 16 Oct 2003, Donald Becker wrote: > translate between domains. For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure Yes indeedy. And since XML is at heart linked lists (trees) of structs as well, you still can't get around the difficulty of mapping a previously unseen data file containing XMLish into a set of efficiently accessible structs. Which is doable, but is a royal PITA and requires that you maintain DISTINCT (and probably non-portable) images/descriptions of the data structures and then write all this glue code to import and export. So yeah, I have fantasies of ways of encapsulating C header files and a data dictionary in an XMLified datafile and a toolset that at the very least made it "easy" to relink a piece of C code to read in the datafile and just put the data into the associated structs where I could subsequently use them EFFICIENTLY by local or global name. I haven't managed to make this really portable even in my own code, though -- it isn't an easy problem (so difficult that ad hoc workarounds seem the simpler route to take). This really needs a committee or something and a few zillion NSF dollars to resolve, because it is a fairly serious and widespread problem. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:29:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:29:47 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <1066188715.3200.120.camel@terra> Message-ID: On 14 Oct 2003, Dean Johnson wrote: > As someone who has done programming environment tools most of his > reasonably long professional life, I must say you have hit the nail on > the head. I have rooted through more than my share of shitty binary > formats in my day, and I can honestly say that I go home happier as a > result of dealing with an XML trace file in my current project. I was > happily working away dealing with only XML, but then it happened. The > demons of my past rose their ugly heads when I decided that it would be > a good thing to get some ELF information outta some files. Being the > industrious guy I am, I went and got ELF docs from Dave Anderson's > stash. Did that help? Nope, not really, as it was mangled 64-bit focused > ELF. Was it documented? Nope, not really. You could look at the elfdump > code to see what that does, so in a backwards way, it was documented. > The alternative was to ferret out the format by bugging enough compiler > geeks until they gave up the secret handshake. 
The alternative that I > eventually took was to go lay down until the desire to have the ELF > information went away. ;-) And yet Don's points are also very good ones, although I think that is at least partly a matter of designer style. XML isn't, after all, a markup language -- it is a markup language specification. As an interface designer, you can implement tags that are reasonably human readable and well-separated in function or not. His observation that what one would REALLY like is a self-documenting interface, or an interface with its data dictionary included as a header, is very apropos. I also >>think<< that he is correct (if I understood his final point correctly) in saying that someone could sit down and write an XML-compliant "DML" (data markup language) with straightforward and consistent rules for encapsulating data streams. Since those rules would presumably be laid down in the design phase, and since a wise implementation of them would release a link-level library with prebuilt functions for creating a new data file and its embedded data dictionary, writing data out to the file, opening the file, and reading/parsing data in from the file, it would actually reduce the amount of wheel reinventing (and very tedious coding!) that has to be done now while creating/enforcing a fairly rigorous structural organization on the data itself. One has to be very careful not to assume that XML will necessarily make a data file tremendously longer than it likely is now. For short files nobody (for the most part) cares, where by short I mean short enough that file latencies dominate the read time -- using long very descriptive tags is easy in configuration files. For longer data files (which humans cannot in general "read" anyway unless they have a year or so to spare) there is nothing to prevent XMLish of the following sort of very general structure: This is part of the production data of Joe's Orchards. Eat Fruit from Joe's! apples%-10.6fbushels | oranges%-12.5ecrates | price%-10.2fdollars 13.400000 |77.00000e+2 |450.00 589.200000 |102.00000e+8|6667.00 ... The stuff between the tags could even be binary. Note that the data itself isn't individually wrapped and tagged, so this might be a form of XML heresy, but who cares? For a configuration file or a small/short data file containing numbers that humans might want to browse/read without an intermediary software layer, I would say this is a bad thing, but for a 100 MB data file (a few million lines of data) the overhead introduced by adding the XML encapsulation and dictionary is utterly ignorable and the mindless repetition of tags in the datastream itself pointless. Note well that this encapsulation is STILL nearly perfectly human readable, STILL easily machine parseable, and will still be both in twenty years after Joe's Orchard has been cut down and turned into firewood (or would be, if Joe had bothered to tell us a bit more about the database in question in the description). The data can even be "validated", if the associated library has appropriate functions for doing so (which are more or less the data reading functions anyway, with error management). I should note that the philosophy above might be closer to that of e.g. TeX/LaTeX than XML/SGML/MML (as discussed below). 
I've already done stuff somewhat LIKE this (without the formal data dictionary, because I haven't taken the time to write a general purpose tool for my own specific applications, which is likely a mistake in the long run but in the long run, after all, I'll be dead:-) in wulfstat. The .wulfhosts xml permits a cluster to be entered "all at once" using a format like: g%02d 1 15 which is used to generate the hostname strings required to open connections to hosts e.g. g01, g02, ... g15. Obviously the same trick could be used to feed scanf, or to feed a regex parser. The biggest problem I have with XML as a data description/configuration file base isn't really details like these, as I think they are all design decisions and can be done poorly or done well. It is that on the parsing end, libxml2 DOES all of the above, more or less. It generates on the fly a linked list that mirrors the XML source, and then provides tools and a consistent framework of rules for walking the list to find your data. How else could it do it? The one parser has to read arbitrary markup, and it cannot know what the markup is until opens the file, and it opens/reads the file in one pass, so all it can do is mosey along and generate recursive structs and link them. However, that is NOT how one wants to access the data in code that wants to be efficient. Walking a llist to find a float data entry that has a tag name that matches "a" and an index attribute that matches "32912" is VERY costly compared to accessing a[32912]. At this point, the only solution I've found is to know what the data encapsulation is (easy, since I created it:-), create my own variables and structs to hold it for actual reference in code, open and read in the xml data, and then walk the list with e.g. xpath and extract the data from the list and repack it into my variables and structs. This latter step really sucks. It is very, very tedious (although perfectly straightforward to write the parsing/repacking code (so much so that the libxml guy "apologizes" for the tedium of the parsing code in the xml.org documentation:-). It is this latter step that could be really streamlined by the use of an xmlified data dictionary or even (in the extreme C case) encapsulating the actual header file with the associated variable struct definitions. It is interesting and amusing to compare two different approaches to the same problem in applications where the issue really is "markup" in a sense. I write lots of things using latex, because with latex one can write equations in a straightforward ascii encoding like $1 = \sin^2(\theta) + \cos^2(\theta)$. This input is taken out of an ascii stream by the tex parser, tokenized and translated into characters, and converted into an actual equation layout according to the prescriptions in a (the latex) style file plus any layered modifications I might impose on top of it. [Purists could argue about whether or not latex is a true markup language -- tex/latex are TYPESETTING languages and not really intended to support other functions (such as translating this equation into an internal algebraic form in a computer algebra program such as macsyma or maple). However, even though it probably isn't, because every ENTITY represented in the equation string isn't individually tagged wrt function, it certainly functions like markup at a high level with entries entered inside functional delimiters and presented in a way/style that is associated with the delimiters "independent" of the delimiters themselves.] 
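For concreteness, the "walk the tree and repack" step described above looks roughly like the following with libxml2. The file name, tag names and target struct are invented (they match the node_health sketch earlier in this thread, not any real /xmlproc); compile with something like gcc sketch.c `xml2-config --cflags --libs`:

#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

struct node_health { double cpu_temp, load_one; };

int main(void)
{
    struct node_health h = { 0.0, 0.0 };

    xmlDocPtr doc = xmlParseFile("node_health.xml");  /* hypothetical file */
    if (doc == NULL) {
        fprintf(stderr, "cannot parse node_health.xml\n");
        return 1;
    }

    /* Walk the element children of the root and repack the ones we know
     * into the struct we actually want to compute with. */
    xmlNodePtr root = xmlDocGetRootElement(doc);
    for (xmlNodePtr cur = root ? root->children : NULL; cur != NULL; cur = cur->next) {
        if (cur->type != XML_ELEMENT_NODE)
            continue;
        xmlChar *val = xmlNodeGetContent(cur);
        if (xmlStrcmp(cur->name, (const xmlChar *)"cpu_temp") == 0)
            h.cpu_temp = atof((const char *)val);
        else if (xmlStrcmp(cur->name, (const xmlChar *)"load_one") == 0)
            h.load_one = atof((const char *)val);
        xmlFree(val);
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();

    printf("cpu_temp = %.1f, load_one = %.2f\n", h.cpu_temp, h.load_one);
    return 0;
}

Every consumer of the data ends up writing some variation of this loop by hand, which is the tedium a shipped data dictionary and associated library would factor out.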
If one compares this to the same equation wrapped in MML (math markup language, which I don't know well enough to be able to reproduce here) it would likely occupy twenty or thirty lines of markup and be utterly unreadable by humans. At least "normal" humans. Machines, however, just love it, as one can write a parser that can BOTH display the equation AND can create the internal objects that permit its manipulation algebraically and/or numerically. This would be difficult to do with the latex, because who knows what all these components are? Is \theta a constant, a label, a variable? Are \sin and \cos variables, functions, or is \s the variable and in a string (do I mean s*i*n*(theta) where all the objects are variables)? The equation that is HUMAN readable and TYPESETTABLE without ambiguity with a style file and low level definition that recognizes these elements as non-functional symbols of certain size and shape to be assembled according to the following rules is far from adequately described for doing math with it. For all that, one could easily write an XML compliant LML -- "latex markup language" -- a perfectly straightforward translation of the fundamental latex structures into XML form. Some of these could be utterly simple (aside for dealing with special character issues: {\em emphasized text} -> emphasized text \begin{equation}a = b+c\end{equation} -> a = b+c linuxdoc is very nearly this translation, actually, except that it doesn't know how to handle equation content AFAIK. This sort of encapsulation is highly efficient for document creation/typesetting within a specific domain, but less general purpose. The point is .... [the following text that isn't there was omitted in the fond hope that my paypal account will swell, following which I will make a trip to a purveyor of fine beverages.] rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jac67 at georgetown.edu Fri Oct 17 09:41:19 2003 From: jac67 at georgetown.edu (Jess Cannata) Date: Fri, 17 Oct 2003 09:41:19 -0400 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8FF17F.3030008@georgetown.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge National Laboratory on all of our Linux Beowulf clusters, and I really like it. 
You might want to take a look at it: http://www.csm.ornl.gov/torc/C3/index.html -- Jess Cannata Advanced Research Computing Georgetown University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Thu Oct 16 18:49:27 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Fri, 17 Oct 2003 00:49:27 +0200 (CEST) Subject: Pentium4 vs Xeon In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Don Holmgren wrote: > Pricewise (YMMV), cheap desktop P4's can be had very roughly for half > the price of a comparable dual Xeon. This is true if you look at pricewatch, but the quotes I received shown that good P4's is less than half of the price (in my case around 36%) of a comparable dual Xeon. I am talking about comparison of the price of Asus PC-DL Dual Xeon 2.8 GHz 512K 533 FSB with 3 GB DDR333 and two 36GB SATA 10K RPM hardrives against Asus P4P800-VM P4 2.8 GHz 800 FSB with 1.5 GB DDR 400 and one 36GB SATA 10K RPM hardrive. Xeons machines are not very popular and it is hard to get a good price for them at your local shop (in my case Ithaca US, in Poland difference would be even bigger). I am benchmarking this P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+). If you are interested in some numbers I can send benchmarks of Gaussian 03, Gamess, and our own F77 code. czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Fri Oct 17 03:20:54 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Fri, 17 Oct 2003 20:20:54 +1300 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F9856.60602@lsu.edu> I have administered over 100 AIX boxes for a living for over 5 years now. The tool of choice for me is dsh, which ships as part of the PSSP LPP, a canned implementation of Kerberos 4. We simply install the ssp.clients fileset on each node and use our control workstation as the Kerberos realm master. We add the external nodes by hand. I know that dsh is open sourced now and available at: http://dsh.sourceforge.net/ There are several other cheap (as in Libris) solutions as well: 1) Use rsh (with TCPwrappers) 2) Use ssh with a password-less key 3) Write your own code around either of the above 4) Implement Kerberos, either as an LPP from IBM, or get the source and compile yourself I think you'll find dsh a good starting point though. Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? 
> > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 17 10:07:07 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 17 Oct 2003 15:07:07 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE1FD@stegosaurus.bristol.quadrics.com> Consider also pdsh: http://www.llnl.gov/linux/pdsh/ It is an open source varient of IBM's dsh builds on Linux (IA32/IA64, etc.), AIX et al. Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- > -----Original Message----- > From: Jess Cannata [mailto:jac67 at georgetown.edu] > Sent: 17 October 2003 14:41 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > Mike Eggleston wrote: > > >I now have control over many AIX servers and I know there > >are some programs that allow you (once configured) to send > >the same command to multiple nodes/servers, but do these > >commands exist within the AIX environment? > > > > > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge > National > Laboratory on all of our Linux Beowulf clusters, and I really > like it. > You might want to take a look at it: > > http://www.csm.ornl.gov/torc/C3/index.html > > -- > Jess Cannata > Advanced Research Computing > Georgetown University > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Fri Oct 17 12:42:15 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Fri, 17 Oct 2003 10:42:15 -0600 (CST) Subject: RLX? In-Reply-To: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: Have you ever try or test RLX server for HPC? What is their performance? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 14:05:49 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 11:05:49 -0700 Subject: POVray, beowulf, etc. 
Message-ID: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> I'm aware of some MPI-aware POVray stuff, but is there anything out there that can facilitate something where you want to render a sequence of frames (using, e.g., POVray), one frame to a processor, then gather the images back to a head node for display, in quasi-real time. For instance, say you had a image that takes 1 second to render, and you had 30 processors free to do the rendering. Assuming you set everything up ahead of time, it should be possible to set all the processors spinning, and feeding the rendered images back to a central point where they can be displayed as an animation at 30 fps (with a latency of 1 second) Obviously, the other approach is to have each processor render a part of the image, and assemble them all, but it seems that this might actually be slower overall, because you've got the image assembling time added. I'm looking for a way to do some real-time visualization of modeling results as opposed to a batch oriented "render farm", so it's the pipeline to gather the rendered images from the nodes to the display node that I'm interested in. I suppose one could write a little MPI program that gathers the images up as bitmaps and feeds them to a window, but, if someone has already solved this in a reasonably facile and elegant way, why not use it. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Fri Oct 17 12:01:12 2003 From: johnb at quadrics.com (John Brookes) Date: Fri, 17 Oct 2003 17:01:12 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E328@stegosaurus.bristol.quadrics.com> How are the startup times of IBM's dsh these days? I seem to remember that it was somewhat on the slow side on big machines. Many moons have passed since I was last on an AIX machine, though, so I assume the situation's improved drastically. Cheers, John Brookes Quadrics > -----Original Message----- > From: Brian D. Ropers-Huilman [mailto:bropers at lsu.edu] > Sent: 17 October 2003 08:21 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > I have administered over 100 AIX boxes for a living for over > 5 years now. The > tool of choice for me is dsh, which ships as part of the PSSP > LPP, a canned > implementation of Kerberos 4. We simply install the > ssp.clients fileset on > each node and use our control workstation as the Kerberos > realm master. We add > the external nodes by hand. > > I know that dsh is open sourced now and available at: > > http://dsh.sourceforge.net/ > > There are several other cheap (as in Libris) solutions as well: > > 1) Use rsh (with TCPwrappers) > 2) Use ssh with a password-less key > 3) Write your own code around either of the above > 4) Implement Kerberos, either as an LPP from IBM, or get the > source and > compile yourself > > I think you'll find dsh a good starting point though. 
> > Mike Eggleston wrote: > > > I now have control over many AIX servers and I know there > > are some programs that allow you (once configured) to send > > the same command to multiple nodes/servers, but do these > > commands exist within the AIX environment? > > > > Mike > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Brian D. Ropers-Huilman (225) 578-0461 (V) > Systems Administrator AIX (225) 578-6400 (F) > Office of Computing Services GNU Linux > brian at ropers-huilman.net > High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Oct 17 14:38:41 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 17 Oct 2003 14:38:41 -0400 (EDT) Subject: RLX? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? Yes, we had access to their earliest machines and I was there at the NYC announcement. > What is their performance? It depends on the generation. The first generation was great at what it was designed to do: pump out data, such as static web pages, from memory to two 100Mbps Ethernet ports per blade. It used Transmeta chips, 2.5" laptop drives and fans only on the chassis to fit 24 blades in 3U. The blades didn't do well at computational tasks or disk I/O. A third Ethernet port on each blade was connected to an internal repeater. They could only PXE boot using that port, making a flow-controlled boot server important. The second generation switched to Intel ULV (Ultra Low Voltage) processors in the 1GHz range. This approximately doubled the speed over Transmeta chips, especially with floating point. But ULV CPUs are designed for laptops, and the interconnect was no faster. Thus this still was not a computational cluster box. The current generation blades are much faster, with full speed (and heat) CPUs and chipset, fast interconnect and good I/O potential. But lets look at the big picture for HPC cluster packaging: --> Beowulf clusters have crossed the density threshold <-- This happened about two years ago. At the start of the Beowulf project a legitimate problem with clusters was the low physical density. This didn't matter in some installations, as much larger machines were retired leaving plenty of empty space, but it was a large (pun intended) issue for general use. As we evolved to 1U rack-mount servers, the situation changed. Starting with the API CS-20, Beowulf cluster hardware met and even exceeded the compute/physical density of contemporary air-cooled Crays. 
Since standard 1U dual processor machines can now exceed the air cooled thermal density supported by an average room, selecting non-standard packaging (blades, back-to-back mounting, or vertical motherboard chassis) must be motivated by some other consideration that justifies the lock-in and higher cost. At least with blade servers there are a few opportunities: Low-latency backplane communication Easier connections to shared storage Hot-swap capability to add nodes or replace failed hardware -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Fri Oct 17 15:57:24 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 21:57:24 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: I just saw YAML announced on www.ntk.net http://www.yaml.org YAML (rhymes with camel) is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimised for serialization, configuration settings, log files, Internet messaging and filtering. There are YAML writers and parsers for Perl, Python, Java, Ruby and C. Sounds like it might be good for the purposes we are discussing! BTW, has anyone experimented with Beep for messaging system status, environment variables, logging etc? http://www.beepcore.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From srihari at mpi-softtech.com Fri Oct 17 15:34:42 2003 From: srihari at mpi-softtech.com (Srihari Angaluri) Date: Fri, 17 Oct 2003 15:34:42 -0400 Subject: POVray, beowulf, etc. References: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> Message-ID: <3F904452.4090406@mpi-softtech.com> Jim, Not sure if you came across the parallel ray tracer application written using MPI.
This does real-time rendering. http://jedi.ks.uiuc.edu/~johns/raytracer/ Jim Lux wrote: > I'm aware of some MPI-aware POVray stuff, but is there anything out > there that can facilitate something where you want to render a sequence > of frames (using, e.g., POVray), one frame to a processor, then gather > the images back to a head node for display, in quasi-real time. > > For instance, say you had a image that takes 1 second to render, and you > had 30 processors free to do the rendering. Assuming you set everything > up ahead of time, it should be possible to set all the processors > spinning, and feeding the rendered images back to a central point where > they can be displayed as an animation at 30 fps (with a latency of 1 > second) > > Obviously, the other approach is to have each processor render a part of > the image, and assemble them all, but it seems that this might actually > be slower overall, because you've got the image assembling time added. > > I'm looking for a way to do some real-time visualization of modeling > results as opposed to a batch oriented "render farm", so it's the > pipeline to gather the rendered images from the nodes to the display > node that I'm interested in. I suppose one could write a little MPI > program that gathers the images up as bitmaps and feeds them to a > window, but, if someone has already solved this in a reasonably facile > and elegant way, why not use it. > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 16:19:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 22:19:15 +0200 (CEST) Subject: Also on NTK Message-ID: Sorry if this is off topic too far. Also on NTK, an implementation of zeroconf for Linux, Windows, BSD http://www.swampwolf.com/products/howl/GettingStarted.html Anyone care to speculate on uses for zeroconf in big clusters? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Fri Oct 17 16:47:08 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Fri, 17 Oct 2003 13:47:08 -0700 Subject: When is cooling air cool enough? Message-ID: Most computer rooms shuttle the air back and forth between the computers and the A/C. I'm wondering if one could not construct a less expensive facility (less power running the A/C which is rarely on, smaller A/C units) if the computer room was a lot more like a wind tunnel: ambient air in (after filtering out any dust or rain), pass it through the computers, and then blow it out the other side of the building. Note the room wouldn't be wide open like a normal computer room. Instead essentially each rack and other largish computer unit would sit in its own separate air flow, so that hot air from one wouldn't heat the next. 
The question is, how hot can the cooling air be and still keep the computers happy? The answer will determine how big an A/C unit is needed to handle cooling the intake air for those times when it exceeds this upper limit. I'm guessing that so long as a lot of air is moving through the computers most would be ok in a sustained 30C (86F) flow. Remember, this isn't 30C in dead air, it's 30C with high pressure on the intake side of the computer and low pressure on the outlet side, so that the generated heat is rapidly moved out of the computer and away. (But not so much flow as to blow cards out of their sockets!) Somewhere between 30C and 40C one might expect poorly ventilated CPUs and disks to begin to have problems. Above 40C seems a tad too warm. At that temperature it's going to be pretty uncomfortable for the operators too. Anybody have a good estimate for what this upper limit is. For instance, from a computer room with an A/C that failed slowly? There's clearly a lower temperature limit too. However on cold days opening a feedback duct from the outlet back into the intake should do the trick. In really cold climates the intake duct might be closed entirely - when it's 20 below outside. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 19:45:24 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 16:45:24 -0700 Subject: When is cooling air cool enough? In-Reply-To: Message-ID: <5.2.0.9.2.20031017162321.031343c0@mailhost4.jpl.nasa.gov> For component life, colder is better (10 degrees is factor of 2 life/reliability), and the temperature rise inside the box is probably more than you think. You also have some more subtle tradeoffs to address. You don't need as much colder air as warmer air to remove some quantity of heat, and a significant energy cost is pushing the air around (especially since the work involved in running the fan winds up heating the air being moved). This is a fairly standard HVAC design problem. The additional cost to cool the room to, say, 15C instead of 20C is fairly low, if the room is insulated, and there's a lot of recirculation (which is typical for this kind of thing). It's not like you're cooling the room repeatedly after warming up. Once you've reached equilibrium, cooling the mass of equipment down, you're moving the same number of joules of heat either way and the refrigeration COP doesn't change much over that small a temperature range. The heat leakage through the walls is fairly small, compared to the heat dissipated in the equipment. If you were cooling something that doesn't generate heat itself (i.e. a wine cellar or freezer), then the temperature does affect the power consumed. This all said, I worked for a while on a fairly complex electronic system installed at a test facility on a ridge on the island of Kauai, and they had no airconditioning. They had big fans and thermostatically controlled louvers, and could show that statistically, the air temperature never went high enough to cause a problem. I seem to recall something like the calculations showed we'd have to shut down for environmental reasons no more than once every 5 years. Humidity is an issue also, though. 
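To put rough numbers on that standard design problem, the first-order heat balance is just power = (mass flow of air) x (heat capacity) x (temperature rise). The sketch below works one case through; the 10 kW rack and 1000 CFM of airflow are assumed round numbers, not figures from anyone's machine room.

#include <stdio.h>

int main(void)
{
    const double rho = 1.2;                 /* air density, kg/m^3, near sea level */
    const double cp  = 1005.0;              /* specific heat of air, J/(kg K) */
    const double cfm_to_m3s = 0.000471947;  /* 1 CFM in m^3/s */

    double power_w = 10000.0;      /* assumed heat load of one rack, W */
    double airflow_cfm = 1000.0;   /* assumed airflow through the rack, CFM */

    double mdot = rho * airflow_cfm * cfm_to_m3s;  /* mass flow, kg/s */
    double dT = power_w / (mdot * cp);             /* air temperature rise, K */

    printf("air temperature rise across the rack: %.1f C\n", dT);
    return 0;
}

For those numbers the exhaust runs roughly 17-18 C hotter than the intake, which is one way of seeing why both the intake temperature and the airflow matter, and why 30 C intake air leaves very little margin.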
At 01:47 PM 10/17/2003 -0700, David Mathog wrote: >Most computer rooms shuttle the air back and forth >between the computers and the A/C. I'm >wondering if one could not construct a less expensive >facility (less power running the A/C which is rarely >on, smaller A/C units) if the computer room was a >lot more like a wind tunnel: ambient air in (after >filtering out any dust or rain), >pass it through the computers, and then blow it out >the other side of the building. Note the room >wouldn't be wide open like a normal computer room. >Instead essentially each rack and other largish >computer unit would sit in its own separate air flow, >so that hot air from one wouldn't heat the next. > >The question is, how hot can the cooling air be and >still keep the computers happy? > >The answer will determine how big an A/C unit is >needed to handle cooling the intake air for those >times when it exceeds this upper limit. > >I'm guessing that so long as a lot of air is moving through >the computers most would be ok in a sustained 30C (86F) flow. >Remember, this isn't 30C in dead air, it's 30C with high >pressure on the intake side of the computer and low >pressure on the outlet side, so that the generated heat >is rapidly moved out of the computer and away. (But not >so much flow as to blow cards out of their sockets!) >Somewhere between 30C and 40C one might expect poorly >ventilated CPUs and disks to begin to have problems. Above >40C seems a tad too warm. At that temperature it's going >to be pretty uncomfortable for the operators too. > >Anybody have a good estimate for what this upper limit is. >For instance, from a computer room with an A/C that failed >slowly? > >There's clearly a lower temperature limit too. However on cold >days opening a feedback duct from the outlet back into the intake >should do the trick. In really cold climates the intake >duct might be closed entirely - when it's 20 below outside. > >Thanks, > >David Mathog >mathog at caltech.edu >Manager, Sequence Analysis Facility, Biology Division, Caltech >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:41:49 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:41:49 -0700 Subject: RLX? In-Reply-To: References: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: <20031018014149.GB3774@greglaptop.PEATEC.COM> On Fri, Oct 17, 2003 at 10:42:15AM -0600, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? > What is their performance? .. what's their price/performance? That decides against them for most of us el-cheapo HPC customers. RLX has some nice features for enterprise computing that may justify a higher cost for enterprises, but... 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:11:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:11:39 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Fri, 17 Oct 2003, John Hearns wrote: > I just saw YAML announced on www.ntk.net > > http://www.yaml.org yaml.org doesn't resolve for me in nameservice (yet), but whoa, dude, rippin' ntk site. That's one very seriously geeked news site. rgb > YAML (rhymes with camel) is a sraightforward machine parsable > data serialization format designed for human readability and > interaction with scripting languages such as Perl and Python. > YAML is optimised for serialization , configuration settings, > log files, Internet messaging ad filtering. > > There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. > > > Sounds like it might be good for the purposes we are discussing! > > > > BTW, has anyon experimented with Beep for messaging system status, > environment variables, logging etc? > http://www.beepcore.org > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:39:57 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:39:57 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: References: Message-ID: <20031018013957.GA3774@greglaptop.PEATEC.COM> On Thu, Oct 16, 2003 at 04:15:08PM -0400, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? I think it's the Return of the Array Processor. There's very little new in computing these days -- and it has the usual flaws of APs: low bandwidth communication to the host. So if you have a problem that actually fits in the limited memory, and doesn't need to communicate with anyone else very often, it may be a win for you. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:21:42 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:21:42 -0400 (EDT) Subject: When is cooling air cool enough? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, David Mathog wrote: > Most computer rooms shuttle the air back and forth > between the computers and the A/C. 
I'm > wondering if one could not construct a less expensive > facility (less power running the A/C which is rarely > on, smaller A/C units) if the computer room was a > lot more like a wind tunnel: ambient air in (after > filtering out any dust or rain), > pass it through the computers, and then blow it out > the other side of the building. Note the room > wouldn't be wide open like a normal computer room. > Instead essentially each rack and other largish > computer unit would sit in its own separate air flow, > so that hot air from one wouldn't heat the next. > > The question is, how hot can the cooling air be and > still keep the computers happy? I personally have strong feelings about this, although there probably are sites out there with hard data and statistics and engineering recommendations. 70F or cooler would be my recommendation. In fact, cooler would be my recommendation -- 60F would be better still. I think the number is every 10F costs roughly a year of component life in the 60-80F ranges and even brief periods where the temperature at the intake gets significantly above 80F makes it uncomfortably likely that some component is damaged enough to fail within a year. > The answer will determine how big an A/C unit is > needed to handle cooling the intake air for those > times when it exceeds this upper limit. It costs roughly $1/watt/year to feed AND cool a computer, order of $100-150/cpu/year, with about 1/4 of that for cooling per se. The computer itself costs anywhere from $500 lowball to a couple of thousand per CPU (more if you have an expensive network). The HUMAN cost of screwing around with broken hardware can be crushing, and high temperatures are an open invitation for hardware to break a lot more often (and it breaks all too often at LOW temperatures). It just isn't worth it. > > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. > Remember, this isn't 30C in dead air, it's 30C with high > pressure on the intake side of the computer and low > pressure on the outlet side, so that the generated heat > is rapidly moved out of the computer and away. (But not > so much flow as to blow cards out of their sockets!) > Somewhere between 30C and 40C one might expect poorly > ventilated CPUs and disks to begin to have problems. Above > 40C seems a tad too warm. At that temperature it's going > to be pretty uncomfortable for the operators too. So an 86F wind keeps YOU cool in the summer time? Only because you're damp on the outside and evaporating sweat cools you. Think 86F humid, and you're only at 98F at core. The CPU is considerably hotter, and is cooled by the temperature DIFFERENCE. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? 
In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) 
like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years since I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondants were right when they said that's probably too hot... -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years since I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondants were right when they said that's probably too hot... 
-- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Sat Oct 18 00:52:59 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Sat, 18 Oct 2003 06:52:59 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: Hi, quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some older machines. For comparison I am including benchmarks of dual P3 512 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On Opteron I have also tried PC GAMESS program which I received from Alex Granovsky. 1. Single point HF energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] itek g03 Itanium2 1400MHz efc 7.1 26.5 prototype g03 p4 512 2800MHz pgi4 41.1 dahlia g03 Opteron 1400MHz pgi4 49.5 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 83.3 Wayne g03 p3 512 1200MHz pgi4 85 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 92.5 prototype gamess p4 512 2800MHz ifc7.1 106.5 dahlia PCgamess Opteron 1400MHz 112.9 dahlia gamess Opteron 1400MHz ifc7.1 128.5 itek gamess Itanium2 1400MHz efc 7.1 150.8 2. Single point MP2 energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST rmp2/6-31G* nosym scf=(tight,incore) MaxDisk=750MW gamess: MEMORY=50000000 DIRSCF=.TRUE. itek g03 Itanium2 1400MHz efc 7.1 51.7 prototype g03 p4 512 2800MHz pgi4 111.0 dahlia g03 Opteron 1400MHz pgi4 150.7 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 154.2 prototype gamess p4 512 2800MHz ifc7.1 157.0 dahlia PCgamess Opteron 1400MHz 163.8 dahlia gamess Opteron 1400MHz ifc7.1 191.0 itek gamess Itanium2 1400MHz efc 7.1 194.8 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 251.6 Wayne g03 p3 512 1200MHz pgi4 303 3. Manfreds Gaussian Benchmark http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html 243 basis functions 399 primitive gaussians RHF/3-21G* Freq [sec] itek g03 Itanium2 1400MHz efc 7.1 2843 prototype g03 p4 512 2800MHz pgi 4 8084 dahlia g03 Opteron 1400MHz pgi 4 9332 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 10289 Wayne g03 p3 512 1200MHz pgi 4 12920 galera g03 p3xenon 700MHz pgi 3 19317 m001 g03 p3 650MHz pgi 4 22824 4. test397.com from gaussian03 882 basis functions, 1440 primitive gaussians rb3lyp/3-21g force test scf=novaracc [sec] itek g03 Itanium2 1400MHz efc 7.1 6733 prototype g03 p4 512 2800MHz pgi 4 12980 dahlia g03 Opteron 1400MHz pgi 4 17879 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 20521 Wayne g03 p3 512 1200MHz pgi 4 24521 galera g03 p3xenon 700MHz pgi 3 39353 5. Gaussian calculations of NMR chemical shifts for GlyGlyAlaAla 207 basis functions, 339 primitive gaussians %MEM=800MB B3LYP/GEN NMR [sec] itek g03 Itanium2 1400MHz efc 7.1 275 prototype g03 p4 512 2800MHz pgi 4 614 dahlia g03 Opteron 1400MHz pgi 4 849 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 948 Wayne g03 p3 512 1200MHz pgi 4 1134 some details: g03 is GAUSSIAN 03 rev. B04 with gaussian blas compiled with 32-bit pgi4.0 gamess is VERSION 6 SEP 2001 (R4) compiled with 32-bit ifc 7.1, for P4 I have used additional options -tpp7 -axKW Opteron (dahlia) had 64bit GinGin64 Linux and I had to use static 32-bit binaries. 
It should have SuSE Linux Enterprise soon and I will repeat tests using PGI 5.0 64-bit compiler when it will be ready. Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 (R3) compiled with 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 compiled with 64-bit efc 7.1 P3xenon (galera) uses gamess VERSION = 6 SEP 2001 (R4) compiled with ifc 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas compiled with pgi 3.3 czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 10:39:36 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 22:39:36 +0800 (CST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: <20031019143936.29602.qmail@web16807.mail.tpe.yahoo.com> I have 2 pts: 1. The compilers used across different platforms were not the same, why not use the Intel compiler for the P4 as well? 2. What is the working set of the benchmark? If the benchmark fit in the 6MB on-chip L3 of the Itanium2, it is very likely to perform very well. Another benchmark that shows the G5 wins the large memory case, loses small/medium cases, while the Itanium2 loses most of its advantages when the working set does not fit the L3: http://www.xlr8yourmac.com/G5/G5_fluid_dynamics_bench/G5_fluid_dynamics_bench.html Andrew. --- Cezary Czaplewski ????> > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz > against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp > 2133MHz(MP 2600+) and some > older machines. For comparison I am including > benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of > Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I > received from Alex > Granovsky. > > > 1. Single point HF energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) > gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 26.5 > prototype g03 p4 512 2800MHz pgi4 > 41.1 > dahlia g03 Opteron 1400MHz pgi4 > 49.5 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 83.3 > Wayne g03 p3 512 1200MHz pgi4 > 85 > m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 > 92.5 > prototype gamess p4 512 2800MHz ifc7.1 > 106.5 > dahlia PCgamess Opteron 1400MHz > 112.9 > dahlia gamess Opteron 1400MHz ifc7.1 > 128.5 > itek gamess Itanium2 1400MHz efc 7.1 > 150.8 > > 2. Single point MP2 energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST > rmp2/6-31G* nosym > scf=(tight,incore) > MaxDisk=750MW > gamess: MEMORY=50000000 DIRSCF=.TRUE. 
> > itek g03 Itanium2 1400MHz efc > 7.1 51.7 > prototype g03 p4 512 2800MHz pgi4 > 111.0 > dahlia g03 Opteron 1400MHz pgi4 > 150.7 > m211 gamess k7mp 2133MHz(MP 2600+) > ifc7.1 154.2 > prototype gamess p4 512 2800MHz > ifc7.1 157.0 > dahlia PCgamess Opteron 1400MHz 163.8 > dahlia gamess Opteron 1400MHz > ifc7.1 191.0 > itek gamess Itanium2 1400MHz efc > 7.1 194.8 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 251.6 > Wayne g03 p3 512 1200MHz pgi4 > 303 > > 3. Manfreds Gaussian Benchmark > http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html > > 243 basis functions 399 primitive gaussians > RHF/3-21G* Freq > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 2843 > prototype g03 p4 512 2800MHz pgi 4 > 8084 > dahlia g03 Opteron 1400MHz pgi 4 > 9332 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 10289 > Wayne g03 p3 512 1200MHz pgi 4 > 12920 > galera g03 p3xenon 700MHz pgi 3 > 19317 > m001 g03 p3 650MHz pgi 4 > 22824 > > 4. test397.com from gaussian03 > > 882 basis functions, 1440 primitive gaussians > rb3lyp/3-21g force test scf=novaracc > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 6733 > prototype g03 p4 512 2800MHz pgi 4 > 12980 > dahlia g03 Opteron 1400MHz pgi 4 > 17879 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 20521 > Wayne g03 p3 512 1200MHz pgi 4 > 24521 > galera g03 p3xenon 700MHz pgi 3 > 39353 > > 5. Gaussian calculations of NMR chemical shifts for > GlyGlyAlaAla > > 207 basis functions, 339 primitive gaussians > %MEM=800MB > B3LYP/GEN NMR > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 275 > prototype g03 p4 512 2800MHz pgi 4 > 614 > dahlia g03 Opteron 1400MHz pgi 4 > 849 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 948 > Wayne g03 p3 512 1200MHz pgi 4 > 1134 > > some details: > > g03 is GAUSSIAN 03 rev. B04 with gaussian blas > compiled with 32-bit pgi4.0 > > gamess is VERSION 6 SEP 2001 (R4) compiled with > 32-bit ifc 7.1, for P4 I > have used additional options -tpp7 -axKW > > Opteron (dahlia) had 64bit GinGin64 Linux and I had > to use static 32-bit > binaries. It should have SuSE Linux Enterprise soon > and I will repeat > tests using PGI 5.0 64-bit compiler when it will be > ready. > > Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 > (R3) compiled with > 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 > compiled with 64-bit efc > 7.1 > > P3xenon (galera) uses gamess VERSION = 6 SEP 2001 > (R4) compiled with ifc > 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas > compiled with pgi 3.3 > > > czarek > > ---------------------------------------------------------------------- > Dr. Cezary Czaplewski > Department of Chemistry Box 431 > Baker Lab of Chemistry > University of Gdansk Cornell > University > Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY > 14853 > phone: +48 58 3450-430 phone: > (607) 255-0556 > fax: +48 58 341-0357 fax: (607) > 255-4137 > e-mail: czarek at chem.univ.gda.pl e-mail: > cc178 at cornell.edu > ---------------------------------------------------------------------- > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 11:37:14 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 23:37:14 +0800 (CST) Subject: Long lived OpenPBS bug fixed! Message-ID: <20031019153714.3905.qmail@web16808.mail.tpe.yahoo.com> All versions of OpenPBS have this problem: the scheduler uses blocking sockets to contact the nodes, and if a node is dead, the scheduler hangs for several minutes, and all user commands will hang (no so good!). Scalable PBS finally fixed this problem: "... In local testing, we are able to issue a 'kill -STOP' on one node or even all nodes and the pbs_server daemon continues to be highly responsive to user commands, scheduler queries, and job submissions." http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000162.html *Also*, don't miss the Supercluster Newsletter, which talked about the next generation Maui scheduler called "Moab": http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000132.html Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Sun Oct 19 15:32:42 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Sun, 19 Oct 2003 20:32:42 +0100 (BST) Subject: RLX In-Reply-To: <200310181602.h9IG2HA27890@NewBlue.scyld.com> References: <200310181602.h9IG2HA27890@NewBlue.scyld.com> Message-ID: > > Have you ever try or test RLX server for HPC? > > What is their performance? > > .. what's their price/performance? Well, it all depends. The performance of the current generation of blade systems are on a par with 1U systems, and you can now get chassis with myrinet or SAN connectivity if you need it. The part of price/performance that tends to get overlooked is manageability. Do you factor in the time and salaries of you admin staff who have to look after the thing? We run clusters with blade servers from various manufacturers (including RLX) and traditional 1U machines. The management overhead on blade systems is significantly lower than for 1U machines, and streets ahead of "beige boxes on shelves". On blade systems the network and SAN switching infrastructure is nicely integrated with the server chassis, and their management interfaces tied in with OS deployment, remove power management etc. The difference in management overhead gets more pronounced as your cluster size increases. The time it takes to look after a 24 node cluster of 1U boxes isn't going to be that different to the time it takes to look after 24 blades, but running a 1000 blades is much less effort than running a 1000 1U servers. Whether this actually matters or not depends on your circumstances. If you have a limitless supply of PhD student slave labour, (eg Virginia Tech and their G5s), then time and cost of management isn't so much of an issue. If you have to pay money for your sys-admins and want to run big clusters, then blades may end up being cost effective. 
Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Mon Oct 20 04:35:03 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Mon, 20 Oct 2003 01:35:03 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: <5.2.0.9.2.20031020013259.03c0a4e0@216.82.101.6> Quoting from the article: An ordinary desktop PC outfitted with six PCI cards, each containing four of the chips, would perform at about 600 gigaflops (or more than half a teraflop). Assuming you were to build cluster systems with six PCI cards each, it would require 4U rack cases... Unless these floating point cards come as low-profile PCI (MD2 form factor)? 20 racks * 42U per rack = 840U / 4 = 210 nodes, not counting switching equipment. Petaflop with 210 compute nodes? At 04:15 PM 10/16/2003 -0400, you wrote: >Hi all, > > Check out this article over at wired: > >http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? >Obviously, there's memory bandwidth limitations due to PCI. Does anyone >know anything else about these guys? > >Cheers, >Bryce > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Mon Oct 20 08:16:19 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Mon, 20 Oct 2003 14:16:19 +0200 Subject: some ab initio benchmarks In-Reply-To: References: Message-ID: <20031020121619.GM8711@unthought.net> On Sat, Oct 18, 2003 at 06:52:59AM +0200, Cezary Czaplewski wrote: > > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some > older machines. For comparison I am including benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I received from Alex > Granovsky. Could you please specify which version of which operating system was used for this? If the kernel does not have NUMA scheduling, the Opterons are severely disadvantaged - it would be useful to know. Thank you, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From richardlbj at yahoo.com Sat Oct 18 23:56:37 2003 From: richardlbj at yahoo.com (Richard Brown) Date: Sat, 18 Oct 2003 20:56:37 -0700 (PDT) Subject: cluseter node freezes while running namd 2.5/2.5b1 Message-ID: <20031019035637.5382.qmail@web41211.mail.yahoo.com> I have been try to figure this out for the past two months with no luck. I have a 8-node PC cluster that consists of 16 athlon mp2200+, msi k7d master-l mb, intel i82557/i82558 10/100 on-board lan, 500mb kingston ddr266 pc2100 unbuffered, 3com superstack III baseline 24 port 10/100 switch. The cluster was built using oscar2.1/redhat7.3 w/ the kernel update 2.4.20-20. namd used includes 2.5b1 and the latest 2.5, both linux binary distributions and source code builds. the simulation tested is apoa1 benchmark example. namd/apoa1 only runs w/o problems on a single cluster node, either with one or two cpus. Every time it runs on two or more nodes, either using one or two cpus from each node, namd/apoa1 stops somewhere in the middle of run. One of the nodes freezes and does not respond to ping, ssh or the directly attached keyboard. Most of the time there were no error messages. A few times I received apic error or sorcket receive failure. I tried plugging a ps/2 mouse into the nodes as some people suggested for a bug of the motherboad but it did not help. I don't know how to proceed from here. Any suggestions would be appreciated. Thanks, Richard __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Sun Oct 19 21:00:53 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Sun, 19 Oct 2003 21:00:53 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <20031018013957.GA3774@greglaptop.PEATEC.COM> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <16275.13253.833239.996985@random.tigertiger.de> > > http://www.wired.com/news/technology/0,1282,60791,00.html Greg Lindahl writes: > I think it's the Return of the Array Processor. > > There's very little new in computing these days -- and it has the > usual flaws of APs: low bandwidth communication to the host. > > So if you have a problem that actually fits in the limited memory, and > doesn't need to communicate with anyone else very often, it may be a > win for you. They actually say in this document http://www.clearspeed.com/downloads/overview_cs301.pdf that the chip can be used as stand-alone processor and resembles a standard RISC processor. I do not see whether it would be SIMD or MIMD - the block diagram at least does not show a central control unit separate from the PEs. Given the small on-chip memory, they will have to connect external memory. The thing that would worry me is that the external machine balance is 32 Flops/Word (on 32-bit words), so it will only be useful for applications that do a lot of operations inside a few 100Kb of memory. 
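To put rough numbers on that balance figure, here is a small back-of-the-envelope sketch in C. The ~25 Gflop/s per-chip peak is inferred from the "600 gigaflops from 24 chips" quote earlier in this digest, and the 32 flops-per-word balance is the figure above; both are assumptions for illustration, not vendor specifications.

#include <stdio.h>

int main(void)
{
    /* Assumed figures from the thread: ~25 Gflop/s per chip
       (600 Gflop/s / 24 chips) and a balance of 32 flops per
       32-bit word of external memory traffic. */
    double peak_flops  = 25e9;
    double balance     = 32.0;                  /* flops per external word */
    double words_per_s = peak_flops / balance;  /* implied ~0.78 Gword/s   */
    double bytes_per_s = words_per_s * 4.0;     /* implied ~3.1 GB/s       */

    /* A kernel doing 'intensity' flops per word streamed from external
       memory sustains at most min(peak, intensity * implied bandwidth). */
    double intensity = 2.0;                     /* e.g. a streaming kernel */
    double sustained = intensity * words_per_s;
    if (sustained > peak_flops)
        sustained = peak_flops;

    printf("implied external bandwidth: %.2f GB/s\n", bytes_per_s / 1e9);
    printf("sustained at %.0f flops/word: %.2f Gflop/s (peak %.0f Gflop/s)\n",
           intensity, sustained / 1e9, peak_flops / 1e9);
    return 0;
}

In other words, unless a kernel reuses each word a few dozen times inside the on-chip memory, it runs at a small fraction of peak, which is exactly the concern raised above.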
IBM is following a slightly different approach with the QCDOC and BlueGene/L supercomputers which are based on systems-on-a-chip where they put a two PowerPC cores and all support logic on a single chip, wire it up with one or two GB of memory and connect a lot (64K) of these chips together. They expect 5.5 GFlops/s per node peak and to have 360 TFlops operational in 2004/5 (in 64 racks). You would need about 200 racks to get to a PetaFlops machine... http://sc-2002.org/paperpdfs/pap.pap207.pdf http://www.arxiv.org/abs/hep-lat/0306023 [QCDOC is a Columbia University project in collaboration with IBM - IBM is transitioning the technology from high-energy physics to biology which makes a lot of sense... :-)] To put 64 processors on a chip, I am sure ClearSpeed have to sacrifice a lot in memory and functionality/programmability, and who wins in this tradeoff remains to be seen. Depends on the application, too, of course. BTW, who or what is behind ClearSpeed? Their Bristol address is identical to Infineon's Design Centre there, and Hewlett Packard seems to have a lab there, too. If they have that kind of support, I am sure they thought hard before making these design choices, and it may just be tarketed at certain problems (vector/matrix/FFT-like stuff). -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mof at labf.org Mon Oct 20 10:13:56 2003 From: mof at labf.org (Mof) Date: Mon, 20 Oct 2003 23:43:56 +0930 Subject: Solaris Fire Engine. Message-ID: <200310202343.56524.mof@labf.org> http://www.theregister.co.uk/content/61/33440.html ... "We worked hard on efficiency, and we now measure, at a given network workload on identical x86 hardware, we use 30 percent less CPU than Linux." Mof. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Oct 20 11:17:24 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 20 Oct 2003 11:17:24 -0400 Subject: cluseter node freezes while running namd 2.5/2.5b1 In-Reply-To: <20031019035637.5382.qmail@web41211.mail.yahoo.com> References: <20031019035637.5382.qmail@web41211.mail.yahoo.com> Message-ID: <3F93FC84.7020808@scalableinformatics.com> Hi Richard: Are your Intel network drivers up to date? Check on the Intel site. If only one node repeatedly freezes (the same node), you might look at taking it out of the cluster, and seeing if that improves the situation. If it does, swap the one you took out, with one that is still in there, and see if the problem returns. This will help you determine if the problem is node based or system based. Joe Richard Brown wrote: >I have been try to figure this out for the past two >months with no luck. > >I have a 8-node PC cluster that consists of 16 athlon >mp2200+, msi k7d master-l mb, intel i82557/i82558 >10/100 on-board lan, 500mb kingston ddr266 pc2100 >unbuffered, 3com superstack III baseline 24 port >10/100 switch. > >The cluster was built using oscar2.1/redhat7.3 w/ the >kernel update 2.4.20-20. namd used includes 2.5b1 and >the latest 2.5, both linux binary distributions and >source code builds. the simulation tested is apoa1 >benchmark example. 
> >namd/apoa1 only runs w/o problems on a single cluster >node, either with one or two cpus. Every time it runs >on two or more nodes, either using one or two cpus >from each node, namd/apoa1 stops somewhere in the >middle of run. One of the nodes freezes and does not >respond to ping, ssh or the directly attached >keyboard. Most of the time there were no error >messages. A few times I received apic error or sorcket >receive failure. I tried plugging a ps/2 mouse into >the nodes as some people suggested for a bug of the >motherboad but it did not help. > >I don't know how to proceed from here. Any suggestions >would be appreciated. > >Thanks, >Richard > > >__________________________________ >Do you Yahoo!? >The New Yahoo! Shopping - with improved product search >http://shopping.yahoo.com >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netbsd.org Mon Oct 20 11:03:50 2003 From: jschauma at netbsd.org (Jan Schaumann) Date: Mon, 20 Oct 2003 11:03:50 -0400 Subject: New tech-cluster mailing list for NetBSD Message-ID: <20031020150350.GA26140@netmeister.org> Hello, A new tech-cluster at netbsd.org mailing list has been created. As the name suggests, this list is intended for technical discussions on building and using clusters of NetBSD hosts. Initially, this list is expected to be of low volume, but we hope to advocate and advance the use of NetBSD in such environments significantly. Subscription is via majordomo -- please see http://www.NetBSD.org/MailingLists/ for details. -Jan -- http://www.netbsd.org - Multiarchitecture OS, no hype required. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 14:03:23 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 14:03:23 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <200310202343.56524.mof@labf.org> Message-ID: On Mon, 20 Oct 2003, Mof wrote: > http://www.theregister.co.uk/content/61/33440.html > > ... "We worked hard on efficiency, and we now measure, at a given network > workload on identical x86 hardware, we use 30 percent less CPU than Linux." Linux uses much more CPU per packet than it used to. The structural change for IPtable/IPchains capability is very expensive, even when it is not used. And there have been substantial, CPU-costly changes to protect against denial-of-service attacks at many levels. The only protocol stack changes that might benefit cluster use are sendfile/zero-copy, and that doesn't apply to most current hardware or typical cluster message passing. I would be technially easy to revert to the interface of old Linux kernels and see much better than a 30% CPU reduction, but it's very unlikely that would happen politically: Linux development is feature-driven, not performance-driven. 
And that's easy to understand when your pet feature is at stake, or there is a news story of "Linux Kernel Vulnerable to ". -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kinghorn at pqs-chem.com Mon Oct 20 13:36:28 2003 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Mon, 20 Oct 2003 12:36:28 -0500 Subject: parllel eigen solvers Message-ID: <200310201236.28901.kinghorn@pqs-chem.com> Does anyone know of any recent progress on parallel eigensolvers suitable for beowulf clusters running over gigabit ethernet? It would be nice to have something that scaled moderately well and at least gave reasonable approximations to some subset of eigenvalues and vectors for large (10,000x10,000) symmetric systems. My interests are primarily for quantum chemistry. It's pretty obvious that you can compute eigenvectors in parallel after you get the eigenvalues but it would be nice to get eigenvalues mostly in parallel requiring maybe just a couple of serial iterates ... Best regards to all -Don _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Mon Oct 20 15:08:21 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Mon, 20 Oct 2003 21:08:21 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: <20031020121619.GM8711@unthought.net> Message-ID: On Mon, 20 Oct 2003, Jakob Oestergaard wrote: > Could you please specify which version of which operating system was > used for this? Opteron machine (dahlia) was a prototype which dr Paulette Clancy got for evaluation from local computer shop. It had RedHat GinGin 64 operating system preistalled when I did testing. > If the kernel does not have NUMA scheduling, the Opterons are severely > disadvantaged - it would be useful to know. I don't remember which kernel was installed when I did benchmarks, I suppose standard kernel which is coming with GinGin64. Machine should have SuSE installed now so I cannot check it. I will repeat benchmarks with PGI 5 64bit compiler and SuSE when I will have some time. czarek _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Oct 20 17:50:56 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 20 Oct 2003 14:50:56 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: <16275.13253.833239.996985@random.tigertiger.de> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: >BTW, who or what is behind ClearSpeed? Their Bristol address is >identical to Infineon's Design Centre there, and Hewlett Packard seems >to have a lab there, too. If they have that kind of support, I am sure >they thought hard before making these design choices, and it may just >be tarketed at certain problems (vector/matrix/FFT-like stuff). 
Off their web site...http://www.clearspeed.com/about.php?team The CEO and president are marketing oriented (CEO: "he focused on taking new technologies to market", President: "..successfully grown glabal sales mangement and field application organizations and instrumental in creating key partnership agreements". The CTO (Ray McConnell) does parallel processing with 300K processors, etc. VP Engr (Russell David) designed mixed signal baseband ICs for wireless market. I didn't turn up any papers in the IEEE on-line library, but that's not particularly signficant, in and of itself. McConnell has a paper http://www.hotchips.org/archive/hc11/hc11pres_pdf/hc99.s3.2.McConnell.pdf shows architectures from PixelFusion, Ltd... SIMD core with 32 bit embedded processor running a 256 PE "Fuzion block". Each PE has an 8 bit ALU and 2kByte PE memory... (sound familiar?) From "Hot Chips 99" James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Mon Oct 20 18:46:31 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 00:46:31 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > The only protocol stack changes that might benefit cluster use are > sendfile/zero-copy, and that doesn't apply to most current hardware or > typical cluster message passing. Has actually somebody tried to use sendfile in MPICH or LAM-MPI ? I planned to do it, but this is somewhere in the middle of my always growing TODO queue... Recipes for how to use it were posted a few times at least on netdev list, so those interested can find them easily. > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: But there are many projects that live outside the official kernel, the Scyld network drivers being one good example. What's wrong with replacing the IP stack with one maintained separately with performance in mind ? I agree though that this would mean somebody to take care of it and make sure that it works with newer kernels... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Oct 20 19:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 20 Oct 2003 19:08:12 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Oh yes, and it is a SERIOUS problem. 
I was just mulling on the right procmail recipe to consign this domain to the dark depths of hell, but if it were done at the list level instead it would only be a good thing. My .procmailrc is already getting quite long indeed. BTW, you (and of course the rest of the list) are just the man to ask; what is the status of Opterons and fortran compilers. I myself don't use fortran any more, but a number of folks at Duke do, and they are starting to ask what the choices are for Opterons. A websearch reveals that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an Opteron fortran, but rumor also suggests that a number of these are really "beta" quality with bugs that may or may not prove fatal to any given project. Then there is Gnu. Any comments on any of these from you (or anybody, really)? Is there a functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? Do the compilers permit access to large (> 3GB) memory, do they optimize the use of that memory, do they support the various SSE instructions? I'm indirectly interested in this as it looks like I'm getting Opterons for my next round of cluster purchases personally, although I'll be using C on them (hopefully 64 bit Gnu C). rgb > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Oct 20 18:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Mon, 20 Oct 2003 18:52:03 -0400 Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: References: Message-ID: <1066690323.7027.17.camel@roughneck.liniac.upenn.edu> On Mon, 2003-10-20 at 18:41, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Yes -- quite annoying :/ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Mon Oct 20 18:41:51 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Mon, 20 Oct 2003 15:41:51 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net Message-ID: I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every messages I've sent to this list has started bouncing back to me from dan at systemsfirm.com. I'm getting about ten copies of each one every other day. Is anyone else having this problem? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 20:08:31 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 20:08:31 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > On Mon, 20 Oct 2003, Donald Becker wrote: > > > The only protocol stack changes that might benefit cluster use are > > sendfile/zero-copy, and that doesn't apply to most current hardware or > > typical cluster message passing. > > Has actually somebody tried to use sendfile in MPICH or LAM-MPI ? I > planned to do it, but this is somewhere in the middle of my always growing > TODO queue... Recipes for how to use it were posted a few times at least > on netdev list, so those interested can find them easily. The trick is to memory map a file use that memory region as message buffers send the message buffers using sendfile() My belief is that the page locking involved with sendfile() would be too costly for anything smaller than about 32KB. While I'm certain that there are a few MPI applications that use messages that large, they don't seem to be typical. > But there are many projects that live outside the official kernel, the > Scyld network drivers being one good example. What's wrong with replacing > the IP stack with one maintained separately with performance in mind ? > I agree though that this would mean somebody to take care of it and make > sure that it works with newer kernels... >From my experience trying to keep the network driver interface stable, I very much doubt that it would be possible to separately maintain a network protocol stack. Especially since it would be perceived as competition with the in-kernel version, which brings out the worst behavior... As a specific example, a few years ago we had cluster performance patches for the 2.2 kernel. Even while the 2.3.99 development was going on, the 2.2 kernel changed too quickly to keep those patches up to date and tested. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Mon Oct 20 19:31:33 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Tue, 21 Oct 2003 01:31:33 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> Message-ID: <16276.28757.858683.189030@random.tigertiger.de> Jim Lux writes: > At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: > >BTW, who or what is behind ClearSpeed? Their Bristol address is > >identical to Infineon's Design Centre there, and Hewlett Packard seems > >to have a lab there, too. If they have that kind of support, I am sure > >they thought hard before making these design choices, and it may just > >be tarketed at certain problems (vector/matrix/FFT-like stuff). > > The CTO (Ray McConnell) does parallel processing with 300K processors, etc. > VP Engr (Russell David) designed mixed signal baseband ICs for wireless > market. 
I didn't turn up any papers in the IEEE on-line library, but > that's not particularly signficant, in and of itself. I actually found some more info about them: Clearspeed used to be Pixelfusion, a spin-off from Inmos, who made the original Transputer. http://www.eetimes.com/sys/news/OEG20010524S0044 Clearspeed tried to design a SIMD processor called Fuzion for graphics applications, then around 2001 turned to the networking sector, and now it seems to high-performance computing. So its a processor in search of an application. http://www.eetimes.com/semi/news/OEG20000208S0039 http://www.eetimes.com/semi/news/OEG19990512S0012 Poor guys went through at least three CEOs during the last four years... -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Oct 20 20:33:23 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 20 Oct 2003 17:33:23 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: hi ya trent just add the ip# of systemsfirm.net to your /etc/mail/access files # a polite msg i added for them/somebody to see .. systemsfirm.net REJECT - geez .. do you need help to fix your PC cd /etc/mail ; make ; restart-sendmail or your exim or ... c ya alvin and about 75% or more of the sven virus is coming from mis-managed/mis-configured clusters http://www.Linux-Sec.net/MSJunk On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Oct 20 23:34:44 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 20 Oct 2003 20:34:44 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: yes... I've tried contacting the admin contact for that domain and got no response... joelja On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Tue Oct 21 08:06:56 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Tue, 21 Oct 2003 05:06:56 -0700 Subject: Solaris Fire Engine. References: Message-ID: <3F95215F.9DE43BD9@attglobal.net> In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development at some point split in two distinct lines: one for single computer applications and one for clusters? Paul Schenker Donald Becker wrote: > On Mon, 20 Oct 2003, Mof wrote: > > > http://www.theregister.co.uk/content/61/33440.html > > > > ... "We worked hard on efficiency, and we now measure, at a given network > > workload on identical x86 hardware, we use 30 percent less CPU than Linux." > > Linux uses much more CPU per packet than it used to. The structural > change for IPtable/IPchains capability is very expensive, even when it > is not used. And there have been substantial, CPU-costly changes to protect > against denial-of-service attacks at many levels. The only protocol > stack changes that might benefit cluster use are sendfile/zero-copy, and > that doesn't apply to most current hardware or typical cluster message > passing. > > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: Linux development is > feature-driven, not performance-driven. And that's easy to understand > when your pet feature is at stake, or there is a news story of "Linux > Kernel Vulnerable to ". > > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zby at tsinghua.edu.cn Mon Oct 20 22:34:54 2003 From: zby at tsinghua.edu.cn (Baoyin Zhang) Date: Tue, 21 Oct 2003 10:34:54 +0800 Subject: Jcluster toolkit v 1.0 releases! Message-ID: <266703288.27688@mail.tsinghua.edu.cn> Apologies if you receive multiple copies of this message. Dear all, I am pleased to annouce the Jcluster Toolkit (Ver 1.0) releases, you can freely download it from the website below. http://vip.6to23.com/jcluster/ The toolkit is a high performance Java parallel environment, implemented in pure java. 
It provides you the popular PVM-like and MPI-like message-passing interface, automatic task load balance across large-scale heterogeneous cluster and high performance, reliable multithreaded communications using UDP protocol. In the version 1.0, Object passing interface is added into PVM-like and MPI-like message passing interface, and provide very convenient deployment -- the classes of user application only need to be deployed at one node in a large-scale cluster. I welcome your comments, suggestions, cooperation, and involvement in improving the toolkit. Best regards Baoyin Zhang _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 21 08:20:10 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 21 Oct 2003 08:20:10 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <3F95215F.9DE43BD9@attglobal.net> Message-ID: On Tue, 21 Oct 2003 pesch at attglobal.net wrote: > In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If > so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development > at some point split in two distinct lines: one for single computer applications and one for clusters? It's the usual problem (and a continuation of my XML rant in a way, as it is at least partly motivated by this). Sure, one can do this. However, it is very, very expensive to do so, a classic case of 90% of the work producing 10% of the benefit, if that. As Don pointed out, even Scyld, with highly talented people who are (in principle:-) even making money doing so found maintaining a separate kernel line crushingly expensive very quickly. Whenever expense is mentioned, especially in engineering, one has to consider benefit, and do a CBA. The CBA is the crux of all optimization theory; find the point of diminishing returns and stay there. I would argue that splitting the kernel is WAY far beyond that point. Folks who agree can skip the editorial below. For that matter, so can folks who disagree...;-) The expense can be expressed/paid one of several ways -- get a distinct kernel optimized and stable, get an entire associated distribution optimized and stable, and then freeze everything except for bugfixes. You then get a local optimum (after a lot of work) that doesn't take a lot of work to maintain, BUT you pay the penalty of drifting apart from the rest of linux and can never resynchronize without redoing all that work (and accepting all that new expense). New, more efficient gcc? Forget it -- the work of testing it with your old kernel costs too much. New device drivers? Hours to days of testing for each one. Eventually a key application or improvement appears in the main kernel line (e.g. 64 bit, Opteron support) that is REALLY different, REALLY worth more to nearly everybody than the benefit they might or might not gain from the custom HPC optimized kernel, and your optimized but stagnant kernel is abandoned. Alternatively, you can effectively twin the entire kernel development cycle, INCLUDING the testing and debugging. Back in my ill-spent youth I spent a considerable amount of time on the linux-smp list (I couldn't take being on the main linux kernel list even then, as its traffic dwarfs both the beowulf list and the linux-smp list combined). I also played a tiny bit with drivers on a couple of occassions. 
The amount of work, and number of human volunteers, required to drive these processes is astounding, and I would guess that it would have to be done on twinned lists as the kernelvolken would likely not welcome a near doubling of traffic on their lists or doubling of the work burden trying to figure out just who owns a given emergent bug (and inevitably they WOULD have to help figure out who owns emergent bugs, as some of them WOULD belong to them, others to the group supporting the split off sources, if they were to proceed independently but "keep up" with the development kernel so that true divergence did not occur). A better alternative exists (and is even used to some extent). The linux kernel is already highly modular. It is already possible to e.g. bypass the IP stack altogether (as is done by myrinet and other high speed networks) with custom device drivers that work below the IP and TCP layers -- just doing this saves you a lot of the associated latency hit in high speed networks, as TCP/IP is designed for WAN routing and security and tends to be overkill for a secure private LAN IPC channel in a beowulf. This route requires far less maintenance and customization -- specialized drivers for MPI and/or PVM and/or a network socket layer, plus a kernel module or three. Even this is "expensive" and tends to be done only by companies that make hefty marginal profits for their specific devices, but it is FAR cheaper than maintaining a separate kernel altogether. I would also lump into this group applying and testing on an ad hoc basis things like Josip's network optimization patches which make relatively small, relatively specific changes that might technically "break" a kernel for WAN application but can produce measureable benefits for certain classes of communication pattern. This sort of thing is NOT for everybody. It is like a small scale version of the first alternative -- the patches tend to be put together for some particular kernel revision and then frozen (or applied "blindly" to succeeding kernel revisions until they manifestly break). Again this motivates one to freeze kernel and distribution once one gets everything working and live with it until advances elsewhere make it impossible to continue doing so. This is the kind of thing where MAYBE one could get the patches introduced into the mainstream kernel sources in a form that was e.g. sysctl controllable -- "modular", as it were, but inside the non-modular part of the kernel as a "proceed at your own risk" feature. Expense alternatives in hand, one has to measure benefit. We could break up HPC applications very crudely into groups. One group is code that is CPU bound -- where the primary/only bottleneck is the number of double precision floating point (and associated integer) computations that the computer can retire per second. Another might be memory bound -- limited primarily by the speed with which the system can move values into and out of memory doing some simple operations on them in the meantime. Still another might be disk or other non-network I/O bound (people who crunch large data sets to and from large storage devices). Finally yes, one group might be bound by the network and network based IPC's in a parallel division of a program. 
This latter group is the ONLY group that would really benefit from the kernel split; the rest of the kernel is reasonably well optimized for raw computations, memory access, and even hardware device access (or can be configured and tuned to be without the need of a separate kernel line). I would argue that even the network group splits again, into latency limited and bandwidth limited. Bandwidth limited applications would again see little benefit from a hacked kernel split as TCP can deliver data throughput that is roughly 90% of wire speed (or better) for ethernet, depending on the quality of hardware as much as the kernel. Of course, the degree of the CPU's involvement in sending and receiving these messages could be improved; one would like to be able to use DMA as much as possible to send the messages without blocking the CPU, but this matters only if the CPU can do something useful while awaiting the network IPC transfers; often it cannot. The one remaining group that would significantly benefit is the latency limited group -- true network parallel applications that need to send lots of little messages that cannot be sensibly aggregated in software. The benefit there could be profound, as the TCP stack adds quite a lot of latency (and CPU load) on top of the irreducible hardware latency, IIRC, even on a switched network where the CPU doesn't have to deal with a lot of spurious network traffic. Are there enough members of this group to justify splitting the kernel? I very much doubt it. I don't even think that the existence of this group has motivated the widespread adoption of a non-IP ethernet transport layer -- nearly everybody just lives with the IP stack latency OR... ...uses one of the dedicated HPC networks. This is the real kicker. TCP latency is almost two orders of magnitude greater than either myrinet or dolphin/sci latency (which are both order of microseconds instead of order of hundreds of microseconds). They >>also<< deliver very high bandwidth. Sure, they are expensive, but you know that you are paying for precisely what YOU need for YOUR HPC computations. I don't have to pay for them (even indirectly, by helping out with a whole secondary kernel development track) when MY code is CPU bound; the big DB guys don't have to pay for it when THEIR code depends on how long it takes to read in those ginormous databases of e.g. genetic data; the linear algebra folks who need large, fast memory don't pay for it (unless they try splitting up their linear algebra across the network, of course:-) -- it is paid for only the people who need it, who send lots of little messages or who need its bleeding edge bandwidth or both. One COULD ask, very reasonably, for just about any of the kernel optimizations that can be implemented at the modular level -- that is a matter of writing the module, accepting responsibility for its integration into the kernel and sequential debugging in perpetuity (that is, becoming a slave of the lamp, in perpetuity bound to the kernel lists:-). Alas, TCP/IP is so bound up inside the main part of the kernel that I don't think it can be separated out into modules any more than it already is. ^^^^^ ^^^^^, (closing omitted in the fond hope of remuneration) rgb (C'mon now -- here I am omitting all sorts of words from my rants and my paypal account is still dry as a bone, dry as a desert, bereft of all money, parched as my throat in the noonday sun. Seriously, either I make some money or I'm gonna compose a 50 kiloword opus for my next one...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 08:46:57 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 14:46:57 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > My belief is that the page locking involved with sendfile() would be too > costly for anything smaller than about 32KB. IIRC, both MPICH and LAM-MPI make the distinction between small and large messages with the default cutoff being 64KB. So large messages could be sent this way... I don't know what you meant with "too costly", but small messsages are not too costly to copy in the stack (normal behaviour) especially with increasing cache sizes of today CPUs, while the large ones (where copying time would be significant) could be sent without the extra copy in the stack. > While I'm certain that there are a few MPI applications that use > messages that large, they don't seem to be typical. ... or might not care that much about the speedup. > >From my experience trying to keep the network driver interface stable, I > very much doubt that it would be possible to separately maintain a > network protocol stack. Well, it was late last night and probably I haven't chosen the most appropriate example... the Scyld network drivers are maintained by one person, while my suggestion was more going toward a community project. > Especially since it would be perceived as competition with the in-kernel > version, which brings out the worst behavior... Yeah, political issues - I think that making the intent clear would solve the problem: there is no competition, it serves a completely different purpose. And given what you wrote in the previous e-mail about "feature-driven", who would use it on normal computers when it misses several "high-profile features" like iptables ? Even more, if it's clear that it should only be used on local fast networks, several aspects of the stack can be optimized without fear of breaking very high latency (satellite) or very low bandwidth (phone modems) connections. But I guess that I should stop dreaming :-) > As a specific example, a few years ago we had cluster performance > patches for the 2.2 kernel. Those maintained by Josip Loncaric ? Again it was a one-man show. I think that this is exactly the problem: there are small projects maintained by one person but which depend on the free time or interest of this person. Given that the clustering had moved from research-only into a lucrative bussiness and that the software (Linux kernel, MPI libraries, etc.) evolved quite a lot and the entry barrier into let's say kernel programing is quite high, it's normal that not many people want to make the step. I already expressed about a year ago my oppinion that such projects can only be carried forward by companies that benefit from them or universities where work from students comes for free. But it seems that there are no companies thinking that they can benefit or universities where students' work is for free... 
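For what it's worth, a minimal sketch of the mmap-the-file, fill-it, push-it-with-sendfile() path discussed above follows. It is purely illustrative: the file name, the 256KB message size and the assumption of an already-connected TCP peer (e.g. a netcat listener) are made up, and this is not code from MPICH or LAM-MPI.

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_BYTES (256 * 1024)  /* a "large" message, above the 64KB cutoff */

/* Fill a file-backed buffer and push it out with sendfile().  The
   MAP_SHARED mapping and sendfile() both work on the same page-cache
   pages, so the payload is not copied through an extra user buffer. */
static int send_large_message(int sock)
{
    int fd = open("/tmp/sendbuf.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ftruncate(fd, MSG_BYTES) < 0)
        return -1;

    char *buf = mmap(NULL, MSG_BYTES, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 'x', MSG_BYTES);           /* stand-in for the real payload */

    off_t offset = 0;
    size_t left = MSG_BYTES;
    while (left > 0) {
        ssize_t n = sendfile(sock, fd, &offset, left);
        if (n <= 0) {
            perror("sendfile");
            return -1;
        }
        left -= (size_t)n;
    }

    munmap(buf, MSG_BYTES);
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <ipv4-addr> <port>\n", argv[0]);
        return 1;
    }

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)atoi(argv[2]));
    if (sock < 0 || inet_pton(AF_INET, argv[1], &sa.sin_addr) != 1 ||
        connect(sock, (struct sockaddr *)&sa, sizeof sa) < 0) {
        perror("connect");
        return 1;
    }
    return send_large_message(sock) == 0 ? 0 : 1;
}

Whether the page-locking cost makes this a win below a few tens of KB is exactly the open question raised earlier in the thread.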
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Oct 21 09:31:37 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 21 Oct 2003 09:31:37 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: Message-ID: On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? Well, this is as good a place as many to put up the benchmarks I ran using DYNA (a commercial FEM code from LSTC, first developed at LLNL, and definitely Fortran): http://www.duke.edu/~jlb17/bench-results.pdf According to their docs, the 32bit binary was compiled using ifc6.0. The slowdown in the newer point release is due to them dialing back the optimizations due to compiler bugs. The 64bit Opteron binary was compiled using PGI, but that's all I know about it. To sum it up, I bought some Opterons. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 09:41:53 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 15:41:53 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > But I guess that I should stop dreaming :-) Well, either I'm not dreaming, or somebody else is dreaming too :-) Below are some fragments of e-mails from David Miller (one of the Linux network maintainers) to netdev today: > People on clusters use their own special clustering hardware and > protocol stacks _ANYWAYS_ because ipv4 is too general to serve their > performance needs. And I think that is a good thing rather than > a bad thing. People should use specialized solutions if that is the > best way to attack their problem. ... 
> The things cluster people want is totally against what a general > purpose IPV4 implementation should do. Linux needs to provide a > general purpose IPV4 stack that works well for everybody, not just > cluster people. > > I'd rather have millions of servers using my IPV4 stack than a handful > of N-thousand system clusters. > ... > Sure, many people would like to simulate the earth and nuclear weapons > using Linux, but I'm sure as hell not going to put features into the > kernel to help them if such features hurt the majority of Linux users. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Oct 21 11:19:22 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 21 Oct 2003 11:19:22 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Robert G. Brown wrote: > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right There are many more problems that list readers do not see. I delete the address from the list only when the problem is persistent. The major problem happens when messages take a few days to bounce, and the bounce does not follow standards. In that case there are dozens of messages in the remote queue, and they all appears to be replies by a valid list subscriber. > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. A surprising amount of 64 bit software (certainly not limited to the Opteron) is still not mature enough for general purpose use. It still requires more development and testing to get to the stability level required for real deployment. And it's not the "64 bit" nature of the software, since we did have reasonable maturity on the Alpha years ago. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Tue Oct 21 11:06:37 2003 From: edwardsa at plk.af.mil (Arthur H. 
Edwards) Date: Tue, 21 Oct 2003 09:06:37 -0600 Subject: parllel eigen solvers In-Reply-To: <200310211049.OAA18031@nocserv.free.net> References: <200310201236.28901.kinghorn@pqs-chem.com> <200310211049.OAA18031@nocserv.free.net> Message-ID: <20031021150637.GA8076@plk.af.mil> I should point out that density function theorcan be compute-bound on diagonalization. QUEST, a Sandia Code, easily handles several hundred atoms, but the eigen solve dominates by ~300-400 atoms. Thus, intermediate size diagonalization is of strong interest. Art Edwards On Tue, Oct 21, 2003 at 02:49:07PM +0400, Mikhail Kuzminsky wrote: > According to Donald B. Kinghorn > > > > Does anyone know of any recent progress on parallel eigensolvers suitable for > > beowulf clusters running over gigabit ethernet? > > It would be nice to have something that scaled moderately well and at least > > gave reasonable approximations to some subset of eigenvalues and vectors for > > large (10,000x10,000) symmetric systems. > > My interests are primarily for quantum chemistry. > > > In the case you think about semiempirical fockian diagonalisation, > there is a set of alternative methods for direct construction of density > matrix avoiding preliminary finding of eigenvectors. This methods > are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. > > For non-empirical quantum chemistry diagonalisation usually doesn't limit > common performance. In the case of methods like CI it's necessary to > find only some eigenvectors, and it is better to use special diagonalization > methods. > > There is special parallel solver package, but I don't have exact > reference w/me :-( > > Mikhail Kuzminsky > Zelinsky Inst. of Orgamic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Tue Oct 21 15:32:05 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Tue, 21 Oct 2003 13:32:05 -0600 (CST) Subject: shift bit & performance? In-Reply-To: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: Hi, sometime ago, somebody sent an info about performance working with "<<" & ">>" doing shift bits instead of using "*" or "/" Could anybody help me about it? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Tue Oct 21 15:21:24 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Tue, 21 Oct 2003 12:21:24 -0700 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F958734.2030300@cert.ucr.edu> >>On Mon, 20 Oct 2003, Trent Piepho wrote: >> >> >>BTW, you (and of course the rest of the list) are just the man to ask; >>what is the status of Opterons and fortran compilers. I myself don't >>use fortran any more, but a number of folks at Duke do, and they are >>starting to ask what the choices are for Opterons. 
A websearch reveals >>that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >>Opteron fortran, but rumor also suggests that a number of these are >>really "beta" quality with bugs that may or may not prove fatal to any >>given project. Then there is Gnu. >> >>Any comments on any of these from you (or anybody, really)? Is there a >>functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >>Do the compilers permit access to large (> 3GB) memory, do they optimize >>the use of that memory, do they support the various SSE instructions? >> I can tell you about PGI's compilers. They are kinda beta quality as you say. As of now they only want to install on Suse enterprise edition. Although a little fiddling around with the install scripts and you can get them to install on other distributions. But even though you can get the compilers installed, they only seem to run on the Suse beta for opterons. PGI says this should all change in the near future though. As far as the code that the compilers produce, we haven't had any problems at all as far as I know of. The great thing about PGI compilers though is that you can download them and try them out for free for 15 days or so and see for yourself. As far as the Gnu Fortran compiler goes, it seems to work great on Opterons too. But then as you're probably aware, it's only a Fortran 77 compiler. Cheers, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 21 15:33:33 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 21 Oct 2003 14:33:33 -0500 Subject: shift bit & performance? In-Reply-To: References: Message-ID: <1066764813.27603.4.camel@terra> On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? > There is certainly performance to be had from using a logical shift instead of a multiply or divide, but its of declining value. I am fairly sure that with modern compilers you do a integer divide by a constant power of 2, that it will generate a logical shift. That aint rocket science. -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Oct 21 16:32:07 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 21 Oct 2003 15:32:07 -0500 Subject: shift bit & performance? In-Reply-To: ; from eccf@super.unam.mx on Tue, Oct 21, 2003 at 01:32:05PM -0600 References: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: <20031021153207.N31870@mikee.ath.cx> On Tue, 21 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > > > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? The operations << and >> are closer to assembler operations for integer values than * and /. If using * or / there are many assembler instructions to compute the new values. When using power of 2s for * or / then << and >> are much faster. 
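(A minimal C illustration of the point Dean and Mike make above -- not taken from either post, and the values are arbitrary. For unsigned integers a divide by a constant power of two and a right shift are the same operation, and the compiler already emits the shift for you; for signed integers the two differ on negative values, which is one more reason to leave it to the compiler:)

#include <stdio.h>

int main(void)
{
    unsigned int x = 1000u;

    /* For unsigned operands these are the same operation; gcc and most
       other compilers emit the shift for the divide automatically. */
    unsigned int by_div   = x / 8u;
    unsigned int by_shift = x >> 3;

    /* For signed operands they are NOT the same: C division truncates
       toward zero, while >> on a negative value is an arithmetic shift
       on x86 compilers (and implementation-defined in general). */
    int y = -7;

    printf("unsigned: %u %u (identical)\n", by_div, by_shift);
    printf("signed:   %d %d (-7/2 vs -7>>1 differ)\n", y / 2, y >> 1);
    return 0;
}

Compiling with gcc -O2 -S shows the unsigned divide turned into a shift; the signed divide becomes a shift plus a small fix-up.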
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bhalevy at panasas.com Tue Oct 21 17:35:06 2003 From: bhalevy at panasas.com (Halevy, Benny) Date: Tue, 21 Oct 2003 17:35:06 -0400 Subject: shift bit & performance? Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF8BC@PIKES.panasas.com> Could be meaningful on a 32 bit platform doing 64-bit math emulation. Emulating shift is much cheaper than multiply/divide. Benny >-----Original Message----- >From: Dean Johnson [mailto:dtj at uberh4x0r.org] >Sent: Tuesday, October 21, 2003 3:34 PM >To: Eduardo Cesar Cabrera Flores >Cc: beowulf at beowulf.org >Subject: Re: shift bit & performance? > > >On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: >> Hi, >> >> sometime ago, somebody sent an info about performance >working with "<<" & >> ">>" doing shift bits instead of using "*" or "/" >> Could anybody help me about it? >> > >There is certainly performance to be had from using a logical >shift instead of a >multiply or divide, but its of declining value. I am fairly >sure that with modern >compilers you do a integer divide by a constant power of 2, >that it will generate >a logical shift. That aint rocket science. > > -Dean > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Wed Oct 22 03:32:31 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Wed, 22 Oct 2003 00:32:31 -0700 Subject: flood of bounces from postmaster@systemsfirm.net References: Message-ID: <3F96328F.DF327416@attglobal.net> Perhaps it's not related to the topic but any mail I post to this list results automatically in a "incident report" to my mail provider (attglobal.net) which then automatically replies with the mail below. Any inquiry to attglobal.net with the reference number below results always in exactly 0 (zero) replies from attglobal. Paul Schenker "Received: from e4.ny.us.ibm.com ([32.97.182.104]) by prserv.net (in5) with ESMTP id <20031021031824105041p20me>; Tue, 21 Oct 2003 03:18:27 +0000 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e4.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id h9L3IN0N801416 for ; Mon, 20 Oct 2003 23:18:23 -0400 Received: from BLDVMB.POK.IBM.COM (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id h9L3IMqW036946 for <@vm-av.pok.relay.ibm.com:pesch at attglobal.net>; Mon, 20 Oct 2003 23:18:22 -0400 Message-ID: <200310210318.h9L3IMqW036946 at northrelay01.pok.ibm.com> Received: by BLDVMB.POK.IBM.COM (IBM VM SMTP Level 320) via spool with SMTP id 7133 ; Mon, 20 Oct 2003 21:09:30 MDT Date: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) From: notify at attglobal.net To: CC: Subject: Re: Solaris Fire Engine. (REF:#_CSSEMAIL_0870689) X-Mozilla-Status: 8011 X-Mozilla-Status2: 00000000 X-UIDL: 200310210327271050a5ammfe0013d2 An incident reported by you has been created. Sev: 4 The incident # is listed below. No need to respond to this e-mail. 
For Account: CSSEMAIL Incident Number: 0870689 Status: INITIAL Last Updated: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) PROBLEM CREATED ************************************************************************* Summary: Re: Solaris Fire Engine. ************************************************************************* If replying via email, do not alter the reference id in the subject line and send only new information, do not send entire note again. Do not send attachments, graphics or images." Donald Becker wrote: > On Mon, 20 Oct 2003, Robert G. Brown wrote: > > > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > > messages I've sent to this list has started bouncing back to me from > > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > > day. Is anyone else having this problem? > > > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right > > There are many more problems that list readers do not see. I delete the > address from the list only when the problem is persistent. > The major problem happens when messages take a few days to bounce, and > the bounce does not follow standards. In that case there are dozens of > messages in the remote queue, and they all appears to be replies by a > valid list subscriber. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From douglas at shore.net Wed Oct 22 01:33:02 2003 From: douglas at shore.net (Douglas O'Flaherty) Date: Wed, 22 Oct 2003 01:33:02 -0400 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <200310211601.h9LG1QA22261@NewBlue.scyld.com> References: <200310211601.h9LG1QA22261@NewBlue.scyld.com> Message-ID: <3F96168E.7050908@shore.net> Here's the short summary of Opteron compilers. When someone offers an AMD64 compiler, it typically may be used to create 32-bit or 64-bit executables as long as you are specific about which libraries you use. Any IA-32 compiler can create code and run on Opterons. Of course, 32-bit executables don't get the extra memory either, even when running on a 64-bit OS, but sometimes a 32-bit executable might be what you want. With SC2003 coming up, I expect we'll see a flurry of activity relating to compilers and tools. This information will likely be stale soon. Also, most of these have a free trial period, so you can kick the tires. Intel compilers work great in 32-bit and can be run on a 32 or 64-bit OS natively. Performance and compatability is not an issue. For obvious reasons many of the benchmarks have been run using IFC. PGI's first AMD64 production release was around July 5. There is a limitation on objects greater than 2GB in Linux as a result of the GNU assembly linker, but the application can address as much memory as you can give it. Only a small fraction of the world has objects that large. I've only run into it with synthetic benchmarks. The gal coding is done and PGI is working on the next release. As for performance, since this was the first AMD64 fortran compiler to market, it was used in AMD presentations. You can see performance comparisons in Rich Brunner's presentation from ClusterWorld. 
It's on-line at http://www.amd.com/us-en/assets/content_type/DownloadableAssets/RichBrunnerClusterWorldpresFINAL.pdf (about slide 39 IIRC) There was a minor patch release near the begining of August. I suspect there is always someone finding flaws, but generally it's doing well. NB: Saw Glenn's post re: PGI on SuSE v. RedHat. We've got it running on both. There were definately some fiddley bits to make it happy on RedHat, but I think they are documented on PGI's site. Absoft had a long beta of their AMD64 compiler and went GA in September. I have no personal experience on it, nor do I know of any public benchmarks. NAG worked closely with AMD on the AMD Core Math Libraries. They should know the processor well. No experience with the Gnu Fortran or Lahey. I believe GFC to be AMD64 functional. Lahey would only generate 32-bit code. Your other question was about SSE2. Yes Opteron has complete SSE2 support. I *know* PGI & IFC support it, I expect the others do as well. doug douglas_at_shore.net Disclaimer: Among my several hats I am also in AMD Marketing. This is an unofficial response. No AMD bits were utlized in the creation of this email, etc.. If you want to talk about Opterons 'officially' you need to email me at doug.oflaherty(at)amd.com On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote >> On Mon, 20 Oct 2003, Trent Piepho wrote: >> > > >>> > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every >>> > messages I've sent to this list has started bouncing back to me from >>> > dan at systemsfirm.com. I'm getting about ten copies of each one every other >>> > day. Is anyone else having this problem? >> >> >> >> BTW, you (and of course the rest of the list) are just the man to ask; >> what is the status of Opterons and fortran compilers. I myself don't >> use fortran any more, but a number of folks at Duke do, and they are >> starting to ask what the choices are for Opterons. A websearch reveals >> that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >> Opteron fortran, but rumor also suggests that a number of these are >> really "beta" quality with bugs that may or may not prove fatal to any >> given project. Then there is Gnu. >> >> Any comments on any of these from you (or anybody, really)? Is there a >> functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >> Do the compilers permit access to large (> 3GB) memory, do they optimize >> the use of that memory, do they support the various SSE instructions? > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Wed Oct 22 04:45:08 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Wed, 22 Oct 2003 10:45:08 +0200 Subject: shift bit & performance? In-Reply-To: <1066764813.27603.4.camel@terra> References: <1066764813.27603.4.camel@terra> Message-ID: <20031022084508.GA7048@unthought.net> On Tue, Oct 21, 2003 at 02:33:33PM -0500, Dean Johnson wrote: > On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > > Hi, > > > > sometime ago, somebody sent an info about performance working with "<<" & > > ">>" doing shift bits instead of using "*" or "/" > > Could anybody help me about it? > > > > There is certainly performance to be had from using a logical shift instead of a > multiply or divide, but its of declining value. 
I am fairly sure that with modern > compilers you do a integer divide by a constant power of 2, that it will generate > a logical shift. That aint rocket science. > It used to be true that shifts were 'better' on Intel x86 processors, but it is not that simple anymore. On the P4 for example, a sequence of 'add's is cheaper than a left shift, for three adds or less (because the latency of the shift opcode has increased compared to earlier generations). -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From serguei.patchkovskii at sympatico.ca Wed Oct 22 10:05:38 2003 From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii at sympatico.ca) Date: Wed, 22 Oct 2003 10:05:38 -0400 Subject: (no subject) Message-ID: <20031022140538.QSP8001.tomts7-srv.bellnexxia.net@[209.226.175.20]> > Any IA-32 compiler can create code and run on Opterons. Of course, > 32-bit executables don't get the extra memory either, even when running > on a 64-bit OS Not true. A 32-bit binary running on x86-64 Linux has access to the full 32-bit address space. When I run a very simple 32-bit Fortran program, I see the program itself mapped at very low addresses; the shared libraries get mapped at the 1Gbyte mark, while the stack grows down from the 4Gbyte mark. On an x86 Linux, the upper 1Gbyte (but this depends on the kernel options) is taken by the kernel address space. What this means in practice is that on an x86 Linux, I can allocate at most 2.5Gbytes of memory for my data without resorting to ugly tricks; in 32-bit mode of x86-64 Linux, this goes up to about 3.5Gbytes - enough to make a difference in some cases. Serguei _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian.dobbins at yale.edu Wed Oct 22 09:38:00 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Wed, 22 Oct 2003 09:38:00 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <3F96168E.7050908@shore.net> Message-ID: > PGI's first AMD64 production release was around July 5. There is a > limitation on objects greater than 2GB in Linux as a result of the GNU > assembly linker, but the application can address as much memory as you One simple way to get around the 2GB limit (*) is to simply use FORTRAN 90 dynamic allocation calls - we've done this, and have run codes up to (so far) about 7.7GB in size. If you're used to static allocations in F77, it's only about two lines to alter things to use dynamic mem. (*) - I don't think this limitation is in the GNU assembly linker, since g77 has no problems here. I think if you compile to assembly, you'll see that PGI has issues with 32-bit wraparound, whereas g77 does not. Their tech people are aware of this, and it's something I expect will be fixed fairly soon. Also, if you do happen to run jobs > 4GB, make sure you update the 'top' version you're using (procps.sourceforge.net).
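(To put rough numbers on the address-space figures Serguei quotes, and on large jobs like Brian's, a simple malloc probe is often the quickest check. A minimal C sketch -- the 256 MB step size is an arbitrary choice, and the memset is there because malloc alone only reserves address space:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK      (256UL * 1024 * 1024)  /* 256 MB per step (arbitrary) */
#define MAX_CHUNKS 1024                   /* enough pointers for 256 GB  */

int main(void)
{
    void *blocks[MAX_CHUNKS];
    unsigned long n = 0;

    /* Grab and touch CHUNK-sized blocks until allocation fails, then
       report how much memory this binary/OS combination really gave us. */
    while (n < MAX_CHUNKS && (blocks[n] = malloc(CHUNK)) != NULL) {
        memset(blocks[n], 0, CHUNK);      /* force the pages into existence */
        n++;
    }

    printf("allocated and touched %lu MB before giving up\n", n * 256);

    while (n > 0)
        free(blocks[--n]);
    return 0;
}

Under Linux's default overcommit settings the failure can arrive as the OOM killer rather than a NULL return, so run it on an otherwise idle node; a 32-bit build should stop somewhere around the 2.5-3.5 GB figures above, while a 64-bit build is limited mainly by RAM plus swap.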
Previous versions had wraparound at the 4GB mark, and it's cool seeing a listing say something to the effect of "7.7G" next to the size. :) Cheers, - Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shewa at inel.gov Wed Oct 22 12:22:20 2003 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed, 22 Oct 2003 10:22:20 -0600 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F96AEBC.2020107@inel.gov> Robert G. Brown wrote: > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. I have used the PGI compiler 5.0-2 on SuSE SLES 8 with Radion Technologies' (www.radiative.com) Attila Fortran 90 code. One of our scientists has run models in which a single Attila process allocates up to about 7GB of RAM. The performance of the Opteron was quite impressive too. I'm still testing the g77 3.3 prerelease that SuSE includes. By default it creates 64 bit binaries. The gfortran (G95) snapshot doesn't work, but I'm planning on building it myself later on and trying to compile the above Attila code with it. Radiative looked at this earlier (months ago) and it wasn't ready at that time. Andrew > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? > > I'm indirectly interested in this as it looks like I'm getting Opterons > for my next round of cluster purchases personally, although I'll be > using C on them (hopefully 64 bit Gnu C). > > rgb > > >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > -- Andrew Shewmaker, Associate Engineer Phone: 1-208-526-1415 Idaho National Eng. and Environmental Lab. P.0. Box 1625, M.S. 3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Wed Oct 22 13:21:39 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Wed, 22 Oct 2003 11:21:39 -0600 Subject: Cooling Message-ID: <20031022172139.GA12958@plk.af.mil> I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will be on metal racks. Does anyone have a simple way to calculate cooling requirements? We will have fair flexibility with air flow. 
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JAI_RANGI at SDSTATE.EDU Wed Oct 22 15:43:36 2003 From: JAI_RANGI at SDSTATE.EDU (RANGI, JAI) Date: Wed, 22 Oct 2003 14:43:36 -0500 Subject: How to calculate operations on the cluster Message-ID: Hi, Can some tell me how to find out that how many operations can be performed on your cluster. If some say 3 million operation can be performed on this cluster, how to verify that and how to find out the actual performance. -Thanks -Jai Rangi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Oct 22 16:16:09 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed, 22 Oct 2003 13:16:09 -0700 Subject: Cooling References: <20031022172139.GA12958@plk.af.mil> Message-ID: <001d01c398d9$5b09cd00$32a8a8c0@laptop152422> To a first order, figure you've got to reject 150-200W per node.. that's roughly 10kW of heat you need to get rid of. That's 10kJ/second. That will tell you right away how many "tons" of A/C you'll need (1 ton = 12000 BTU/hr or, more usefully, here, 3.517 kW)... Looks like you'll need 3-4 tons (3 tons and 5tons are standard sizes...) Next, figure out how much temperature rise in the air you can tolerate (say, 10 degrees C) Use the specific heat of air to calculate how many kilos (or, more practically, cubic feet) of air you need to move (use 1000 J/kg deg as an approximation... you need to move 1 kg of air every second or about a cubic meter... roughly approximating, a cubic meter is about 35 cubic feet, so you need around 2100 cubic feet per minute) As a practical matter, you'll want a lot more flow (using idealized numbers when it's cheap to put margin in is foolish). Also, a 10 degree rise is pretty substantial... If you kept the room at 15C, the air coming out of the racks would be 25C, and I'll bet the processors would be a good 20C above that. Calculating for a 5 degree rise might be a better plan. Just double the flow. Unless you're investing in specialized ducting that pushes the AC only through the racks and not the room, a lot of the flow will be going around the racks, whether you like it or not. In general, one likes to keep the duct flow speed below 1000 linear feet per minute (for noise reasons!), so your ducting will be around 3-4 square feet. This is not a window airconditioner!... This is the curse of rackmounted equipment in general. Getting the heat out of the room is easy. The tricky part is getting the heat out of the rack. Think about it, you've got to pump all those thousands of CFM *through the rack*, which is aerodynamically not well suited to this, especially in 1U boxes. How much cross sectional area is there in that rack chassis aperture for the air? How fast does that imply that the air is moving? What sort of pressure drop is there going through the rack? Take a look at RGB's Brahma web site. There's some photos there of their chiller unit, so you can get an idea of what's involved. 
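(Jim's arithmetic fits in a throwaway C program, which makes it easy to redo when the node count or wattage changes. Every input below is an assumption to be replaced with measured values, and it deliberately uses the same rough constants as the post above:)

#include <stdio.h>

int main(void)
{
    /* All of these inputs are assumptions: measure your own nodes and
       substitute real numbers. */
    double nodes      = 48.0;
    double watts_node = 200.0;   /* highball per-node draw, W          */
    double misc_watts = 400.0;   /* head node, switch, lights, ...     */
    double delta_t    = 10.0;    /* tolerable air temperature rise, C  */

    double total_w  = nodes * watts_node + misc_watts;
    double tons     = total_w / 3517.0;              /* 1 ton of A/C ~ 3.517 kW     */
    double kg_per_s = total_w / (1000.0 * delta_t);  /* cp of air ~ 1000 J/(kg C)   */
    double cfm      = kg_per_s * 35.0 * 60.0;        /* ~1 m^3 per kg, ~35 ft^3/m^3 */

    printf("heat load: %.1f kW\n", total_w / 1000.0);
    printf("cooling:   %.1f tons of A/C before margin (round up)\n", tons);
    printf("airflow:   %.0f CFM for a %.0f C rise, before margin\n", cfm, delta_t);
    return 0;
}

For 48 nodes at 200 W plus 400 W of overhead this prints roughly 10 kW, 2.8 tons and 2100 CFM, which is where the 3-4 ton and several-thousand-CFM figures come from once margin is added.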
Your HVAC engineer will do a much fancier and useful version of this, allowing for things such as pressure drop, the amount of recirculation, the amount of heat leaking in from other sources (lighting, bodies in the room, etc.), heating from the fans, and so forth; But, at least you've got a ball park figure for what you're going to need. Jim Lux ----- Original Message ----- From: "Arthur H. Edwards" To: Sent: Wednesday, October 22, 2003 10:21 AM Subject: Cooling > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. > > Art Edwards > > -- > Art Edwards > Senior Research Physicist > Air Force Research Laboratory > Electronics Foundations Branch > KAFB, New Mexico > > (505) 853-6042 (v) > (505) 846-2290 (f) > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Wed Oct 22 17:43:16 2003 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed, 22 Oct 2003 23:43:16 +0200 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) References: <3F96AEBC.2020107@inel.gov> Message-ID: <3F96F9F4.8050505@moene.indiv.nluug.nl> Andrew Shewmaker wrote: > I'm still testing the g77 3.3 prerelease that SuSE includes. By default > it creates 64 bit binaries. Is there any interest in having g77 deal correctly with > 2Gb *direct access* records ? I have a patch in progress (due to http://gcc.gnu.org/PR10885) that I can't test myself ... > The gfortran (G95) snapshot doesn't work, but I'm planning on building > it myself later on and trying to compile the above Attila code with it. > Radiative looked at this earlier (months ago) and it wasn't ready at > that time. Please do not forget to enter bug reports in our Bugzilla database (see http://gcc.gnu.org/bugs.html). Thanks ! -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Wed Oct 22 09:53:51 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed, 22 Oct 2003 14:53:51 +0100 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@syst emsfirm.net) Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE210@stegosaurus.bristol.quadrics.com> >From: Brian Dobbins [mailto:brian.dobbins at yale.edu] (cut) > Also, if you do happen to run jobs > 4GB, make sure you > update the 'top' > version you're using (procps.sourceforge.net). Previous versions had > wraparound at the 4GB mark, and it's cool seeing a listing > say something to the effect of "7.7G" next to the size. 
:) On the subject of top another caveat is that top is hard-coded at compile time about what it thinks the pagesize is. If you compile kernels with bigger pagesizes (generally a 'good thing' for large memory nodes) then 'top' gets the memory used by your programs wrong by a factor of x2,x4 etc. ! Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Thu Oct 23 05:33:35 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 11:33:35 +0200 Subject: OpenMosix, opinions? Message-ID: <200310231133.35912.leopold.palomo@upc.es> Hi, I'm a newbie in all of this questions of paralelism and clusters. I'm reading all of I can. I have found some point that I need some opinions. Hipotesis, having a typical beowulf, with some nodes, a switch, etc. All of the nodes running GNU/Linux, and the applications that are running are using MPI or PVM. All works, etc .... Imaging that we have an aplication. A pararell aplication that doesn't use a lot I/O operation, but intensive cpu, and some messages. Something like a pure parallel app. We implement it using PVM or MPI ... MPI. And we make a test, and we have some result. Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ benchmark.htm. We have our program, and we change it that use threads for the paralel behaviour and not MPI. And we run the same test. So, what will be better? Any one have tested it? Thank's in advance. Best regards, Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 23 07:52:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 23 Oct 2003 07:52:14 -0400 (EDT) Subject: Cooling In-Reply-To: <20031022172139.GA12958@plk.af.mil> Message-ID: On Wed, 22 Oct 2003, Arthur H. Edwards wrote: > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. My kill-a-watt shows 1900+ AMD Athlon duals drawing roughly ~230W/node (or 115 per processor) (under steady, full load). I don't have a single CPU system in this class to test, but because of hardware replication I would guess that one draws MORE than half of this, probably ballpark of 150-160W where YMMV depending on memory and disk and etc configuration. Your clock is also a bit higher than what I measure and there is a clockspeed dependence on the CPU side, so you should likely guesstimate highball, say 175W OR buy a <$50 kill-a-watt (numerous sources online) and measure your prototyping node yourself and get a precise number. Then it is a matter of arithmetic. 
To be really safe and make the arithmetic easy enough to do on my fingers, I'll assume 200 W/node. Times 48 is 9600 watts. Plus 400 watts for electric lights, a head node with disk, a monitor, a switch (this is likely lowball, but we highballed the nodes). Call it 10 KW in a roughly 1000 cubic foot space. One ton of AC removes approximately 3500 watts continuously. You therefore need at LEAST 3 tons of AC. However, you'd really like to be able to keep the room COLD, not just on a part with its external environment, and so need to be able to remove heat infiltrating through the walls, so providing overcapacity is desireable -- 4-5 tons wouldn't be out of the question. This also gives you at least limited capacity for future growth and upgrade without another remodelling job (maybe you'll replace those singles with duals that draw 250-300W apiece in the same rack density one day). You also have to engineer airflow so that cold air enters on the air intake side of the nodes (the front) and is picked up by a warm air return after being exhausted, heated after cooling the nodes, from their rear. I don't mean that you need air delivery and returns per rack necessarily, but the steady state airflow needs to retard mixing and above all prevent air exhausted by one rack being picked up as intake to the next. There are lots of ways to achieve this. You can set up the racks so that the node fronts face in one aisle and node exhausts face in the rear and arrange for cold air delivery into the lower part of the node front aisle (and warm air return on the ceiling). You can put all the racks in a single row and deliver cold air as low as possible on the front side and remove it on the ceiling of the rear side. If you have a raised floor and four post racks with sidepanels you can deliver it from underneath each rack and remove it from the top. This is all FYI, but it is a good idea to hire an actual architect or engineer with experience in server room design to design your power/cooling system, as there are lots of things (thermal power kill switch, for example) that you might miss but they should not. However, I think that the list wisdom is that you should deal with them armored with a pretty good idea of what they should be doing, as the unfortunate experience of many who have done so is that even the pros make costly mistakes when it comes to server rooms (maybe they just don't do enough of them, or aren't used to working with 1000 cubic foot spaces). If you google over the list archives, there are longranging, extended discussions on server room design that embrace power delivery, cooling, node issues, costs, and more. rgb > > Art Edwards > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fmahr at gmx.de Thu Oct 23 09:06:09 2003 From: fmahr at gmx.de (Ferdinand Mahr) Date: Thu, 23 Oct 2003 15:06:09 +0200 Subject: OpenMosix, opinions? References: <200310231133.35912.leopold.palomo@upc.es> Message-ID: <3F97D241.2180EEF8@gmx.de> Hi Leo, > Imaging that we have an aplication. A pararell aplication that doesn't use a > lot I/O operation, but intensive cpu, and some messages. Something like a > pure parallel app. We implement it using PVM or MPI ... MPI. 
And we make a > test, and we have some result. > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that > can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that > com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > benchmark.htm. > > We have our program, and we change it that use threads for the paralel > behaviour and not MPI. And we run the same test. So, what will be better? Any > one have tested it? I haven't tested your special situation, but here are my thoughts about it: - Why changing an application that you already have? It costs you an unnecessary amount of time and money. - Migshm seems to enable OpenMosix to migrate System V shared memory processes, not threads. But, "Threads created using the clone() system call can also be migrated using Migshm", that's what you want, right? I don't know how well that works, but it limits you to clone(), and I don't know if thats sufficient for reasonable thread programming. Still (as you mentioned before), you really can only write code that uses minimum I/O and interprocess/thread communication because of network limitations. - Programs using PThreads don't run in parallel with OpenMosix/Migshm, they can only be migrated in whole. - If your MPI/PVM programs are well designed, they are usually really fast and can scale very well when CPU-bound. - Currently (Open)Mosix is better for load-balancing than HPC, especially in clusters with different hardware configurations. In HPC clusters, you usually have identical compute nodes. Hope that helps, Ferdinand _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RobertsGP at ncsc.navy.mil Thu Oct 23 15:07:19 2003 From: RobertsGP at ncsc.navy.mil (Roberts Gregory P DLPC) Date: Thu, 23 Oct 2003 14:07:19 -0500 Subject: UnitedLinux? Message-ID: Has anyone used UnitedLinux 1.0? I am using it on a 2 node dual CPU Opteron system. Greg -----Original Message----- From: Bill Broadley [mailto:bill at math.ucdavis.edu] Sent: Thursday, September 25, 2003 7:46 PM To: Brian Dobbins Cc: Bill Broadley; beowulf at beowulf.org Subject: Re: A question of OS!! > Yikes.. what kernels are used on these systems by default, and how large > is the code? I've been running SuSE 8.2 Pro on my nodes, and have gotten Factory default in both cases AFAIK. I don't have access to the SLES system at the moment, but the redhat box is: Linux foo.math.ucdavis.edu 2.4.21-1.1931.2.349.2.2.entsmp #1 SMP Fri Jul 18 00:06:19 EDT 2003 x86_64 x86_64 x86_64 GNU/Linux What relationship that has to the original 2.4.21 I know not. > varying performance due to motherboard, BIOS level and kernel. (SuSE 8.2 > Pro comes a modified 2.4.19, but I've also run 2.6.0-test5) > Also, are the BIOS settings the same? And how are the RAM slots I don't have access to the SLES bios. > populated? That made a difference, too! I'm well aware of the RAM slot issues, and I've experimentally verified that the full bandwidth is available. Basically each cpu will see 2GB/sec or so to main memory, and both see a total of 3GB/sec if both use memory simultaneously. > (Oh, and I imagine they're both writing to a local disk, or minimal > amounts over NFS? That could play a big part, too.. ) Yeah, both local disk, and not much. 
I didn't notice any difference when I commented out all output. > I should have some numbers at some point for how much things vary, but > at the moment we've been pretty busy on our systems. Any more info on > this would be great, though, since I've been looking at the faster chips, > too! ACK, I never considered that the opterons might be slower in some ways at faster clock speeds. My main suspicious is that MPICH was messaging passing for local nodes in some strange way and triggering some corner case under SLES. I.e. writing an int at a time between CPUs who are fighting over the same page. None of my other MPI benchmarks for latency of bandwidth (at various message sizes) have found any sign of problems. Numerous recompiles of MPICH haven't had any effect either. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at wol.es Thu Oct 23 10:17:17 2003 From: lepalom at wol.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 16:17:17 +0200 Subject: OpenMosix, opinions? In-Reply-To: <3F97D241.2180EEF8@gmx.de> References: <200310231133.35912.leopold.palomo@upc.es> <3F97D241.2180EEF8@gmx.de> Message-ID: <200310231617.17014.lepalom@wol.es> A Dijous 23 Octubre 2003 15:06, Ferdinand Mahr va escriure: > Hi Leo, > > > Imaging that we have an aplication. A pararell aplication that doesn't > > use a > > > lot I/O operation, but intensive cpu, and some messages. Something like a > > pure parallel app. We implement it using PVM or MPI ... MPI. And we make > > a > > > test, and we have some result. > > > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch > > that > > > can migrate threads (light weith process, Mighsm, > > http://mcaserta.com/maask/) > > > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, > > that > > > com from here: > > http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > > > benchmark.htm. > > > > We have our program, and we change it that use threads for the paralel > > behaviour and not MPI. And we run the same test. So, what will be better? > > Any > > > one have tested it? Hi, > I haven't tested your special situation, but here are my thoughts about > it: > > - Why changing an application that you already have? It costs you an > unnecessary amount of time and money. Ok, I just explaining an example. If I have to begin from 0, which approach will be better? > - Migshm seems to enable OpenMosix to migrate System V shared memory > processes, not threads. But, "Threads created using the clone() system > call can also be migrated using Migshm", that's what you want, right? I > don't know how well that works, but it limits you to clone(), and I > don't know if thats sufficient for reasonable thread programming. Still > (as you mentioned before), you really can only write code that uses > minimum I/O and interprocess/thread communication because of network > limitations. Yes, you are right. However, I hope than soon it will run pure threads. I have heart that 2.6 have a lot of improvements in the thread part, but I'm not sure. 
> > - Programs using PThreads don't run in parallel with OpenMosix/Migshm, > they can only be migrated in whole. Well, Pthreads can migrate with openMosix (not Linux Threads!), without the patch. I have understood that. > - If your MPI/PVM programs are well designed, they are usually really > fast and can scale very well when CPU-bound. The question that I comment is to make the programation of a parallel program as a threads programation, and the rest is a job of the kernel in a cluster. If this is avalaible, the management of the parallelism will be a job of the SO, in a distributed machine. > - Currently (Open)Mosix is better for load-balancing than HPC, > especially in clusters with different hardware configurations. In HPC > clusters, you usually have identical compute nodes. > > Hope that helps, Yes, of course. Thank's, regards. Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gilberto at ula.ve Thu Oct 23 17:43:14 2003 From: gilberto at ula.ve (Gilberto Diaz) Date: 23 Oct 2003 17:43:14 -0400 Subject: Oscar 2.3 Message-ID: <1066945394.1200.132.camel@odie> Hello everybody I'm trying to install a small cluster using RH8.0 and oscar 2.3. The machines has a sis900 NIC (PXE capable) in the motherboard. When I try to boot the client nodes they not boot because the sis900.o module is not present. Does anybody have any idea how to load the module in the init image in order to boot the nodes without change the kernel using the kernel picker? Thanks in advance Regards Gilberto _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erwan at mandrakesoft.com Fri Oct 24 04:48:03 2003 From: erwan at mandrakesoft.com (Erwan Velu) Date: Fri, 24 Oct 2003 10:48:03 +0200 Subject: CLIC 2, the newest version is out ! Message-ID: <1066985283.32232.57.camel@revolution.mandrakesoft.com> CLIC is a GPL Linux based distribution made for meeting the HPC needs. CLIC 2 now allow people to install a full Linux cluster from scratch in a few hours. This product contains the Linux core system + the clustering autoconfiguration tools + the deployement tools + MPI stacks (mpich, lam/mpi). CLIC 2 is based on the results of MandrakeClustering, and includes several major features: - New backend engine (fully written in perl) - A new configure step during the server's graphical installation - An automated Dual Ethernet configuration (One NIC for computing, One Nic for administrating) - A new kernel (2.4.22) - A new version of urpmi parallel (a parallel rpm installer) - A graphical tool for managing users (add/remove) : userdrake - A new node management |- You just need to power on a fresh node to install and integrate it in your cluster ! |- Fully automated add/remove procedure And of course the lastest version of the clustering software: - Maui 3.2.5-p5 - ScalablePBS 1.0-p4 - Ganglia 2.5.4 - Mpich 1.2.5-2 - LAM/MPI 6.5.9 (will be updated when 7.1 will be available) - PXELinux 2.06 CLIC 2 will no more being compatible with CLIC 1 due to a fully rewritten backend. This will no more happen in the future but it was needed as CLIC 1 was a test release. We hope this product will meet the CLIC community needs. CLIC 2 is now available on your favorite mirrors in the mandrake-iso directory. 
For example you can found it at Europe: ftp://ftp.lip6.fr:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso ftp://ftp.mirror.ac.uk:/sites/sunsite.uio.no/pub/unix/Linux/Mandrake/Mandrake-iso/i586/CLIC-2.0-i586.iso ftp://ftp.tu-chemnitz.de:/pub/linux/mandrake-iso/i586/CLIC-2.0-i586.iso USA: ftp://ftp.rpmfind.net:/linux/Mandrake-iso/i586/CLIC-2.0-i586.iso ftp://mirrors.usc.edu:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso The documentation is included inside the cdrom (/doc/) under pdf and html format. This is the MandrakeClustering documentation based on the same core, everything is the same except the configuration GUI which is only available in MandrakeClustering. All the configuration scripts that DrakCluster (our GUI) uses are beginning whith the "setup_" prefix. So for auto configurating your server, you use the setup_auto_server.pl script. adding new nodes to your cluster, you use setup_auto_add_nodes.pl removing a node, you can use the setup_auto_remove_nodes.pl All this scripts have a really easy to learn syntax :) I hope this release will please every CLIC user, this new generation of CLIC is really easier to use than the previous releases. PS: I've been heard that the 2.4.22 kernel brand may seriously damage LG cdrom drives. So be carefull with CLIC2 if you own LG cdrom drives, remove your cdrom drive before installing it. - CLIC Website: http://clic.mandrakesoft.com/index-en.html -- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 24 11:06:55 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 24 Oct 2003 17:06:55 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310241506.h9OF6tP02285@dali.crs4.it> I asked ClearSpeed what is the width of the floating point units and today I received a reply. The floating point units in the CS301 are 32 bits wide. A previous email on the subject noted a earlier design Each PE has an 8 bit ALU for the 256 PE "Fuzion block". Evidently, this design is different. My opinion: 32 bits is more than adequate for many signal processing applications, not so long ago 24 bits was considered enough for signal processing. But for simulations of physical events the "eigenvalues" have a range that makes 32 bit floating point too small. regards, Alan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 24 12:09:31 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 24 Oct 2003 17:09:31 +0100 Subject: A Petaflop machine in 20 racks? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE233@stegosaurus.bristol.quadrics.com> > I asked ClearSpeed what is the width of the floating point units > and today I received a reply. > The floating point units in the CS301 are 32 bits wide. Dont forget that www.clearspeed.com used to be www.pixelfusion.com Their target market at the time was massively parallel SIMD PCI based graphics engines. So that is most likely why they use only 32bit floats. 
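(A minimal C illustration of why 32-bit floats worry people doing long simulations: single precision carries only about 7 significant digits, so small contributions to a large accumulator are silently lost. The magnitudes here are arbitrary:)

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from carrying extra precision in
       registers, so the test shows true 32-bit float behaviour. */
    volatile float  fsum = 1.0e8f;
    volatile double dsum = 1.0e8;
    int i;

    /* Add a million small increments onto a large value.  Each 0.1 is far
       below one unit in the last place of a float at 1e8, so the float
       never moves; the double accumulates them as expected. */
    for (i = 0; i < 1000000; i++) {
        fsum = fsum + 0.1f;
        dsum = dsum + 0.1;
    }

    printf("float:  %.1f\n", (double) fsum);   /* stays at 100000000.0  */
    printf("double: %.1f\n", dsum);            /* close to 100100000.0  */
    return 0;
}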
Yours, Daniel. (and yes Clearspeed are based in Bristol,UK but are nothing to do with us.) -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Oct 28 11:09:54 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 28 Oct 2003 11:09:54 -0500 Subject: SFF boxes for a cluster? Message-ID: <3F9E94D2.3020307@lmco.com> Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Tue Oct 28 13:19:11 2003 From: Peter.Lindgren at experian.com (Lindgren, Peter) Date: Tue, 28 Oct 2003 10:19:11 -0800 Subject: SFF boxes for a cluster? Message-ID: We have had 48 Dell GX260 SFF boxes in production since March without a single hardware failure. Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Tue Oct 28 15:12:38 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Tue, 28 Oct 2003 12:12:38 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv .doe.gov> Message-ID: <5.2.0.9.2.20031028120820.04272e60@216.82.101.6> One serious problem with the Shuttle and most competing "small form factor" PCs is the air intake, which is located on the sides. You can't put them flush with each other side-by-side on shelves... Most minitower or midtower ATX cases (and proper 1U or 2U cases) have air intake entirely on the front panel. air intake on the left side: http://www.sfftech.com/showdocs.cfm?aid=447 At 11:45 AM 10/28/2003 -0800, you wrote: >I've got one of those SS51G's at home and I love it. My only complaint is >that it does get a bit warm with a video card, but for a cluster you wont >need one. > >-----Original Message----- >From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] >Sent: Tuesday, October 28, 2003 10:07 AM >To: beowulf at beowulf.org >Subject: Beowulf digest, Vol 1 #1515 - 1 msg > > >Send Beowulf mailing list submissions to > beowulf at beowulf.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf >or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > >You can reach the person managing the list at > beowulf-admin at beowulf.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Beowulf digest..." > > >Today's Topics: > > 1. SFF boxes for a cluster? 
(Jeff Layton) > >--__--__-- > >Message: 1 >Date: Tue, 28 Oct 2003 11:09:54 -0500 >From: Jeff Layton >Subject: SFF boxes for a cluster? >To: beowulf at beowulf.org >Reply-to: jeffrey.b.layton at lmco.com >Organization: Lockheed-Martin Aeronautics Company > >Good morning, > > I've seen a few cluster made from the Small Form Factor >(SFF) boxes including "Space Simulator". Has anyone else >made a decent size cluster (n > 16) from these boxes? If so, >how has the reliability been? > >Thanks! > >Jeff > >-- >Dr. Jeff Layton >Aerodynamics and CFD >Lockheed-Martin Aeronautical Company - Marietta > > > > >--__--__-- > >_______________________________________________ >Beowulf mailing list >Beowulf at beowulf.org >http://www.beowulf.org/mailman/listinfo/beowulf > > >End of Beowulf Digest > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ZukaitAJ at nv.doe.gov Tue Oct 28 14:45:02 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Tue, 28 Oct 2003 11:45:02 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv.doe.gov> I've got one of those SS51G's at home and I love it. My only complaint is that it does get a bit warm with a video card, but for a cluster you wont need one. -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Tuesday, October 28, 2003 10:07 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1515 - 1 msg Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. SFF boxes for a cluster? (Jeff Layton) --__--__-- Message: 1 Date: Tue, 28 Oct 2003 11:09:54 -0500 From: Jeff Layton Subject: SFF boxes for a cluster? To: beowulf at beowulf.org Reply-to: jeffrey.b.layton at lmco.com Organization: Lockheed-Martin Aeronautics Company Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From periea at bellsouth.net Tue Oct 28 16:08:49 2003 From: periea at bellsouth.net (periea at bellsouth.net) Date: Tue, 28 Oct 2003 16:08:49 -0500 Subject: SAS running on compute nodes Message-ID: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Hello All, Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Phil... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Tue Oct 28 17:30:35 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue, 28 Oct 2003 14:30:35 -0800 Subject: SAS running on compute nodes In-Reply-To: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> (periea@bellsouth.net's message of "Tue, 28 Oct 2003 16:08:49 -0500") References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Message-ID: <858yn5m1v8.fsf@blindglobe.net> writes: > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Sure, as a bunch of singleton processes. I don't think you can do much more than that (but would be interested if I'm wrong). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gabriele.butti at unimib.it Tue Oct 28 04:58:04 2003 From: gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) Date: 28 Oct 2003 10:58:04 +0100 Subject: opteron VS Itanium 2 Message-ID: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Dear all, we are planning to build up a new cluster (16 nodes) before this year's end; we are evaluating different proposals from machine sellers, but the main doubt we have at this moment is whether choosing an Itanium 2 architecture or an AMD Opteron one. I know that ther's had already been on this list a debate on such a topic, but maybe some of you has some new experience to tell about. There is a wild bunch of benchmarks on these machines, but we fear that these are somewhat misleading and are not designed to test CPU's for intense scientific computing. The code we want to run on these machines is basically a home-made code, not fully optimized, which allocates around 500 Mb of RAM per node. Communication between nodes is a quite rare event and does not affect much computation time. 
In the past we had a very nice experience using Alpha CPU's which performed very well. To sum up, the question is: is the Itanium2 worth the price difference or is the Opteron the best choice? Thank you all Gabriele Butti -- \\|// -(o o)- /------------oOOOo--(_)--oOOOo-------------\ | | | Gabriele Butti | | ----------------------- | | Department of Material Science | | University of Milano-Bicocca | | Via Cozzi 53, 20125 Milano, ITALY | | Tel (+39)02 64485214 | | .oooO Oooo. | \--------------( )---( )---------------/ \ ( ) / \_) (_/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Tue Oct 28 21:23:48 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Tue, 28 Oct 2003 20:23:48 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: Hi, The Athlon design has some Alpha blood in it, and in my experience they both excel on branchy, unoptimized, float-intensive code. The Opteron is similar to the Athlon, but I wouldn't bother with 64-bit unless you're actually going to use more than 2 GB of memory per node. Athlon vs Pentium 4 or Xeon is a closer match, and you really need to run some benchmarks to decide between them. If you have access to an Opteron you should benchmark it as well, since I've heard they fly on some problems. Itanium 2 (Madison) is the current NAMD speed champ (although it's tied with a hyperthreaded P4 running multithreaded code), but it took some serious work to get the inner loops to the point that the Intel compiler could software pipeline them to get decent performance. I've heard that some Fortran codes had an easier time of it. Big branches really hurt. -Jim On 28 Oct 2003, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. 
| > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Wed Oct 29 04:30:28 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Wed, 29 Oct 2003 10:30:28 +0100 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <20031029103028.5b7a89a7.smuelas@mecanica.upm.es> Why don't you try a more humble Athlon, (2800 will be enough and you can use DRAM at 400). You will economize a lot of money and for intensive operation it is very, very quick. I have a small cluster with 8 nodes and Athlon 2400 and the results are astonishing. The important point is the motherboard, and nforce is great. On 28 Oct 2003 10:58:04 +0100 gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. | > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at platform.com Wed Oct 29 10:01:58 2003 From: csmith at platform.com (Chris Smith) Date: Wed, 29 Oct 2003 07:01:58 -0800 Subject: SAS running on compute nodes In-Reply-To: <858yn5m1v8.fsf@blindglobe.net> References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> <858yn5m1v8.fsf@blindglobe.net> Message-ID: <1067439718.3742.53.camel@plato.dreadnought.org> On Tue, 2003-10-28 at 14:30, A.J. Rossini wrote: > writes: > > > > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... > > Sure, as a bunch of singleton processes. I don't think you can do > much more than that (but would be interested if I'm wrong). > Actually ... you can after a fashion. SAS has something called MP CONNECT as part of the SAS/CONNECT product which allows you to call out to other SAS processes to have them run code for you, so you can do parallel SAS programs. http://support.sas.com/rnd/scalability/connect/index.html -- Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Wed Oct 29 10:11:19 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed, 29 Oct 2003 09:11:19 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> On Tue Oct 28 19:26:25 2003, Gabriele Butti wrote: >To sum up, the question is: is the Itanium2 worth the price difference >or is the Opteron the best choice? The SpecFP2000 performance difference between the best I2 and best Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). The 1.5 GHz I2 with the 6MB cache is very expensive with a recent estimate here for dual processor nodes with the >>smaller<< cache at over $12,000 per node when Myrinet interconnect costs and other incidentals are included. A dual Opteron 246 at 2.0 GHz with the same interconnect and incidentals included was about $4,250. Top of the line Pentium 4 duals again with same interconnect and incidentals about $750 less at $3,500. For bandwidth/memory intensive codes, I think the Opteron is a clear winner in a dual processor configuration because of its dual channel to memory design. Stream triad bandwidth during SMP operation is ~50% more than a one processor test. Both the dual Pentium 4 and Itanium 2 share their memory bus and split (with some loss) the bandwidth in dual mode. In a single processor configuration the conclusion is less clear. Itanium's spec numbers are very impressive, but still not high enough to win on price performance. The new Pentium 4 3.2 GHz Extremem Edition with its 4x200 FSB has very good SpecFP2000 numbers out performing the Opteron by about 100 spec points and may be the best price performance choice in a single processor configuration. But of course the above logic means nothing with a benchmark of >>your<< application and specific vendor quotes in >>your<< hands. rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. 
# Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 29 11:13:27 2003 From: ctierney at hpti.com (Craig Tierney) Date: 29 Oct 2003 09:13:27 -0700 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <1067444007.6209.16.camel@hpti10.fsl.noaa.gov> On Tue, 2003-10-28 at 02:58, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > Why don't you run your codes on the two platforms and figure it out for yourself? Better yet, get the vendors to do it. I have seen cases where Itanium 2 performs much better than Opteron, justifying the price difference. Other codes did not show the same difference, but both were faster than a Xeon. Craig > Thank you all > > Gabriele Butti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Thomas.Alrutz at dlr.de Wed Oct 29 11:15:48 2003 From: Thomas.Alrutz at dlr.de (Thomas Alrutz) Date: Wed, 29 Oct 2003 17:15:48 +0100 Subject: opteron VS Itanium 2 References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3F9FE7B4.1000607@dlr.de> Hi Gabriele, we have bought a similar Linux Cluster (16 nodes) you are lokking for with the smallest dual Opteron 240 (1.4 GHz) and two Gigabit networks (one for communications (MPI) and one for nfs). > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > The nodes have all 2 GB RAM (4*512 MB DDR333 REG), 2 Gigabit NICs (Broadcom onboard) and a Harddisk. The board we had choosen was the Rioworks HDAMA. 
I know it is not cheap, but it is stable and performances well with the SUSE/United Linux Enterprise Edition. > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. We have done some benchmarking with our TAU-Code (unstructured finite volume CFD-code, in multigrid), which hangs extremly on the memory bandwith and latency. Therefore we tested 4 different architectures: 1. AMD Athlon MP 1.8 GHz FSB 133 MHZ - with gcc3.2 in 32 Bit 2. Intel Xeon 2.66 GHz FSB 133 MHZ - with icc7 in 32 bit 3. Intel Itanium2 1.0 GHz FSB 100 MHZ - with ecc6 in 64 Bit 4. AMD Opteron 240 1.4 GHz FSB 155 MHZ - with gcc3.2 in 64 Bit For the benchmark we used a "real life" example (aircraft configuration with wing, body and engine - approx. 2 million grid points) which desires 1.3 GB to 1.7 GB for the job (1 process) We have performed 30 iterations (Navier Stokes calculation - Spalart Allmares - central scheme - multigrid cycle) and taken the total (Wallclock) time. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? To answer your question take a look on the following chart : All times in seconds for 1 cpu on the node in use 1. AMD Athlon MP 1.8 GHz - 30 iter. = 3642.4 sec. 2. Intel Xeon 2.66 GHz - 30 iter. = 2151.4 sec. <- fastest 3. Intel Itanium2 1.0 GHz - 30 iter. = 3571.8 sec. 4. AMD Opteron 240 1.4 GHz - 30 iter. = 2256.5 sec. and 2 cpu on the node in use (2 process via MPI) 1. AMD Athlon MP 1.8 GHz - 30 iter. = 2076.1 sec. 2. Intel Xeon 2.66 GHz - 30 iter. = 1447.8 sec 3. Intel Itanium2 1.0 GHz - 30 iter. = 1842.8 sec. 4. AMD Opteron 240 1.4 GHz - 30 iter. = 1159.5 sec. <-- fastest So here you can see why we had to choose an Opteron based node to build up the cluster. The price/performance ratio for the Opteron machine is verry good compared to the itanium2 machines. And the Xeons are not so much cheaper.... Thomas -- __/|__ | Dipl.-Math. Thomas Alrutz /_/_/_/ | DLR Institut fuer Aerodynamik und Stroemungstechnik |/ | Numerische Verfahren DLR | Bunsenstr. 10 | D-37073 Goettingen/Germany _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 29 14:16:43 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 29 Oct 2003 20:16:43 +0100 Subject: Video-less nodes Message-ID: <1067455003.21980.11.camel@qeldroma.cttc.org> Hi all, I would like to get some opinions about video-less nodes in a cluster, we know that there is no problem about monitoring nodes remotely and reading logs but I suppose that in a kernel panic situation there's some valuable on-screen information... ? any thoughts ? 
Of course there's the possibility about putting really cheap video cards just that we'll able to see the text screen , nothing more ;) -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 29 15:45:25 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 29 Oct 2003 12:45:25 -0800 (PST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? console on serial... let your terminal server collect oopses... > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Wed Oct 29 16:41:21 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed, 29 Oct 2003 16:41:21 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003 at 8:16pm, Daniel Fernandez wrote > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? > > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) As always, the answer is it depends. A serial console should handle all your needs. But sometimes the BIOS sucks or the console doesn't work right or... IMHO, unless it messes other stuff up (e.g. drags your only PCI bus down to 32/33), there's not much reason *not* to stuff cheap video boards into nodes. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 29 17:00:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 29 Oct 2003 17:00:47 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? 
> > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) To my direct experience, the extra time you waste debugging problems on videoless nodes by hauling them out of the rack, sticking video in them, resolving the problem, removing the video, and reinserting the nodes is far more costly than cheap video, or better yet onboard video (many/most good motherboards have onboard video these days) and being able to resolve many of these problems without deracking the nodes. Just my opinion of course. When things go well, of course, it doesn't matter. Just think about the labor involved in a single BIOS reflash, for example. rgb > > -- > Daniel Fernandez > Laboratori de Termot?cnia i Energia - CTTC > UPC Campus Terrassa > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 29 22:00:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 29 Oct 2003 22:00:09 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> Message-ID: > >To sum up, the question is: is the Itanium2 worth the price difference > >or is the Opteron the best choice? > > The SpecFP2000 performance difference between the best I2 and best > Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). which to me indicates that the working set of SPEC codes is a good match to the cache of high-end It2's. this says nothing about It2's, but rather points out that SPEC components are nearly obsolete (required to run well in just 64MB core, if I recall correctly!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jmdavis at mail2.vcu.edu Wed Oct 29 15:12:20 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Wed, 29 Oct 2003 15:12:20 -0500 Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> References: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: <3FA01F24.2090405@mail2.vcu.edu> The onscreen info should also be logged. And then there's always the crash files. We now have a couple of clusters with videoless nodes (although they are on serial switches, Cyclades). Mike Daniel Fernandez wrote: >Hi all, > >I would like to get some opinions about video-less nodes in a cluster, >we know that there is no problem about monitoring nodes remotely and >reading logs but I suppose that in a kernel panic situation there's some >valuable on-screen information... ? any thoughts ? 
> >Of course there's the possibility about putting really cheap video cards >just that we'll able to see the text screen , nothing more ;) > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andreas.boklund at htu.se Thu Oct 30 01:57:35 2003 From: andreas.boklund at htu.se (andreas boklund) Date: Thu, 30 Oct 2003 07:57:35 +0100 Subject: opteron VS Itanium 2 Message-ID: Just a note, > For bandwidth/memory intensive codes, I think the Opteron is a clear > winner in a dual processor configuration because of its dual channel > to memory design. Stream triad bandwidth during SMP operation is > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium > 2 share their memory bus and split (with some loss) the bandwidth in > dual mode. This is true as long as you are using an applicaiton where one process has its own memory area. If you would have 2 processes and shared memory the Opt, would behave like a small NUMA machine and a process will get a penalty for accessing another process (processors) memory segment. To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Best //Andreas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 10:51:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 10:51:41 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: Message-ID: > > For bandwidth/memory intensive codes, I think the Opteron is a clear > > winner in a dual processor configuration because of its dual channel > > to memory design. Stream triad bandwidth during SMP operation is > > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium > > 2 share their memory bus and split (with some loss) the bandwidth in > > dual mode. this is particularly bad on "high-end" machines. for instance, several machines have 4 it2's on a single FSB. there's a reason that specfprate scales so much better on 1/2/4-way opterons than on 1/2/4-way it2's. don't even get me started about those old profusion-chipset 8-way PIII machines that Intel pushed for a while... > This is true as long as you are using an applicaiton where one process has its own > memory area. If you would have 2 processes and shared memory the Opt, would > behave like a small NUMA machine and a process will get a penalty for accessing > another process (processors) memory segment. huh? sharing data behaves pretty much the same on opteron systems (broadcast-based coherency) as on shared-FSB (snoopy) systems. it's not at all clear yet whether opterons are higher latency in the case where you have *often*written* shared data. it is perfectly clear that shared/snoopy buses don't scale, and neither does pure broadcast coherency. I figure that both Intel and AMD will be adding some sort of directory support in future machines. if they bother, that is - the market for many-way SMP is definitely not huge, at least not in the mass-market sense. regards, mark hahn. 
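For anyone who wants to put rough numbers on the shared-bus versus on-chip-controller argument above, here is a minimal triad-style sketch in C. It is not the official STREAM benchmark, and the array size, the OpenMP usage, and the compile line are just illustrative assumptions; run it with OMP_NUM_THREADS=1 and then 2 on a dual node and compare the rates.

/* Minimal STREAM-triad-style sketch (not the official STREAM code).
 * Compile e.g.: gcc -O2 -fopenmp triad.c -o triad
 * Run with OMP_NUM_THREADS=1 and OMP_NUM_THREADS=2 to compare. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (8 * 1024 * 1024)   /* 8M doubles per array, 64 MB each: well out of cache */
#define NTRIES 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

    /* First-touch initialization in parallel, so on a NUMA board (e.g. dual
     * Opteron) each thread's pages land behind its own memory controller. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int t = 0; t < NTRIES; t++) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];                 /* the triad kernel */
        double dt = omp_get_wtime() - t0;
        if (dt < best) best = dt;
    }

    /* Triad touches three 8-byte doubles per iteration. */
    double mbytes = 3.0 * N * sizeof(double) / 1.0e6;
    printf("threads=%d  best triad rate: %.1f MB/s\n",
           omp_get_max_threads(), mbytes / best);
    free(a); free(b); free(c);
    return 0;
}

The first-touch detail matters: if one thread initializes everything, both threads end up pulling from a single controller on an Opteron-style board and the two-processor number looks much worse than it should.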
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 10:57:00 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 09:57:00 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> On Wed Oct 29 21:38:48 2003, Mark Hahn wrote: >> >To sum up, the question is: is the Itanium2 worth the price difference >> >or is the Opteron the best choice? >> >> The SpecFP2000 performance difference between the best I2 and best >> Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). > >which to me indicates that the working set of SPEC codes is a good >match to the cache of high-end It2's. this says nothing about It2's, >but rather points out that SPEC components are nearly obsolete >(required to run well in just 64MB core, if I recall correctly!) Of course, there is some truth to what you say, but "this says nothing about It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is the memory table for most of the benchmarks. A few fit in the 6MB cache (although some surely should, as some codes do or can be made too fit into cache). Many are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard to question with the capability of performing 4 64-bit flops per clock (that's a 6.0 GFlops peak at 1.5 GHz; 12.0 at 32-bits). Moreover, even an I2 with 1/2 the Opteron's clock and only 50% more cache (L3 vs L2) performs more or less equal to the Opteron 246 on SpecFP2000. And after all a huge cache does raise the average memory bandwidth felt by the average code ... ;-) (even as average codes sizes grow) ... and a large node count divides the total memory required per node. Large clusters should love large caches ... you know the quest for super-linear speed ups. The I2's weakness is in price-performance and in memory bandwidth in SMP configurations in my view. My last line in the prior note was a reminder to the original poster that SpecFP numbers are not a final answer. I repeated the "benchmark you code" mantra ... partly to relieve Bob Brown of his responsibility to do so ;-). Got any snow up in the Great White North yet? Regards, rbw max max num num rsz vsz obs unchanged stable? 
----- ----- --- --------- ------- gzip 180.0 199.0 181 68 vpr 50.0 53.6 151 6 gcc 154.0 156.0 134 0 mcf 190.0 190.0 232 230 stable crafty 2.0 2.6 107 106 stable parser 37.0 66.8 263 254 stable eon 0.6 1.5 130 0 perlbmk 146.0 158.0 186 0 gap 192.0 194.0 149 148 stable vortex 72.0 79.4 162 0 bzip2 185.0 199.0 153 6 twolf 3.4 4.0 273 0 wupwise 176.0 177.0 185 181 stable swim 191.0 192.0 322 320 stable mgrid 56.0 56.7 281 279 stable applu 181.0 191.0 371 369 stable mesa 9.4 23.1 132 131 stable galgel 63.0 155.0 287 59 art 3.7 4.3 157 37 equake 49.0 49.4 218 216 stable facerec 16.0 18.5 182 173 stable ammp 26.0 28.4 277 269 stable lucas 142.0 143.0 181 179 stable fma3d 103.0 105.0 268 249 stable sixtrack 26.0 59.8 148 141 stable apsi 191.0 192.0 271 270 stable _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 11:07:25 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 10:07:25 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301607.h9UG7PX06372@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 07:57, Andreas Boklund wrote: >Just a note, > >> For bandwidth/memory intensive codes, I think the Opteron is a clear >> winner in a dual processor configuration because of its dual channel >> to memory design. Stream triad bandwidth during SMP operation is >> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium >> 2 share their memory bus and split (with some loss) the bandwidth in >> dual mode. > >This is true as long as you are using an applicaiton where one process has its own >memory area. If you would have 2 processes and shared memory the Opt, would >behave like a small NUMA machine and a process will get a penalty for accessing >another process (processors) memory segment. > >To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never >yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Agreed. Of course, in the case of dual Pentium and Itaniums, even non- overlapping memory locations buy you nothing bandwidth-wise. Small or large scale perfect cross-bars to memory are tough and expensive. The Cray X1, with all its customer design effort and great total bandwidth on the node board, targeted only 1/4 of peak-data-required iin its design and delivers less under the full load of its 16-way SMP vector engines. And it's node board is probably the best bandwidth engine in the world at the moment. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 12:28:45 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 11:28:45 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 12:00:54, Mark Hahn wrote: >> Of course, there is some truth to what you say, but "this says nothing about >> It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is > >all the world's a stage ;) Life without drama is life without the pursuit of happiness ... ;-). >> the memory table for most of the benchmarks. A few fit in the 6MB cache (although >> some surely should, as some codes do or can be made too fit into cache). 
Many > >seriously, the memory access patterns of very few apps are uniform >across their rss. I probably should have said "working set fits in 6M". Good point, most memory accesses are not globally stride-one. But of course this fact leads us back to the idea that cache >>is<< important for a suite of "representative codes". >and you're right; I just reread the spec blurb, and their aim was 100-200MB. > >> are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard > >that's max rss; it's certainly an upper bound on working set size, >but definitely not a good estimator. Yes, an upper bound. We would need more data on the Spec codes to know if the working sets are mostly sitting in the I2 cache. There is an inevitable dynamism here with larger caches swallowing up larger and larger chunks of the "average code's" working set and while the average working set grows over time. >in other words, it tells you something about the peak number of pages that >the app ever touches. it doesn't tell you whether 95% of those pages are >never touched again, or whether the app only touches 1 cacheline per page. > >in yet other words, max rss is relevant to swapping, not cache behavior. You might also say it this way ... cache-exceeding, max-RSS magnitude by itself does guarantee the elimination of unwanted cache effects. > >> And after all a huge cache does raise the average memory bandwidth felt by the >> average code ... ;-) (even as average codes sizes grow) ... and a large node count > >even though Spec uses geo-mean, it can strongly be influenced by outliers, >as we've all seen with Sun's dramatic "performance improvements" ;) > >in particular, 179.art is a good example. I actually picked it out by >comparing the specFP barchart for mckinley vs madison - it shows a fairly >dramatic improvement. this *could* be due to compiler improvements, >but given that 179.art has a peak RSS of 3.7MB, I think there's a real >cache effect here. I agree again, but would say that such a suite as SpecFP should include some codes that yield to cache-effects because some real world codes do. Always learn or am reminded of something from your posts Mark ... keep on keeping us honest and true ;-) like a Canadian Mountie. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:45:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:45:20 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> Message-ID: > this fact leads us back to the idea that cache >>is<< important for a suite > of "representative codes". yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly stodgy attitude towards this has been that caches do not help machine balance. that is, it2 has a peak/theoretical 4flops/cycle, but since that would require, worstcase, 3 doubles per flop, the highest-ranked CPU is actually imbalanced by a factor of 22.5! (*) the best benchmark is your own code let's step back a bit. suppose we were designing a new version of SPEC, and wanted to avoid every problem that the current benchmarks have. here are some partially unworkable ideas: keep geometric mean, but also quote a few other metrics that don't hide as much interesting detail. for instance, show the variance of scores. 
or perhaps show base/peak/trimmed (where the lowest and highest component are simply dropped). cache is a problem unless your code is actually a spec component, or unless all machines have the same basic cache-to-working-set relation for each component. alternative: run each component on a sweep of problem sizes, and derive two scores: in-cache and out-cache. use both scores as part of the overall summary statistic. I'd love to see good data-mining tools for spec results. for instance, I'd like to have an easy way to compare consecutive results for the same machine as the vendor changed the compiler, or as clock increases. there's a characteristic "shape" to spec results - which scores are high and low relative to the other scores for a single machine. not only does this include outliers (drastic cache or compiler effects), but points at strengths/weaknesses of particular architectures. how to do this, perhaps some kind of factor analysis? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:00:54 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:00:54 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: > Of course, there is some truth to what you say, but "this says nothing about > It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is all the world's a stage ;) > the memory table for most of the benchmarks. A few fit in the 6MB cache (although > some surely should, as some codes do or can be made too fit into cache). Many seriously, the memory access patterns of very few apps are uniform across their rss. I probably should have said "working set fits in 6M". and you're right; I just reread the spec blurb, and their aim was 100-200MB. > are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard that's max rss; it's certainly an upper bound on working set size, but definitely not a good estimator. in other words, it tells you something about the peak number of pages that the app ever touches. it doesn't tell you whether 95% of those pages are never touched again, or whether the app only touches 1 cacheline per page. in yet other words, max rss is relevant to swapping, not cache behavior. > And after all a huge cache does raise the average memory bandwidth felt by the > average code ... ;-) (even as average codes sizes grow) ... and a large node count even though Spec uses geo-mean, it can strongly be influenced by outliers, as we've all seen with Sun's dramatic "performance improvements" ;) in particular, 179.art is a good example. I actually picked it out by comparing the specFP barchart for mckinley vs madison - it shows a fairly dramatic improvement. this *could* be due to compiler improvements, but given that 179.art has a peak RSS of 3.7MB, I think there's a real cache effect here. > Got any snow up in the Great White North yet? no, but I notice that the permanent temporary DX units are not working as hard to keep the machineroom from melting down ;) oh, yeah, and there's something wrong with the color of the leaves. 
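The in-cache/out-of-cache sweep is cheap to prototype. The sketch below is a rough cut at the idea, not anything taken from SPEC; the kernel, the sizes (32 KB up to 128 MB), and the pass counts are arbitrary choices. The knee where the working set falls out of cache shows up directly in the ns-per-element column. Compile with something like gcc -O2 (older glibc may also want -lrt for clock_gettime).

/* Time one trivial kernel over a sweep of working-set sizes, so the
 * in-cache and out-of-cache regimes can be scored separately. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    for (long bytes = 32 * 1024; bytes <= 128L * 1024 * 1024; bytes *= 2) {
        long n = bytes / (long)sizeof(double);
        double *x = malloc(n * sizeof(double));
        if (!x) break;
        for (long i = 0; i < n; i++) x[i] = 1.0;

        /* Repeat small sizes more often so every size runs a comparable time. */
        long passes = (256L * 1024 * 1024) / bytes;
        if (passes < 1) passes = 1;

        volatile double sum = 0.0;   /* volatile keeps the loop from being optimized away */
        double t0 = now();
        for (long p = 0; p < passes; p++)
            for (long i = 0; i < n; i++)
                sum += x[i] * 1.000001;
        double dt = now() - t0;

        printf("%10ld bytes: %6.2f ns/element\n",
               bytes, 1e9 * dt / ((double)passes * n));
        free(x);
    }
    return 0;
}

Plotted against size, the second column stays flat inside each cache level and steps up at every capacity boundary, which is exactly the in-cache versus out-of-cache split being argued for.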
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 16:32:38 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 15:32:38 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Mark Hahn wrote: >> this fact leads us back to the idea that cache >>is<< important for a suite >> of "representative codes". > >yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly >stodgy attitude towards this has been that caches do not help machine >balance. that is, it2 has a peak/theoretical 4flops/cycle, but since >that would require, worstcase, 3 doubles per flop, the highest-ranked >CPU is actually imbalanced by a factor of 22.5! > >(*) the best benchmark is your own code Agreed, but since the scope of the discussion seemed to be microprocessors which are all relatively bad on balance compared to vector ISA/designs, I did not elaborate on balance. This is design area that favors the Opteron (and Power 4) because the memory controller is on-chip (unlike the Pentium 4 and I2) and as such, its performance improves with clock. I think it is interesting to look at other processor's theoretical balance numbers in relationship to the I2's that you compute (I hope I have them all correct): Pentium 4 EE 3.2 GHz: (3.2 GHz * 2 flops * 24 bytes) / 6.4 bytes/sec = Balance of 24 (max on chip cache 2MB) Itanium 2 1.5 GHz: (1.5 GHz * 4 flops * 24 bytes) / 6.4 bytes/sec = Balance of 22.5 (max on chip cache 6MB) Opteron 246 2.0 GHz: (2.0 GHz * 2 flops * 24 bytes) / 6.4 bytes/sec = Balance of 15 (max on chip cache 1MB) Power 4 1.7 GHz: (1.7 GHz * 4 flops * 24 bytes) / 6.4 bytes/sec = Balance of 25.5* (max on chip cache 1.44MB) Cray X1 .8 GHz: (0.8 GHz * 4 flops * 24 bytes) / 19.2 bytes/sec = Balance of 4 (512 byte off-chip L2) * IBM memory performance is with 1 core disabled and may now be higher than this. When viewed in context, yes, the I2 is poorly balanced, but it is typical of microprocessors, and it is not the worst among them. It also offers the largest compensating cache. Where it loses alot of ground is in the dual processor configuration. Opteron yields a better number, but this is because it can't do as many flops. The Cray X1 is has the most agressive design specs and yields a large enough percentage of peak to beat the fast clocked micros on vector code (leaving the ugly question of price aside). This is in part due to the more balanced design, but also due to its vector ISA which is just better at moving data from memory. >let's step back a bit. suppose we were designing a new version of SPEC, >and wanted to avoid every problem that the current benchmarks have. >here are some partially unworkable ideas: > >keep geometric mean, but also quote a few other metrics that don't >hide as much interesting detail. for instance, show the variance of >scores. or perhaps show base/peak/trimmed (where the lowest and highest >component are simply dropped). Definitely. I am constantly trimming the reported numbers myself and looking at the bar graphs for an eye-ball variance. It takes will power to avoid being seduced by a single summarizing number. The Ultra III's SpecFP number was a good reminder. >cache is a problem unless your code is actually a spec component, >or unless all machines have the same basic cache-to-working-set relation >for each component. 
alternative: run each component on a sweep of problem >sizes, and derive two scores: in-cache and out-cache. use both scores >as part of the overall summary statistic. Very good as well. This is the "cpu-rate-comes-to-spec" approach that I am sure Bob Brown would endorse. >I'd love to see good data-mining tools for spec results. for instance, >I'd like to have an easy way to compare consecutive results for the same >machine as the vendor changed the compiler, or as clock increases. ... or increased cache size. Another winning suggestion. >there's a characteristic "shape" to spec results - which scores are >high and low relative to the other scores for a single machine. not only >does this include outliers (drastic cache or compiler effects), but >points at strengths/weaknesses of particular architectures. how to do this, >perhaps some kind of factor analysis? This is what I refer to as the Spec finger print or Roshacht(sp?) test. We need a neural net derived analysis and classification here. Another presentation that I like is the "star graph" in which major characteristics (floating point perf., integer perf., cache, memory bandwidth, etc.) are layed out in equal degrees as vectors around a circle. Each processor is measured on each axis to give a star print and the total area is a measure of "total goodness". I hope someone from Spec is reading this ... and they remember who made these suggestions ... ;-). Regards, rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org # #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Oct 30 23:31:01 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 31 Oct 2003 12:31:01 +0800 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Other problems with the Itanium 2 are power hungry and heat problem. Also, reported on another mailing list: Earth Simulator 35.8 TFlop/s ASCI Q Alpha EV-68 13.8 TFlop/s Apple G5 dual (Big Mac) 9.5 TFlop/s HP RX2600 Itanium 2 8.6 TFlop/s This would place the Big Mac in the 3rd place on the top500 list -- assuming they have reported all submitted results in the report: http://www.netlib.org/benchmark/performance.pdf (p53) Andrew. > The I2's weakness is in price-performance and in > memory bandwidth in SMP configurations > in my view. My last line in the prior note was a > reminder to the original poster > that SpecFP numbers are not a final answer. I > repeated the "benchmark you code" > mantra ... partly to relieve Bob Brown of his > responsibility to do so ;-). > > Got any snow up in the Great White North yet? > > Regards, > > rbw > > > max max num num > rsz vsz obs unchanged stable? 
> ----- ----- --- --------- ------- > gzip 180.0 199.0 181 68 > vpr 50.0 53.6 151 6 > gcc 154.0 156.0 134 0 > mcf 190.0 190.0 232 230 stable > crafty 2.0 2.6 107 106 stable > parser 37.0 66.8 263 254 stable > eon 0.6 1.5 130 0 > perlbmk 146.0 158.0 186 0 > gap 192.0 194.0 149 148 stable > vortex 72.0 79.4 162 0 > bzip2 185.0 199.0 153 6 > twolf 3.4 4.0 273 0 > > wupwise 176.0 177.0 185 181 stable > swim 191.0 192.0 322 320 stable > mgrid 56.0 56.7 281 279 stable > applu 181.0 191.0 371 369 stable > mesa 9.4 23.1 132 131 stable > galgel 63.0 155.0 287 59 > art 3.7 4.3 157 37 > equake 49.0 49.4 218 216 stable > facerec 16.0 18.5 182 173 stable > ammp 26.0 28.4 277 269 stable > lucas 142.0 143.0 181 179 stable > fma3d 103.0 105.0 268 249 stable > sixtrack 26.0 59.8 148 141 stable > apsi 191.0 192.0 271 270 stable > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:02:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:02:29 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Message-ID: On Thu, 30 Oct 2003, Richard Walsh wrote: > >cache is a problem unless your code is actually a spec component, > >or unless all machines have the same basic cache-to-working-set relation > >for each component. alternative: run each component on a sweep of problem > >sizes, and derive two scores: in-cache and out-cache. use both scores > >as part of the overall summary statistic. > > Very good as well. This is the "cpu-rate-comes-to-spec" approach > that I am sure Bob Brown would endorse. Oh, sure. "I endorse this." ;-) As you guys are working out fine on your own, I like it combined with Mark's suggestion of showing the entire constellation for spec (which of course you CAN access and SHOULD access in any case instead of relying on geometric or any other mean measure of performance:-). I really think that many HPC performance benchmarks primary weakness is that they DON'T sweep problem size and present results as a graph, and that they DON'T present a full suite of different results that measure many identifiably different components of overall performance. From way back with early linpack, this has left many benchmarks susceptible to vendor manipulation -- there are cases on record of vendors (DEC, IIRC, but likely others) actually altering CPU/memory architecture to optimize linpack performance because linpack was what sold their systems. This isn't just my feeling, BTW -- Larry McVoy has similar concerns (more stridently expressed) in his lmbench suite -- he actually had (and likely still has) as a condition of their application to a system that they can NEVER be applied singly with just one (favorable:-) number or numbers quoted in a publication or advertisement --- the results of the complete suite have to be presented all together, with your abysmal failures side by side with your successes. 
I personally am less religious about NEVER doing anything and dislike semi-closed sources and "rules" even for benchmarks (it makes far more sense to caveat emptor and pretty much ignore vendor-based performance claims in general:-), but do think that you get a hell of a lot more information from a graph of e.g. stream results as a function of vector size than you get from just "running stream". Since running stream as a function of vector size more or less requires using malloc to allocate the memory and hence adds one additional step of indirection to memory address resolution, it also very slightly worsens the results, but very likely in the proper direction -- towards the real world, where people do NOT generally recompile an application in order to change problem size. I also really like Mark's idea of having a benchmark database site where comparative results from a wide range of benchmarks can be easily searched and collated and crossreferenced. Like the spec site, actually. However, that's something that takes a volunteer or organization with spare resources, much energy, and an attitude to make happen, and since one would like to e.g. display spec results on a non-spec site and since spec is (or was, I don't keep up with its "rules") fairly tightly constrained on who can run it and how/where its results can be posted, it might not be possible to create your own spec db, your own lmbench db, your own linpack db, all on a public site. cpu_rate you can do whatever you want with -- it is full GPL code so a vendor could even rewrite it as long as they clearly note that they have done so and post the rewritten sources. Obviously you should either get results from somebody you trust or run it yourself, but that is true for any benchmark, with the latter being vastly preferrable.:-) If I ever have a vague bit of life in me again and can return to cpu_rate, I'm in the middle of yet another full rewrite that should make it much easier to create and encapsulate a new code fragment to benchmark AND should permit running an "antistream" version of all the tests involving long vectors (one where all the memory addresses are accessed in a random/shuffled order, to deliberately defeat the cache). However, I'm stretched pretty thin at the moment -- a talk to give Tuesday on xmlsysd/wulfstat, a CW column due on Wednesday, and I've agreed to write an article on yum due on Sunday of next week I think (and need to finish the yum HOWTO somewhere in there as well). So it won't be anytime soon...:-) > >I'd love to see good data-mining tools for spec results. for instance, > >I'd like to have an easy way to compare consecutive results for the same > >machine as the vendor changed the compiler, or as clock increases. > > ... or increased cache size. Another winning suggestion. > > >there's a characteristic "shape" to spec results - which scores are > >high and low relative to the other scores for a single machine. not only > >does this include outliers (drastic cache or compiler effects), but > >points at strengths/weaknesses of particular architectures. how to do this, > >perhaps some kind of factor analysis? > > This is what I refer to as the Spec finger print or Roshacht(sp?) > test. We need a neural net derived analysis and classification here. . The only one I'd trust is the one already implemented in wetware. After all, classification according to what? 
> Another presentation that I like is the "star graph" in which major > characteristics (floating point perf., integer perf., cache, memory > bandwidth, etc.) are layed out in equal degrees as vectors around > a circle. Each processor is measured on each axis to give a star > print and the total area is a measure of "total goodness". > > I hope someone from Spec is reading this ... and they remember who > made these suggestions ... ;-). But things are more complicated than this. The real problem with SPEC is that your application may well resemble one of the components of the suite, in which case that component is a decent predictor of performance for your application almost by definition. However, the mean performance on the suite may or may not be well correlated with that component, or your application may not resemble ANY of the components on the suite. Then there are variations with compiler, operating system, memory configuration, scaling (or lack thereof!) with CPU clock. As Mark says, TBBIYOC is the only safe rule if you seek to compare systems on the basis of "benchmarks". I personally tend to view large application benchmarks like linpack and spec with a jaded eye and prefer lmbench and my own microbenchmarks to learn something about the DETAILED performance of my architecture on very specific tasks that might be components of a large application, supplemented with YOC. Or rather MOC. Zen question: Which one reflects the performance of an architecture, a BLAS-based benchmark or an ATLAS-tuned BLAS-based benchmark? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Oct 31 09:11:49 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 31 Oct 2003 09:11:49 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: For those interested, the latest poll at www.cluster-rant.com was on cluster size. We had a record 102 responses! Take a look at http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 for links to results and to the new poll on interconnects. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:55:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:55:43 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Douglas Eadline, Cluster World Magazine wrote: > > For those interested, the latest poll at www.cluster-rant.com was on > cluster size. We had a record 102 responses! Take a look at > http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 > for links to results and to the new poll on interconnects. You need to let people vote more than once in something like this. I have three distinct clusters and there are two more I'd vote for the owners here at Duke. (They pretty much reflect the numbers you're getting, which show well over half the clusters at 32 nodes or less). 
It is interesting that this indicates that the small cluster is a lot more common than big clusters, although the way numbers work there are a lot more nodes in big clusters than in small clusters. At least in your biased and horribly unscientific (but FUN!) poll:-) So from a human point of view, providing support for small clusters is more important, but from an institutional/hardware point of view, big clusters dominate. It is also very interesting to me that RH (for example) thinks that there is something that they are going to provide that is worth e.g. several hundred thousand dollars in the case of a 1000+ node cluster running their "workstation" product. Fifty dollars certainly. Five hundred dollars maybe. A thousand dollars possibly, but only if they come up with a cluster-specific installation with some actual added value. Sigh. rgb > > Doug > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Fri Oct 31 12:10:31 2003 From: jcownie at etnus.com (James Cownie) Date: Fri, 31 Oct 2003 17:10:31 +0000 Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: Message from "Robert G. Brown" of "Fri, 31 Oct 2003 11:02:29 EST." Message-ID: <1AFcn9-5Y0-00@etnus.com> > From way back with early linpack, this has left many benchmarks > susceptible to vendor manipulation -- there are cases on record of > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > architecture to optimize linpack performance because linpack was > what sold their systems. This certainly applied to some compilers which "optimized" sdot and ddot by recognizing the source (down to the precise comments) and plugged in a hand coded assembler routine. Changing a comment (for instance mis-spelling Jack's name :-) or replacing a loop variable called "i" with one called "k" could halve the linpack result. When $$$ are involved people are prepared to sail close to the wind... -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 14:36:09 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 11:36:09 -0800 (PST) Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: <1AFcn9-5Y0-00@etnus.com> Message-ID: On Fri, 31 Oct 2003, James Cownie wrote: > > From way back with early linpack, this has left many benchmarks > > susceptible to vendor manipulation -- there are cases on record of > > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > > architecture to optimize linpack performance because linpack was > > what sold their systems. > > This certainly applied to some compilers which "optimized" sdot and > ddot by recognizing the source (down to the precise comments) and > plugged in a hand coded assembler routine. 
Nvidia and ATI have recently done similar things, where their drivers would attempt to detect benchmarks being run and then use optimized routines or cheat on following specifications. Renaming quake2.exe to something else would cause a large decrease in framerate for example. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Oct 31 14:45:04 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 31 Oct 2003 11:45:04 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: <20031031194504.30508.qmail@web11404.mail.yahoo.com> But still, at least the results showed that the G5s provided similar performance, and less expensive than IA64... Rayson --- Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 31 12:38:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 31 Oct 2003 18:38:20 +0100 (CET) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Robert G. Brown wrote: > > It is also very interesting to me that RH (for example) thinks that > there is something that they are going to provide that is worth e.g. > several hundred thousand dollars in the case of a 1000+ node cluster > running their "workstation" product. Fifty dollars certainly. Five > hundred dollars maybe. A thousand dollars possibly, but only if they > come up with a cluster-specific installation with some actual added > value. > I'll second that. There has been a debate running on this topic on the Fedora list over the last few days. Sorry to be so boring, but its something we should debate too. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 13:19:12 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 10:19:12 -0800 Subject: opteron VS Itanium 2 In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > This would place the Big Mac in the 3rd place on the > top500 list Except that there are several other new large clusters that will likely place higher -- LANL announced a 2,048 cpu Opteron cluster a while back, and LLNL has something new, too, I think. Comparing yourself to the obsolete list in multiple press releases isn't very clever. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Fri Oct 31 14:44:59 2003 From: walkev at presearch.com (Vann H. Walke) Date: Fri, 31 Oct 2003 14:44:59 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <1067629499.21719.73.camel@localhost.localdomain> On Fri, 2003-10-31 at 12:38, John Hearns wrote: > On Fri, 31 Oct 2003, Robert G. Brown wrote: > > > > > It is also very interesting to me that RH (for example) thinks that > > there is something that they are going to provide that is worth e.g. > > several hundred thousand dollars in the case of a 1000+ node cluster > > running their "workstation" product. Fifty dollars certainly. Five > > hundred dollars maybe. A thousand dollars possibly, but only if they > > come up with a cluster-specific installation with some actual added > > value. > > > I'll second that. > > There has been a debate running on this topic on the Fedora list > over the last few days. > > Sorry to be so boring, but its something we should debate too. > Hmm... Let's take the case of a 1000 node system. If we assume a $3000/node cost (probably low once rack, UPS, hardware support, and interconnect are added in), we arrive at an approximate hardware cost of $3,000,000. If we were to use the RHEL WS list price of $179/node, we get $179,000 or about 6% of the hardware cost. That is assuming RedHat will not provide any discount on large volume purchases (unlikely). Is 6% unreasonable? What are the alternatives? - Keep using an existing RH distro: Only if you're willing to move into do it yourself mode when RH stop support (December?). I expect very few would be happy with this option. However, if you have a working RH7.3 cluster, it works, and you don't have to worry too much about security, why change? For new clusters though.... - Fedora - Planned releases 2-3 times a year. So, if I build a system on the Fedora release scheduled this Monday, who will be providing security patches for it 2 years from now (after 4-6 new releases have been dropped). My guess is no-one. Again, we're in the do it yourself maintenance or frequent OS upgrade mode. - SUSE - Not sure about this one. Their commercial pricing model is pretty close to RedHat's. Are they going to keep developing consumer releases? 
What will the support be for those releases? Can we really expect more than we get from a purely community developed system? Perhaps someone with more SUSE knowledge could comment? - Debian - Could be a good option, but to some extent you end up in the same position as Fedora. How often do the releases come out. Who supports the old releases? What hardware / software will work on the platform? - Gentoo - Not reliable, stable enough to meet my needs for clustering - Mandrake - Mandrake has their clustering distribution, which could be a good possibility, but the cost is as high or higher than RedHat. - Scyld - Superior design, supported, but again very high cost and may have to fight some compatibility issues since the it's market share in the Linux world is less than tiny. - OSCAR / Rocks / etc... - generally installed on top of another distribution. We still have to pick a base distribution. My conclusions - If you're in a research facility / university type setting where limited amounts of down time are acceptable, a free or nearly free system is perfect. A new Fedora/Debian/SuSE release comes out, shut the system down over Christmas break and rebuild it. (As long as you're happy spending a fair amount of time doing rebuilds and fixing upgrade problems). If however you really need the thing to work - Corporate research sites, satellite data processing, etc... the cost of the operating system may be minuscule relative to the cost of having the system down. If you _really_ want a particular application to work having it certified and supported on the OS may be important. The project on which I'm working - building sonar training simulators for the US Navy Submarine force requires stable systems which should operate without major maintenance / operational changes for many years. Knowing the RedHat will support the enterprise line for 5 years is a big selling point. The cluster management portion of the software stack would be great to have integrated in to the product, but if third party vendors (Linux Networx, OSCAR, Rocks, etc...) can provide the cluster management portion on top of the distribution, a solution can be found. In some ways this is even better since your cluster management decision is independent of the OS vendor. I basically just want to make the point that the cluster space is filled with people of many different needs. Will everyone want RHEL? My guess is a resounding NO. (In the days of RH7.3 you could almost say Yes.) But, there are situations in which a stable, supported product is needed. This is the market RedHat is trying to target and states so pretty clearly ("Enterprise"). Small users and research systems get somewhat left out in the cold, but we probably shouldn't complain after having a free ride for the last 5+ years. So, is 6% unreasonable? 
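For what it is worth, the arithmetic behind that 6% figure, extended to the $792 AMD64 WS list price quoted later in this thread; the $3000/node hardware cost and the flat 1000-node count are the assumptions made above, and real volume discounts would change both columns:

/* License cost as a fraction of hardware cost, using the numbers quoted
 * in this thread ($3000/node hardware assumed; $179 and $792 list prices). */
#include <stdio.h>

int main(void)
{
    const int nodes = 1000;
    const double hw_per_node = 3000.0;
    const double os_prices[] = { 179.0, 792.0 };

    for (int i = 0; i < 2; i++) {
        double hw = nodes * hw_per_node;
        double os = nodes * os_prices[i];
        printf("$%.0f/node OS on %d nodes: $%.0f, i.e. %.1f%% of $%.0f hardware\n",
               os_prices[i], nodes, os, 100.0 * os / hw, hw);
    }
    return 0;
}

At the quoted AMD64 price the OS line item grows to roughly a quarter of the hardware cost, which is essentially the comparison made further down in the thread.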
Vann > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eemoore at fyndo.com Fri Oct 31 14:52:55 2003 From: eemoore at fyndo.com (Dr Eric Edward Moore) Date: Fri, 31 Oct 2003 19:52:55 +0000 Subject: opteron VS Itanium 2 In-Reply-To: (Mark Hahn's message of "Thu, 30 Oct 2003 12:45:20 -0500 (EST)") References: Message-ID: <87he1pfalk.fsf@azathoth.fyndo.com> Mark Hahn writes: > there's a characteristic "shape" to spec results - which scores are > high and low relative to the other scores for a single machine. not only > does this include outliers (drastic cache or compiler effects), but > points at strengths/weaknesses of particular architectures. how to do this, > perhaps some kind of factor analysis? Well, being bored, I tried factor analysis on the average results for the submitted specfp benchmarks at http://www.specbench.org/ The 5 factors with the largest eigenvalues are:

Eigenvalue:    0.314116   0.353034   0.799331   1.432038  10.614996
                2.22%      2.25%      5.70%     10.22%     75.82%
168.wupwise   -0.4134913  0.0241240 -0.1437086 -0.2757206  0.2715672
171.swim       0.0245451  0.0965325  0.3495143  0.1209393  0.2783842
172.mgrid      0.1122617  0.1365769  0.3273285  0.1332301  0.2839204
173.applu      0.0299056  0.0439954  0.4163242  0.1913496  0.2725619
177.mesa       0.4791260  0.4190313 -0.0949648 -0.3785996  0.2448368
178.galgel    -0.0489231 -0.5404192 -0.2464610  0.2391370  0.2648068
179.art        0.0646181  0.5095081 -0.4736362  0.6508958  0.1054875
183.equake    -0.5560255  0.0841426  0.0214064  0.1615493  0.2794066
187.facerec   -0.0402649  0.0446221 -0.2628912 -0.0557252  0.2897607
188.ammp       0.3993861 -0.3404615 -0.1456043  0.0359475  0.2832809
189.lucas     -0.2380202  0.0908976  0.0801927 -0.2140971  0.2842518
191.fma3d     -0.0326577  0.1661895 -0.1149762 -0.3148501  0.2774768
200.sixtrack   0.1950678 -0.1574121  0.2852895  0.2008475  0.2741305
301.apsi       0.1128198 -0.2379642 -0.3013536 -0.1224494  0.2782804

Pretty much all the specfp tests correlate with each other pretty well, except for 179.art, which correlates... poorly with the others (its correlation with 177.mesa is just 0.03). So most of the variation in the results is some sort of "raw speed" number, which has near-equal weightings of all the tests besides 179.art. Next most important is whatever makes art so different from all the others (maybe it's a persistent cache-misser, or maybe it's just the easiest for vendors to tweak). Not entirely sure what to make of the others. There does seem to be some commonality between 171.swim 172.mgrid 173.applu and 200.sixtrack in the third biggest factor (plus a lot of whatever art isn't) that could be important. The next two seem to mostly have something to do with whatever makes 177.mesa special. This is presumably all useless, but someone might be entertained :) > regards, mark hahn. -- Eric E.
Moore _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Fri Oct 31 16:38:52 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Fri, 31 Oct 2003 18:38:52 -0300 (ART) Subject: sum of matrices Message-ID: <20031031213852.87539.qmail@web12206.mail.yahoo.com> Hi, Last days I write a code(in c) that make the sum of 2 matrices. Let me say a little about how it works. I send 1 row of the 1st matrice and 1 row of 2nd matrice for each process, when a process finish its job, if have more lines i send more to it and it make the sum of these new 2 lines. The problem is, the program works fine with 100x100(or less) matrice, but when I increase this range, something like 10000x10000 i receive the fallowing message: p0_8467: p4_error: Child process exited while making connection to remote process on node2: 0 This is a MPI problem or it`s my code? What can I do to fix this problem. ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ci?ncias Exatas e Tecnol?gicas Estudante do Curso de Ci?ncia da Computa??o Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 15:52:12 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 12:52:12 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: On Fri, 31 Oct 2003, Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. I thought that the 3rd place was in the new preliminary top500 list that included all the big machines that will be there when the official list comes out. But there's been so much poor and conflicting information about Big Mac who knows? I'd like to know how much they payed for the infiniband hardware. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Fri Oct 31 16:14:35 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Fri, 31 Oct 2003 15:14:35 -0600 Subject: opteron VS Itanium 2 In-Reply-To: References: Message-ID: On Fri, 31 Oct 2003, Trent Piepho wrote: > I thought that the 3rd place was in the new preliminary top500 list that > included all the big machines that will be there when the official list > comes out. But there's been so much poor and conflicting information > about Big Mac who knows? I'd like to know how much they payed for the > infiniband hardware. Yeah, me too. As someone who just ponied up for a rather large IB installation, I'm not sure that most people realize what a substantial percentage of the cost of the cluster the IB might be. 
_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From weideng at uiuc.edu Fri Oct 31 15:37:45 2003 From: weideng at uiuc.edu (Wei Deng) Date: Fri, 31 Oct 2003 14:37:45 -0600 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031203745.GU1408@aminor.cs.uiuc.edu> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > - OSCAR / Rocks / etc... - generally installed on top of another > distribution. We still have to pick a base distribution. >From what I heard from Rocks mailing list, they will release 3.1.0 the next Month, which will be based on RHEL 3.0, compiled from source code that is publicly available, and free of charge. Even though Rocks is based on RedHat distribution, it is complete, which means you only need to download Rocks ISOs to accomplish your installation. -- Wei Deng Pablo Research Group Department of Computer Science University of Illinois 217-333-9052 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Fri Oct 31 16:17:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Fri, 31 Oct 2003 14:17:35 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <3FA2D16F.4030807@lanl.gov> Vann H. Walke wrote: > On Fri, 2003-10-31 at 12:38, John Hearns wrote: >>On Fri, 31 Oct 2003, Robert G. Brown wrote: >> >>>It is also very interesting to me that RH (for example) thinks that >>>there is something that they are going to provide that is worth e.g. >>>several hundred thousand dollars in the case of a 1000+ node cluster >>>running their "workstation" product. Fifty dollars certainly. Five >>>hundred dollars maybe. A thousand dollars possibly, but only if they >>>come up with a cluster-specific installation with some actual added >>>value. >> >>I'll second that. > > Hmm... Let's take the case of a 1000 node system. If we assume a > $3000/node cost (probably low once rack, UPS, hardware support, and > interconnect are added in), we arrive at an approximate hardware cost of > $3,000,000. If we were to use the RHEL WS list price of $179/node, we > get $179,000 or about 6% of the hardware cost. That is assuming RedHat > will not provide any discount on large volume purchases (unlikely). Is > 6% unreasonable? These days, one seldom builds 1000 node systems out of basic x86 boxes. Consider a 1024 node AMD64 system instead: The list price on RHEL WS Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. This is unlikely to create any sales. RH should be paid for the valuable service they provide (patch streams etc.) but this is not worth $811K to builders of large clusters. 
There are other good alternatives, most of them *MUCH* cheaper. I fully agree with RGB that RH needs to announce a sensible pricing structure for clusters in order to participate in this market. Would a single system image (BProc) cluster constructed by recompiling the kernel w/BProc patches fit RH's legal definition of a single "installed system" and a single "platform"? If so, $792 for a 1024-node cluster would be quite acceptable... Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Oct 31 17:38:50 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 31 Oct 2003 17:38:50 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <1067639930.26872.1.camel@squash.scalableinformatics.com> On Fri, 2003-10-31 at 16:17, Josip Loncaric wrote: > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. SUSE AMD64 version of 9.0 is something like $120. It was somewhat more stable for my tests than the RH beta (GinGin64). I hope that RH will arrange for similar pricing. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 31 19:00:30 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 31 Oct 2003 16:00:30 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2E6FD.6050107@scali.com> Message-ID: On Fri, 31 Oct 2003, Steffen Persvold wrote: > Josip Loncaric wrote: > > > > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > > This is unlikely to create any sales. so download the source build and call you distro something other than redhat enterprise linux... or use debian... or cope. > > RH should be paid for the valuable service they provide (patch streams > > etc.) but this is not worth $811K to builders of large clusters. There > > are other good alternatives, most of them *MUCH* cheaper. I fully agree > > with RGB that RH needs to announce a sensible pricing structure for > > clusters in order to participate in this market. so don't use redhat. > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > claim support from RH for more than one of the systems. read the liscsense agreement for you redhat enterprise disks... 
> Regards, > Steffen > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 18:43:42 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 15:43:42 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031234342.GC3744@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > So, is 6% unreasonable? For just the base OS? Yes. The market-place has spoken very loudly about that, especially people building large machines. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Oct 31 17:49:33 2003 From: sp at scali.com (Steffen Persvold) Date: Fri, 31 Oct 2003 23:49:33 +0100 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <3FA2E6FD.6050107@scali.com> Josip Loncaric wrote: > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. > > RH should be paid for the valuable service they provide (patch streams > etc.) but this is not worth $811K to builders of large clusters. There > are other good alternatives, most of them *MUCH* cheaper. I fully agree > with RGB that RH needs to announce a sensible pricing structure for > clusters in order to participate in this market. Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't claim support from RH for more than one of the systems. Regards, Steffen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tod at gust.sr.unh.edu Fri Oct 31 18:59:16 2003 From: tod at gust.sr.unh.edu (Tod Hagan) Date: 31 Oct 2003 18:59:16 -0500 Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <1067644757.5702.219.camel@haze.sr.unh.edu> On Fri, 2003-10-31 at 14:44, Vann H. Walke wrote: > What are the alternatives? > [snip] > - Fedora - Planned releases 2-3 times a year. 
So, if I build a system > on the Fedora release scheduled this Monday, who will be providing > security patches for it 2 years from now (after 4-6 new releases have > been dropped). My guess is no-one. Again, we're in the do it yourself > maintenance or frequent OS upgrade mode. > [snip] > - Debian - Could be a good option, but to some extent you end up in the > same position as Fedora. How often do the releases come out. Who > supports the old releases? What hardware / software will work on the > platform? If Fedora achieves 2-3 upgrades per year then it will be fairly different from Debian, which seems to be at 2-3 years per upgrade these days, (well almost). After a new release comes out Debian supports the old one for a period of time (12 months?) with security updates before pulling the plug. Debian can be upgraded in place as opposed to requiring a full resinstall; while this is great for desktops and servers, I'm not sure if this is important for a cluster. As a result of the extended release cycle Debian stable tends to lack support for the newest hardware (Opteron 64-bit, for example). This is why Knoppix, which is based on Debian, isn't derived from Debian stable, but rather from packages in the newer releases (testing, unstable and experimental). But the flip side is that the stable release, while dated, tends to work well as it's had a lot of testing. Debian could probably use more recognition as a target platform by commercial software vendors but it incorporates a huge number of packages including many open source applications pertinent to science. Breadth in packaged applications is probably more important for workstations since clusters tend to use small numbers of apps very intensely. As a distribution Debian is more oriented towards servers than the desktop (to the point that frustrated users have spawned the "Debian Desktop" subproject). It seems to me that clusters have more in common with servers than with desktops so that Debian's deliberate release rate is a better match for the cluster environment than distros which release often in order to incorporate the latest GUI improvements. P.S. While looking into the number of packages in Debian vs. Fedora I stumbled across this frightening bit (gotta throw a Halloween reference in somewhere) on the Fedora site: http://fedora.redhat.com/participate/terminology.html > Packages in Fedora Extras should avoid conflicts with other packages > in Fedora Extras to the fullest extent possible. Packages in Fedora > Extras must not conflict with packages in Fedora Core. It seems that Fedora intends to achieve applications breadth through "Fedora Extras" package sets in other repositories, but the prohibition of conflicts between Extras packages isn't as strong as the absolute prohibition of conflicts between Extras and Core packages. Could this result in a new era of DLL hell a few years down the road? Wow, I guess I just slung some FUD at Fedora, but maintaining a 2-3 releases per year rate probably requires a small core, putting the bulk of applications into the Extras category and thus increasing the chance of conflict. (Wasn't that the original recipe for DLL hell?) Debian has avoided this through a much larger core, which of course slows the release cycle. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Fri Oct 31 17:37:36 2003 From: ctierney at hpti.com (Craig Tierney) Date: 31 Oct 2003 15:37:36 -0700 Subject: sum of matrices In-Reply-To: <20031031213852.87539.qmail@web12206.mail.yahoo.com> References: <20031031213852.87539.qmail@web12206.mail.yahoo.com> Message-ID: <1067639856.6209.211.camel@hpti10.fsl.noaa.gov> On Fri, 2003-10-31 at 14:38, Mathias Brito wrote: > Hi, > > Last days I write a code(in c) that make the sum of 2 > matrices. Let me say a little about how it works. I > send 1 row of the 1st matrice and 1 row of 2nd matrice > for each process, when a process finish its job, if > have more lines i send more to it and it make the sum > of these new 2 lines. The problem is, the program > works fine with 100x100(or less) matrice, but when I > increase this range, something like 10000x10000 i > receive the fallowing message: > > p0_8467: p4_error: Child process exited while making > connection to remote process on node2: 0 > > This is a MPI problem or it`s my code? What can I do > to fix this problem. It is probably your code. Are you allocating the matrix statically or dynamically? Try increasing the stack size on your node(s). Craig > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci?ncias Exatas e Tecnol?gicas > Estudante do Curso de Ci?ncia da Computa??o > > Yahoo! Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Oct 31 19:52:23 2003 From: sp at scali.com (Steffen Persvold) Date: Sat, 01 Nov 2003 01:52:23 +0100 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <3FA303C7.8050600@scali.com> Joel Jaeggli wrote: > On Fri, 31 Oct 2003, Steffen Persvold wrote: [] > >>Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't >>claim support from RH for more than one of the systems. > > > read the liscsense agreement for you redhat enterprise disks... > Well the EULA doesn't say anything about having to pay $792 for each node in a cluster (actually it doesn't mention paying license fee's at all). The only relevant stuff I can find is item 2, "Intellectual Property Rights" : "If Customer makes a commercial redistribution of the Software, unless a separate agreement with Red Hat is executed or other permission granted, then Customer must modify the files identified as REDHAT-LOGOS and anaconda-image to remove all images containing the Red Hat trademark or the Shadowman logo. Merely deleting these files may corrupt the Software." And I wouldn't say that installing on your cluster nodes is "making a commercial redistribution" would you ? Or have I missed something fundamental ? 
Regards, Steffen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
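An aside on the "sum of matrices" thread above: at 10000x10000, a single matrix of doubles is 1e8 * 8 bytes, roughly 800 MB, so three of them declared as static or stack arrays can exhaust a 2003-era node long before MPI itself misbehaves, which is consistent with Craig's advice to check static vs. dynamic allocation and the stack limit. Below is a minimal C sketch of heap allocation with the rows split across ranks; the use of MPI_Scatter and the even division of rows are illustrative assumptions, not Mathias's actual row-farming protocol.

/* Sketch only: why a 10000x10000 case can kill a job that worked at 100x100.
 * The data has to live on the heap (and realistically be split across
 * ranks); N dividing evenly by the number of ranks is assumed here. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                  /* assume N divides evenly */
    double *a = malloc((size_t)rows * N * sizeof *a);
    double *b = malloc((size_t)rows * N * sizeof *b);
    double *c = malloc((size_t)rows * N * sizeof *c);
    double *full_a = NULL, *full_b = NULL;

    if (!a || !b || !c) {                 /* this, not MPI, is the usual culprit */
        fprintf(stderr, "rank %d: out of memory\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {                      /* only the root holds full matrices */
        full_a = malloc((size_t)N * N * sizeof *full_a);
        full_b = malloc((size_t)N * N * sizeof *full_b);
        if (!full_a || !full_b) {
            fprintf(stderr, "rank 0: out of memory for full matrices\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        for (size_t i = 0; i < (size_t)N * N; i++) {
            full_a[i] = 1.0;
            full_b[i] = 2.0;
        }
    }

    MPI_Scatter(full_a, rows * N, MPI_DOUBLE, a, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(full_b, rows * N, MPI_DOUBLE, b, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (size_t i = 0; i < (size_t)rows * N; i++)
        c[i] = a[i] + b[i];

    if (rank == 0)
        printf("each rank summed %d rows (%zu MB per local matrix)\n",
               rows, (size_t)rows * N * sizeof(double) >> 20);

    MPI_Finalize();
    return 0;
}

The same arithmetic says a single process holding all three full matrices needs about 2.4 GB, which by itself could explain why the 100x100 case ran and the 10000x10000 case died.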
From thornton at yoyoweb.com Wed Oct 1 08:34:40 2003 From: thornton at yoyoweb.com (Thornton Prime) Date: Wed, 01 Oct 2003 05:34:40 -0700 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <1065011679.1923.16.camel@localhost.localdomain> > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? You should check out the release notes. On the whole, I'd say there isn't much advantage unless you can take advantage of NTPL. Most of the other enhancements were primarily for desktop users. The next release should be 2.6-kernel ready, so rather than 9 you may consider experimenting with Severn or Taroon. Taroon has much better support for 64-bit platforms, if you are headed there. thornton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:37:44 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:37:44 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > > Why? On a system equipped with an internal sensor, lm_sensors can often read e.g. core CPU temperature on the system itself. A polling cron script can then read this and take action, e.g. initiate a shutdown if it exceeds some threshold. There are good and bad things about this. A good thing is it addreses the real problem -- overheating in the system itself -- and not room temperature. CPU's can overheat because of a fan failure when the room remains cold, and a sensors-driven poweroff can then save your hardware on a node by node basis.
The bad thing is that it does NOT give you any sort of measure of room temperature per se, although if you have the poweroff script send you mail first, getting deluged with N messages as the entire cluster shuts down would be a good clue that your room cooling failed:-). Also, lm_sensors has the API from hell. In fact, I would hardly call it an API. One has to pretty much craft a polling script on the basis of each supported sensor independently, which requires you to know WAY more than you ever wanted to about the particular sensor your system may or may not have. Alas, if only somebody would give the lm_sensors folks a copy of a good book on XML for christmas, and they decided to take the monumental step of converting /proc/sensors into a single xml-based file with the RELEVANT information presented in toplevel tags like 50.4 and the irrelevant information presented in tags like lm781.22a then we could ALL reap the fruits of their labor without needing a copy of the lm78 version 1.22a API manual and having to write an application that supports each of the sensors THROUGH THEIR INTERFACE one at a time...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Wed Oct 1 09:46:25 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Wed, 1 Oct 2003 08:46:25 -0500 (CDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Robert G. Brown wrote: > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) We have that. lm_sensors+cron+gmond. Nice little XML stream on every node with every other nodes temps. One can keep a range of tolerance for cpu0, cpu1, motherboard, and disk temps and shutdown whenever you need to. a netbotz would be cooler though. i'd still use the lm_sensors+cron+gmond and still have the netbotz as a toy..:) -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at upc.es Wed Oct 1 10:13:46 2003 From: lepalom at upc.es (Leopold Palomo) Date: Wed, 1 Oct 2003 16:13:46 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011613.46297.lepalom@upc.es> A Dimecres 01 Octubre 2003 14:37, Robert G. Brown va escriure: > On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > > Dont overlook lm_sensors+cron > > > > Why? 
> > On a system equipped with an internal sensor, lm_sensors can often read > e.g. core CPU temperature on the system itself. A polling cron script > can then read this and take action, e.g. initiate a shutdown if it > exceeds some threshold. > > There are good and bad things about this. A good thing is it addreses > the real problem -- overheating in the system itself -- and not room > temperature. CPU's can overheat because of a fan failure when the room > remains cold, and a sensors-driven poweroff can then save your hardware > on a node by node basis. > > The bad thing is that it does NOT give you any sort of measure of room > temperature per se, although if you have the poweroff script send you > mail first, getting deluged with N messages as the entire cluster shuts > down would be a good clue that your room cooling failed:-). Also, > lm_sensors has the API from hell. In fact, I would hardly call it an > API. One has to pretty much craft a polling script on the basis of each > supported sensor independently, which requires you to know WAY more than > you ever wanted to about the particular sensor your system may or may > not have. > > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) Ok. I was a bit surprise about your sentence. I know that lmsensors is not perfect, but it does their job. Ok, I don't think that use lm_sensors to try to calculate the T of the room is a bit excesive. About the xml,... well, ok, it would be a nice feature, but as plain text, knowing your hardware it's so good, too. Best Regards. Pd How about the pdf, ps, etc? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 10:33:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 10:33:29 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011613.46297.lepalom@upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo wrote: > Ok. I was a bit surprise about your sentence. I know that lmsensors is not > perfect, but it does their job. Ok, I don't think that use lm_sensors to try > to calculate the T of the room is a bit excesive. > > About the xml,... well, ok, it would be a nice feature, but as plain text, > knowing your hardware it's so good, too. > Sorry, I tend to get distracted and rant from time to time (even though as Greg noted, sometimes the rants are of lesser quality:-). In this particular case the rant is really directed to all of /proc, but the sensors interface is the worst example of the lot. I'm "entitled" to rant because I've written two tools (procstatd and xmlsysd) that parse all sorts of data, including sensors data in procstatd, out and provide it to clients for monitoring purposes. Even my daemon wasn't the first to do this, but I think it was one of the first two that functioned as a binary without running a shell script or the like on each node. 
procstatd actually predated ganglia by a fair bit, FWIW. On the basis of this fairly extensive experience I can say that lmsensors output is very poorly organized from the perspective of somebody trying to write a general purpose parser to extract the data it provides. In particular, it uses a directory tree structure where the PARTICULAR sensors interface that you have appears as part of the path, and where what you find underneath that path depends on the particular sensor that you've got as well. Hopefully it is obvious how Evil this makes it from the point of view of somebody trying to write a general purpose tool to parse it. Basically, to write such a tool one has to go through the lmsensors sources and reverse engineer each interface it supports to determine what is produced and where, one at a time. This is more than slightly nuts. What do "most" sensors provide? Fields like cpu temperature (for cpu's 0-N), fan speed (for fans 0-N), core voltage (for lines 0-N). Sure, some provide more, some provide less, but what are we discussing? The monitoring of cpu temperature, under the reasonable assumption that either we have a sensor that provides it or we don't, and that we really don't give a rodent's furry touchis WHICH sensor we have as long as it gives us "CPU Temperature", preferrably for every CPU. So a good API is one that has a single file entitled /proc/sensors, and in that file one finds things like: 54.2 51.7 lm78 ... ... I can write code to parse this in a few minutes of work, literally, and the same code will work for all interfaces that lm_sensors might support, and I don't need to know the interface the system has in it beforehand (although with the knowledge I might add some advanced features if it supports them). Presenting the knowledge is also trivial -- a web interface might be as sparse as a reader/parser and/or a DTD. Compare to parsing something like (IIRC) /proc/sensors/device-with-a-bunch-of-numbers/subunit/field where the path that you find under specific devices-with-numbers depends on the toplevel value on a device by device basis and the contents of field can as well. Yech. And Rocky, hiding the problem with gmond is fine, but then it puts the burden for writing an API for the API on the poor people that have to support the gmond interface. Yes they can (and I could) do this. I personally refuse. They obviously have gritted their teeth and done so. The correct solution is clearly to redo the lm_sensors interface itself so that it is organized as the above indicates. Which criticism, by the way, applies to a LOT of /proc, which currently looks like it was organized by a bunch of wild individualists who have handled every emergent subfield by overloading its data in a single "field" line, usually with documentation only in the form of reading procps or kernel source. Just because this is actually true doesn't excuse it. Parsing the contents of /proc is maddening for just this reason, and the cost is a lot of needless complexity, pointless bugs and upgrade incompatibilities for many people. Putting the data into xml-wrapped form would be a valuable exercise in the discipline of structuring data, for the most part. rgb > Best Regards. > > Pd How about the pdf, ps, etc? I'll try to work on this as soon as I can. 
My task list for the day looks something like a) debug/fix some dead nodes; b) add a requested feature/view to wulfstat (that has been on hold for a week or more:-(, c) work on a bunch of documents associated with teaching and curriculum at Duke (sigh); d) about eight more tasks, none of which I will likely get to, including work on my research. However, this is about the third or fourth time people have requested a "fix" for the ps/pdf/font issue (with acroread it can even fail altogether to read the document -- presumably some gs/acrobat incompatibility where I use gs-derived tools) so I'll try very hard to craft some sort of fix by the weekend. -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 1 12:36:26 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 1 Oct 2003 12:36:26 -0400 (EDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Rocky McGaugh wrote: > On Wed, 1 Oct 2003, Robert G. Brown wrote: > > Alas, if only somebody would give the lm_sensors folks a copy of a good > > book on XML for christmas, and they decided to take the monumental step ... > > then we could ALL reap the fruits of their labor without needing a copy > > of the lm78 version 1.22a API manual and having to write an application > > that supports each of the sensors THROUGH THEIR INTERFACE one at a > > time...;-) > > We have that. lm_sensors+cron+gmond. I think you missed RGB's point. The lm_sensors implementation sucks. Sure, any one specific implementation can be justified. But having each implementation use a different output and calibration shows that this is not an architecture, just a collection of hacks. The usual reply at this point is "just update the user-level script for the new motherboard type". Yup... and you should probably update the constants in your programs' delay loops at the same time. With lm_sensors you can get a one-off hack working, but cannot implement a general case. Compare this to IPMI, which presents the same information. IPMI has a crufty design and ugly implementations, but it is an architected system. With care you can implement and deploy code that works on a broad range of current and future machine. While I'm on the soapbox, gmond deserves its own mini-butane-torch flame. I implemented the translator from Beostat (our status/statistics subsystem) to gmond (per-machine information for Ganglia), so I have a pretty good side-by-side comparison. First, how did they choose what statistics to present? Apparently just because the numbers were there. What is the point of using a XML DTD if it is just used to package undefined data types? A wrapper around a wrapper... Example metric lines: Not only are these metric types not enumerated, they are made more confusing by abbreviations and no definition. To tie both together: What is "proc_total"? Number of processors? Number of processes? Does it count system daemons? It seems to be the useless number "ps x | wc", rather than the number of end user, application processes. Many statistics are only usable when used/presented as a set. Why split the numbers into multiple elements? It just multiplies the size and parsing load. 
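Pulling the earlier part of this thread together: below is a minimal C sketch of the cron-driven overheating check rgb describes (read a CPU temperature, shut the node down past a threshold), written against the flat single-file sensors layout he wishes existed rather than the real chip-specific lm_sensors /proc tree he complains about. The file path, the cputemp tag name, and the 65C threshold are all assumptions for illustration; no such flat file exists in stock lm_sensors.

/* Hypothetical cron-driven overheat check, in the spirit of this thread.
 * It assumes a flat, XML-ish /proc/sensors file with one reading per line,
 * e.g.:
 *
 *   <cputemp cpu="0" units="celsius">54.2</cputemp>
 *
 * Path, tag and threshold are illustrative only; a real lm_sensors setup
 * exposes one chip-specific file per value instead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SENSORS_FILE "/proc/sensors"   /* assumed flat file, does not exist as such */
#define MAX_TEMP_C   65.0

int main(void)
{
    char line[256];
    FILE *fp = fopen(SENSORS_FILE, "r");
    if (!fp) {
        perror(SENSORS_FILE);
        return 1;
    }

    while (fgets(line, sizeof line, fp)) {
        char *tag = strstr(line, "<cputemp");
        if (!tag)
            continue;
        char *val = strchr(tag, '>');      /* text content follows '>' */
        if (!val)
            continue;
        double temp = atof(val + 1);
        if (temp > MAX_TEMP_C) {
            fprintf(stderr, "cputemp %.1fC over %.1fC limit, shutting down\n",
                    temp, MAX_TEMP_C);
            fclose(fp);
            /* mail/syslog first in real life; then power the node off */
            return system("/sbin/shutdown -h now");
        }
    }
    fclose(fp);
    return 0;
}

Run from cron every few minutes (with a mail or syslog call before the shutdown), this is essentially the lm_sensors+cron arrangement Rocky is defending, minus the per-chip parsing that rgb objects to.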
____ Background: Beostat is our status/statistics interface that we published 3+ years ago. It exports interfaces at multiple levels: network protocol, shared memory table only for very performance sensitive programs, such as schedulers dynamic library the preferred interface for programs command output Thus Beostat is a infrastructure subsystem, rather than a single-purpose stack of programs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Wed Oct 1 09:59:16 2003 From: johnb at quadrics.com (John Brookes) Date: Wed, 1 Oct 2003 14:59:16 +0100 Subject: Upper bound on no. of sockets Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> I think there is a 1k per-process limit on open sockets. It's tuneable in 2.4 kernels, IIRC, but I don't remember how (off the top of my head). 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll take it past a/the kernel limit. Maybe recompile kernel? Maybe poke /proc/sys/.../...? Maybe adjust in userland? Maybe use fewer sockets ;-) Does anybody know the score? Cheers, John Brookes Quadrics > -----Original Message----- > From: Balaji Rangasamy [mailto:br66 at HPCL.CSE.MsState.Edu] > Sent: 30 September 2003 05:44 > To: beowulf at beowulf.org > Subject: Upper bound on no. of sockets > > > Hi, > Is there an upper bound on the number of sockets that can be > created by a > process? If there is one, is the limitation enforced by OS? > And what other > factors does it depend on? Can you please be specific on the > numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Wed Oct 1 13:06:40 2003 From: rokrau at yahoo.com (Roland Krause) Date: Wed, 1 Oct 2003 10:06:40 -0700 (PDT) Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <200310011504.h91F4DY02889@NewBlue.Scyld.com> Message-ID: <20031001170640.91750.qmail@web40002.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 6. Re:RH8 vs RH9 (Robert G. Brown) > From: "Robert G. Brown" > Many humans wonder about that, given the very short time that RH8 was > around before RH9 came out. The usual rule is that major number > upgrades are associated with changes in core libraries that break > binary > compatibility, so that binaries built for RH 8 are not guaranteed to > work for RH 9. Indeed some of them wont, I have first hand experience that binaries produced with the Intel Fortran compiler on RH-8, even when statically linked, will not run on a RH-9 system. Further, if you need the Intel Fortan compiler, RH-9 is not really an option for you because it is not officially supported and it will not be either. Inofficially I can confirm that it works fine if you are not using the OpenMP capabilities of the compiler. > achieve it. 
Fedora will likely be strongly derived from 9 and the > current rawhide in any event. How the "community based" RH release > will > end up being maintained is the interesting question. One possibility > is > "as rapidly as RHEL plus a few days", the difference being the time > required to download the GPL-required logo-free source rpm(s) after > an > update and rebuild them and insert them into the community version. Having used fedora in the past on a desktop client, I am hopeful that it will be possible to get all necessary packages for a cluster into an 'aptable' repository, be it hosted by fedora or somewhere else (think e.g. sourceforge). If people work together, as they have in the past, I don't see why RH would succeed in pushing their ridiculous price policies upon cluster users. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Wed Oct 1 18:10:14 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 1 Oct 2003 15:10:14 -0700 Subject: Environment monitoring In-Reply-To: References: Message-ID: <20031001221014.GA28394@sphere.math.ucdavis.edu> I'd recommend: http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2820 For $25.00 you get a temperature probe that is trivial to interface to and smart enough to collect samples even if a machine is down (complete with time stamps). It will even build a histogram of temperature samples for you. It's kinda cool that you can leave one in your luggage or send it up in a space probe and then get periodic samples when you arrive at your destination. Anyway, people use them for all kinds of things, even in space: http://www.voiceofidaho.org/tvnsp/01atchrn.htm More info: http://www.ibutton.com/ibuttons/thermochron.html They can also be connected via USB, parallel, and serial. The other cool feature is that they are chainable, so we have one behind the machine (i.e. rack temp), one on top of the rack (room temp), and one at the air conditioner output, all on one wire. Each button has a guaranteed unique 64-bit ID. Once you get a feel for the dynamics of the system it becomes really easy to spot anomalies. Recommended; the thermo buttons are cheaper, but IMO for most things the Thermochron premium is worth it so you can have continuous sampling even if a machine crashes. The logs are very handy when fighting with facilities to combat the "well, it's not really getting that hot that often" kind of thing. Oh, I guess I should mention I have no financial ties to any of the mentioned companies. So no, I won't sell you one. 
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:36:35 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:36:35 -0700 Subject: more on structural models for clusters Message-ID: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> In regards to my recent post looking for cluster implementations for structural dynamic models, I would like to add that I'm interested in "highly distributed" solutions where the computational load for each processor is very, very low, as opposed to fairly conventional (and widely available) schemes for replacing the Cray with a N-node cluster. The number of processors would be comparable to the number of structural nodes (to a first order of magnitude) Imagine you had something like a geodesic dome with a microprocessor at each vertex that wanted to compute the loads for that vertex, communicating only with the adjacent vertices... Trivial, egregiously simplified, and demo cases are just fine, and, in fact, probably preferable.... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 1 19:19:26 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 1 Oct 2003 16:19:26 -0700 Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <20031001170640.91750.qmail@web40002.mail.yahoo.com> References: <200310011504.h91F4DY02889@NewBlue.Scyld.com> <20031001170640.91750.qmail@web40002.mail.yahoo.com> Message-ID: <20031001231926.GA2900@greglaptop.internal.keyresearch.com> On Wed, Oct 01, 2003 at 10:06:40AM -0700, Roland Krause wrote: > Inofficially I can confirm that it works fine if you are not using > the OpenMP capabilities of the compiler. Which is no surprise, as the thread library stuff changed fairly radically in RedHat 9. I have some sympathy for Intel's compiler guys on that issue. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 19:01:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 09:01:55 +1000 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <200310020901.57000.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 10:24 pm, Robert G. Brown wrote: > a) 8 will, probably fairly soon, be no longer maintained. 9 will be, > at least for a while (possibly for one more year). Updates for 7.3 ends on December 31st 2003. Updates for 8.0 ends on December 31st 2003. Updates for 9 ends on April 30th 2004. So going to 9 will only get you an extra 4 months of updates. 
http://www.redhat.com/apps/support/errata/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1zjO2KABBYQAh8RArjhAJoDUAq9xSKjz6pJ58nIvSk1GEqG2QCeJ7f3 5XYQ/rJIzUPP744CNvAOLXA= =UNIB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 18:58:21 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 08:58:21 +1000 Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> References: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: <200310020858.30401.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 06:21 pm, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > Why? Presumably because you can use it to monitor the temp and fan sensors and stuff and raise alarms if they go out of bounds. http://secure.netroedge.com/~lm78/ And from the info page: Project Mission / Background / Ethics: The primary mission for our project is to provide the best and most complete hardware health monitoring drivers for Linux. We strive to produce well organized, efficient, safe, flexible, and tested code free of charge to all Linux users using the Intel x86 hardware platform. The project attempts to support as many related devices as possible (when testing and documentation is available), especially those which are commonly included on mainboards. Our drivers provide the base software layer for utilities to acquire data on the environmental conditions of the hardware. We also provide a sample text-oriented utility to display sensor data. While this simple utility is sufficient for many users, others desire more elaborate user interfaces. We leave the development of these GUI-oriented utilities to others. See our useful addresses page for references. http://secure.netroedge.com/~lm78/info.html NB: I've used these at home from time to time, but we don't use them on our IBM cluster as we can grab the same info out of CSM. - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1wQO2KABBYQAh8RApUxAJ0V9QuvuGOLCnS7qXCkWD+9/OrOlgCfezuT QQ5wnTot9uoJCy3tRjuDKAQ= =fDWX -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:27:58 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:27:58 -0700 Subject: cluster computing for mechanical structural FEM models Message-ID: <5.2.0.9.2.20031001152545.03110070@mailhost4.jpl.nasa.gov> I'm looking for references to work on distributed computing for structural models like trusses and spaceframes. They are typically sparse/diagonalish matrices that represent the masses and springs, so distributing the work in a cluster seems a natural fit. 
Anybody done anything like this (as a demonstration, e.g.) say, using NASTRAN inputs? James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From vanw at tticluster.com Thu Oct 2 08:37:50 2003 From: vanw at tticluster.com (Kevin Van Workum) Date: Thu, 2 Oct 2003 08:37:50 -0400 (EDT) Subject: lm_sensors output Message-ID: The recent discussion on environment sensors motivated me to take the subject more seriously. I therefore installed lm_senors on one of my nodes for testing. I simply used the lm_sensors RPM from RH8.0, ran sensors-detect and did what it told me to do. It apparently worked. The problem is, I don't really know what the output means or what I should be looking for. I guess I'm a novice. Anyways, the output from sensors is shown below. What is VCore and why is mine out of range? What are all the other voltages describing? V5SB is out of range also, is that a bad thing? I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? $ sensors w83697hf-isa-0290 Adapter: ISA adapter Algorithm: ISA algorithm VCore: +1.50 V (min = +0.00 V, max = +0.00 V) +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) +5V: +5.02 V (min = +4.50 V, max = +5.48 V) +12V: +12.20 V (min = +10.79 V, max = +13.11 V) -12V: -12.85 V (min = -13.21 V, max = -10.90 V) -5V: -5.42 V (min = -5.51 V, max = -4.51 V) V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) VBat: +3.29 V (min = +2.70 V, max = +3.29 V) fan1: 4687 RPM (min = 187 RPM, div = 32) fan2: 0 RPM (min = 187 RPM, div = 32) temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor alarms: beep_enable: Sound alarm disabled Kevin Van Workum, Ph.D. www.tsunamictechnologies.com ONLINE COMPUTER CLUSTERS __/__ __/__ * / / / / / / _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From AlberT at SuperAlberT.it Thu Oct 2 03:35:57 2003 From: AlberT at SuperAlberT.it (AlberT) Date: Thu, 2 Oct 2003 09:35:57 +0200 Subject: Upper bound on no. of sockets In-Reply-To: References: Message-ID: <200310020935.58006.AlberT@SuperAlberT.it> On Tuesday 30 September 2003 06:44, Balaji Rangasamy wrote: > Hi, > Is there an upper bound on the number of sockets that can be created by a > process? If there is one, is the limitation enforced by OS? And what other > factors does it depend on? Can you please be specific on the numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > from man setrlimit: [quote] getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()): struct rlimit { rlim_t rlim_cur; /* Soft limit */ rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */ }; The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. 
A privileged process may make arbitrary changes to either limit value. The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()). [snip] RLIMIT_NOFILE Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE. [/QUOTE] -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Thu Oct 2 11:33:21 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 02 Oct 2003 08:33:21 -0700 Subject: Environment monitoring Message-ID: Robert G. Brown rgb at phy.duke.edu wrote: >The bad thing is that it does NOT give you any sort of measure of room >temperature per se, Well, no, but to be fair that's hardly lm_sensors fault. The problem is that few (any?) motherboards have a sensor positioned away from hot devices on the upstream end of the wind flow. One can sometimes acquire a fair approximation of this info using SMART from a hard drive if the airflow across the drive is good and the drive itself does not run very hot. We have not yet filled the second processor slot on the mobos of our beowulf and that temperature sensor gives a pretty good indication of the air temperature in the case (32C) vs. under a live Athlon MP 2200+ processor (no load, 40.5C). We use lm_sensors with mondo http://mondo-daemon.sourceforge.net/ to watch the systems and shut them down if they overheat. Generally this works well. Mondo can compensate for the shortcomings of the lm_sensors/motherboard combos which sometimes arise. For instance, on our ASUS A7V266 mobos (workstations, not in a beowulf!) some of the sensors tend to go whacky for one or two measurements. Fan speeds go to 0 or temps to 255C. Mondo is set to require an out of range condition for 3 seconds before triggering a shutdown, and so far we have not seen a glitch last that long. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcmoore at atipa.com Thu Oct 2 13:56:04 2003 From: jcmoore at atipa.com (Curt Moore) Date: 02 Oct 2003 12:56:04 -0500 Subject: lm_sensors output In-Reply-To: References: Message-ID: <1065117364.12473.27.camel@picard.lab.atipa.com> This is really the bad thing about lm_sensors which some have touched on previously; too much guesswork. Many times even if the drivers are present and up to date for your specific hardware, the values may be meaningless as different board manufacturers may choose to physically connect the monitoring chip(s) to different onboard devices, such as in the case with fans. You have to have a knowledge of which onboard piece of hardware is connected to which input of the monitoring chip in order to make sense of the sensors output. Don't get me wrong, when lm_sensors works, it works great but sometimes it takes a little work to get to that point. 
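As a side note on the sensor glitches David Mathog describes above, the filtering idea is simple enough to sketch in a few lines of C. This is not mondo's actual code, and read_cpu_temp() below is a made-up stand-in for whatever really polls the sensor:

/* Toy glitch filter: act on an over-limit temperature only if it
 * persists for HOLD_SECONDS consecutive polls, so a single bogus
 * reading (0 RPM, 255C, ...) does not trigger a shutdown.
 */
#include <stdio.h>
#include <unistd.h>

#define LIMIT_C      60.0
#define HOLD_SECONDS 3

/* Placeholder only -- real code would read the lm_sensors output. */
static double read_cpu_temp(void) { return 50.0; }

int main(void)
{
    int over = 0;                 /* consecutive seconds over the limit */

    for (;;) {
        double t = read_cpu_temp();
        over = (t > LIMIT_C) ? over + 1 : 0;
        if (over >= HOLD_SECONDS) {
            fprintf(stderr, "temp %.1fC over limit for %d s, shutting down\n",
                    t, over);
            /* system("/sbin/shutdown -h now");  -- site policy goes here */
            return 1;
        }
        sleep(1);
    }
}

A one-sample spike to 255C then just resets the counter instead of powering the node off, which is all the protection the scheme needs.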
Even if the values are sane for your hardware, you still have to go into the sensors.conf and set max, min, and hysteresis values, if you so choose, in order to have this information make sense for your specific hardware. In recent months, vendors such as Tyan have begun to distribute customized sensors.conf files for their boards which take into account the differences between boards and how sensor chips are connected to the onboard devices for each of their boards. As Don mentioned earlier, IPMI is more generalized and is much easier to ask for "CPU 1 Temperature" and actually get "CPU 1 Temperature" instead of data from some other onboard thermistor. A mistake in this area could end up costing time and money if something overheats and it's not detected because of polling the wrong data. >From my experience, it would be very difficult to come up with a generalized set of sensors values to work across differing motherboard types. A "standard" such as IPMI makes things much easier to accurately collect and act upon as all of the "hard" work has already been done by those implementing IPMI on the hardware. One would hope that these individuals would have the in-depth knowledge of exactly which values to map to which sensor inputs and any computations needed for these values so that clean and accurate values are returned when the hardware is polled. -Curt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Curt Moore Systems Integration Engineer At?pa Technologies jcmoore at atipa.com (O) 785-813-0312 (Fax) 785-841-1809 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brbarret at osl.iu.edu Thu Oct 2 13:05:54 2003 From: brbarret at osl.iu.edu (Brian Barrett) Date: Thu, 2 Oct 2003 10:05:54 -0700 Subject: Upper bound on no. of sockets In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> References: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> Message-ID: On Oct 1, 2003, at 6:59 AM, John Brookes wrote: > I think there is a 1k per-process limit on open sockets. It's tuneable > in > 2.4 kernels, IIRC, but I don't remember how (off the top of my head). > 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll > take it > past a/the kernel limit. Maybe recompile kernel? Maybe poke > /proc/sys/.../...? Maybe adjust in userland? > > Maybe use fewer sockets ;-) > > Does anybody know the score? On linux, there is a default per-process limit of 1024 (hard and soft limits) file descriptors. You can see the per-process limit by running limit (csh/tcsh) or ulimit -n (sh). There is also a limit on the total number of file descriptors that the system can have open, which you can find by looking at /proc/sys/fs/file-max. On my home machine, the max file descriptor count is around 104K (the default), so that probably isn't a worry for you. There is the concept of a soft and hard limit for file descriptors. The soft limit is the "default limit", which is generally set to somewhere above the needs of most applications. The soft limit can be increased by a normal user application up to the hard limit. As I said before, the defaults for the soft and hard limits on modern linux machines are the same, at 1024. 
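The soft/hard distinction is easy to see from a program. Here is a short C fragment that raises RLIMIT_NOFILE (which is what bounds the number of open sockets) as far as the hard limit allows -- essentially what 'ulimit -n' does from the shell; a minimal sketch, not a complete application:

/* Show and raise the per-process file descriptor limit (RLIMIT_NOFILE),
 * which is what caps the number of sockets a process can open.
 * Roughly the in-program equivalent of "ulimit -n <hard limit>".
 */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;   /* unprivileged: soft may be raised up to hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}

Run as an ordinary user on a stock box it will report 1024/1024, as noted above; only after the hard limit has been raised (e.g. in /etc/security/limits.conf, or by root) does the setrlimit() call buy you anything.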
You can adjust either limit by adding the appropriate lines in /etc/security/limits.conf (at least, that seems to be the file on both Red Hat and Debian). In theory, you could set the limit up to file-max, but that probably isn't a good idea. You really don't want to run your system out of file descriptors. There is one other concern you might want to think about. If you ever use any of the created file descriptors in a call to select(), you have to ensure all the select()ed file descriptors fit in an FD_SET. On Linux, the size of an FD_SET is hard-coded at 1024 (on most of the BSDs, Solaris, and Mac OS X, it can be altered at application compile time). So you may not want to ever set the soft limit above 1024. Some applications may expect that any file descriptor that was successfully created can be put into an FD_SET. If this isn't the case, well, life could get interesting. Hope this helps, Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Thu Oct 2 11:45:40 2003 From: csmith at lnxi.com (Curtis Smith) Date: Thu, 2 Oct 2003 09:45:40 -0600 Subject: lm_sensors output References: Message-ID: <006001c388fc$3b67cb60$a423a8c0@blueberry> VCore is the voltage of the CPU #1. You can get the full definition of all values at http://www2.lm-sensors.nu/~lm78/. Curtis Smith Principle Software Engineer Linux Networx Inc. (www.lnxi.com) ----- Original Message ----- From: "Kevin Van Workum" To: Sent: Thursday, October 02, 2003 6:37 AM Subject: lm_sensors output > The recent discussion on environment sensors motivated me to take the > subject more seriously. I therefore installed lm_senors on one of my nodes > for testing. I simply used the lm_sensors RPM from RH8.0, ran > sensors-detect and did what it told me to do. It apparently worked. The > problem is, I don't really know what the output means or what I should be > looking for. I guess I'm a novice. Anyways, the output from sensors is > shown below. > > What is VCore and why is mine out of range? > What are all the other voltages describing? > V5SB is out of range also, is that a bad thing? > I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? > > $ sensors > w83697hf-isa-0290 > Adapter: ISA adapter > Algorithm: ISA algorithm > VCore: +1.50 V (min = +0.00 V, max = +0.00 V) > +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) > +5V: +5.02 V (min = +4.50 V, max = +5.48 V) > +12V: +12.20 V (min = +10.79 V, max = +13.11 V) > -12V: -12.85 V (min = -13.21 V, max = -10.90 V) > -5V: -5.42 V (min = -5.51 V, max = -4.51 V) > V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) > VBat: +3.29 V (min = +2.70 V, max = +3.29 V) > fan1: 4687 RPM (min = 187 RPM, div = 32) > fan2: 0 RPM (min = 187 RPM, div = 32) > temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor > temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor > alarms: > beep_enable: > Sound alarm disabled > > Kevin Van Workum, Ph.D. 
> www.tsunamictechnologies.com > ONLINE COMPUTER CLUSTERS > > __/__ __/__ * > / / / > / / / > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 2 17:25:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 2 Oct 2003 17:25:20 -0400 (EDT) Subject: Power Supply: Supermicro P4DL6 Board? In-Reply-To: Message-ID: > > disks are much, much cooler than they used to be, probably dropping > > below power consumed by ram on most clusters. > > Note that most performance-oriented RAM types now have metal cases and > heat sinks. They didn't add the metal because it _looks_ cool. I'm not so sure. I looked at the spec for a current samsung pc333 ddr 512Mb chip, and it works out to about 16W per GB. I think most people still have 512MB dimms, and probably pc266 (13.6W/GB). I don't really see why a dimm would have trouble dissipating ~20W, considering its size. I suspect dimm heatsinks are actually a fashion statement inspired by the heat-spreaders found on some rambus rimms (which were *spreaders*, a consequence of how rambus does power management...) personally, I'm waiting till I can invest in peltier-cooled dimms ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Oct 2 19:08:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 3 Oct 2003 09:08:39 +1000 Subject: more on structural models for clusters In-Reply-To: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <200310030908.41322.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 2 Oct 2003 08:36 am, Jim Lux wrote: > Imagine you had something like a geodesic dome with a microprocessor at > each vertex that wanted to compute the loads for that vertex, communicating > only with the adjacent vertices... The nearest I can remember to something like that (which sounds like an excellent idea) was for a fault tolerant model built around processors connected in a grid where each monitored the neighbours and if one was seen to go bad it could be sent a kill signal and the grid would logically reform without that processor. I think I read it in New Scientist between 1-4 years ago, but this abstract from the IEEE Transactions on Computers sounds similar (you've got to pay for the full article apparently): http://csdl.computer.org/comp/trans/tc/1988/11/t1414abs.htm A Multiple Fault-Tolerant Processor Network Architecture for Pipeline Computing Good luck! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/fK/3O2KABBYQAh8RArPyAKCCoaQXbywrq9h+3geGOVCE97dhgQCeKzV0 B94q2Yd0yPYFwDbcVINl/4w= =rbMB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Oct 2 20:39:33 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 02 Oct 2003 17:39:33 -0700 Subject: more on structural models for clusters In-Reply-To: <20031003002932.GA5984@sphere.math.ucdavis.edu> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031002173001.0310ce38@mailhost4.jpl.nasa.gov> At 05:29 PM 10/2/2003 -0700, Bill Broadley wrote: >On Wed, Oct 01, 2003 at 03:36:35PM -0700, Jim Lux wrote: > > In regards to my recent post looking for cluster implementations for > > structural dynamic models, I would like to add that I'm interested in > > "highly distributed" solutions where the computational load for each > > processor is very, very low, as opposed to fairly conventional (and widely > > available) schemes for replacing the Cray with a N-node cluster. > > > > The number of processors would be comparable to the number of structural > > nodes (to a first order of magnitude) > >Er, why bother? Is there some reason to distribute those things so >thinly? Your average dell can do 1-4 Billion floating point ops/sec, >why bother with so few per CPU? Am I missing something? Your average Dell isn't suited to inclusion as a MCU core in an ASIC at each node and would cost more than $10/node... I'm looking at Z80/6502/low end DSP kinds of computational capability in a mesh containing, say, 100,000 nodes. Sure, we'd do algorithm development on a bigger machine, but in the end game, you're looking at zillions of fairly stupid nodes. The commodity cluster aspect would only be in the development stages, and because it's much more likely that someone has solved the problem for a Beowulf (which is fairly loosely coupled and coarse grained) than for a big multiprocessor with tight coupling like a Cray. Haven't fully defined the required performance yet, but, as a starting point, I'd need to "solve the system" in something like 100 microseconds. The key is that I need an algorithm for which the workload scales roughly linearly as a function of the number of nodes, because the computational power available also scales as the number of loads. Clearly, I'm not going to do a brute force inversion or LU decomposition of a 100,000x100,000 matrix... However, inverting 100,000 matrices, each, say, 10x10, is reasonable. >Bill Broadley >Mathematics >UC Davis James Lux, P.E. 
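Purely as a toy illustration of the kind of per-vertex work Jim describes above (and in no way his actual design), the whole job of one such processor in an explicit mass-spring time step fits in a few lines of C. Filling in neigh_pos[] stands in for whatever nearest-neighbour links the hardware would provide, and all the names and sizes here are invented:

/* Toy per-vertex update for a distributed mass-spring model.
 * One processor owns one vertex; neigh_pos[] is assumed to have been
 * filled by a nearest-neighbour exchange (hypothetical hardware links).
 */
#include <math.h>

#define MAX_NEIGHBOURS 8

struct vertex {
    double pos[3], vel[3];                 /* this vertex's state           */
    double mass;
    int    n_neigh;
    double neigh_pos[MAX_NEIGHBOURS][3];   /* neighbours' current positions */
    double k[MAX_NEIGHBOURS];              /* spring constants              */
    double rest[MAX_NEIGHBOURS];           /* spring rest lengths           */
};

void step(struct vertex *v, double dt)
{
    double f[3] = { 0.0, 0.0, 0.0 };
    int i, j;

    /* Sum spring forces from adjacent vertices only. */
    for (i = 0; i < v->n_neigh; i++) {
        double d[3], len = 0.0;
        for (j = 0; j < 3; j++) {
            d[j] = v->neigh_pos[i][j] - v->pos[j];
            len += d[j] * d[j];
        }
        len = sqrt(len);
        if (len > 0.0)
            for (j = 0; j < 3; j++)
                f[j] += v->k[i] * (len - v->rest[i]) * d[j] / len;
    }

    /* Explicit update of this vertex alone; no global matrix anywhere. */
    for (j = 0; j < 3; j++) {
        v->vel[j] += dt * f[j] / v->mass;
        v->pos[j] += dt * v->vel[j];
    }
}

Each vertex touches only its own state plus its neighbours' positions, which is why the per-processor load can stay tiny no matter how many vertices the structure has.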
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From he.94 at osu.edu Thu Oct 2 20:59:55 2003 From: he.94 at osu.edu (Hao He) Date: Thu, 02 Oct 2003 20:59:55 -0400 Subject: NFS Problem Message-ID: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Hi, there. I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P chipsets and Intel CSA Gigabit NIC. The distribution is RedHat 9. I have some experience before but I still got some problem in NFS. Problem 1: When I just use 'rw' and 'intr' as the parameters used in /etc/fstab, I got following problem when startup clients (while the server with NFS daemon is running): Mount: RPC: Remote system error -- no route to host Then I added 'bg' to /etc/fstab, this time the result is better. Several minutes after the client booted up, the remote directory mounted. However, in many cases following meassage was prompted: nfs warning: mount version older than kernel Problem 2: I am mounting two remote directories from the server, however, at some nodes, only one directory even no directory got mounted. If only one directory mounted successfully, it differs from one client to another, and to the same node, it changes from time to time at system booting up, like dicing. This really confused me. Problem 3: Sometimes I got the message at the server node like this: (scsi 0:A:0:0): Locking max tag count at 33. However, seems it does not make trouble to mounted directories. I think it must be related with NFS. I have a further question: Since there may be 16 or 32 or even more clients try to mount the remote directory at the same time, can the NFS server really handle so much requests simultaneously? Is there any effective alternate method to share data, besides NFS? How to solve these problems? Any suggestion? Thank you very much. I will appreciate your response. Best wishes, Hao He _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 3 01:13:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 3 Oct 2003 07:13:37 +0200 Subject: NFS Problem In-Reply-To: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> References: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Message-ID: <20031003051337.GA6263@unthought.net> On Thu, Oct 02, 2003 at 08:59:55PM -0400, Hao He wrote: > Hi, there. > > I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P > chipsets and Intel CSA Gigabit NIC. > The distribution is RedHat 9. > I have some experience before but I still got some problem in NFS. > > Problem 1: When I just use 'rw' and 'intr' as the parameters used in > /etc/fstab, I got following problem when startup clients (while the server > with NFS daemon is running): > Mount: RPC: Remote system error -- no route to host That's a network problem or a network configuration problem. Usually this would be a name resolution problem. Check that the hostname in your fstab can be resolved in early boot (add it to your hosts file if necessary), or use the IP address of the server instead. 
But the error message seems to indicate that it's not resolution but routing - very odd... Is the network up? Do you have any special networking setup? Try checking your init-scripts to see that the network is really started before the NFS filesystems are mounted. > Then I added 'bg' to /etc/fstab, this time the result is better. Several > minutes after the client booted up, the remote directory mounted. So you NFS mount depends on something (network related) that isn't up at the time when the system tries to mount your NFS filesystems. Either you have a special (and wrong) setup, or RedHat messed up good :) Check the order in which things are started in your /etc/rc3.d/ directory. Network should go before NFS. > However, in many cases following meassage was prompted: > nfs warning: mount version older than kernel Most likely this is not really a problem - I've had systems with that message work just fine. You could check to see if RedHat has updates to mount. > > Problem 2: I am mounting two remote directories from the server, however, at > some nodes, only one directory even no directory got mounted. > If only one directory mounted successfully, it differs from one client to > another, and to the same node, it changes from time to time at system > booting up, like dicing. > This really confused me. Isn't this problem 1 over again? > > Problem 3: Sometimes I got the message at the server node like this: > (scsi 0:A:0:0): Locking max tag count at 33. That's a SCSI diagnostic. You can ignore it. > However, seems it does not make trouble to mounted directories. > I think it must be related with NFS. It's not related to NFS. > > I have a further question: Since there may be 16 or 32 or even more clients > try to mount the remote directory at the same time, > can the NFS server really handle so much requests simultaneously? Is there > any effective alternate method to share data, besides NFS? That should be no problem at all. NFS should be up to the task with no special tuning at all. Once you have all your nodes mounting NFS properly, you can start looking into tuning for performance - but it really should work 'out of the box' with no special tweaking. > > How to solve these problems? Any suggestion? > Thank you very much. I will appreciate your response. Use the following options to the NFS mounts in your fstab: hard,intr You can add rsize=8192,wsize=8192 for tuning. You should not need 'bg' - although it may be convenient if you need to be able to boot your nodes when the NFS server is down. One thing you should make sure: never use host-names or netgroups in your exports file on the server (!) *Only* use IP addresses or wildcards - *Never* use names. Using names in your 'exports' file on the server can cause *all* kinds of weird sporadic irreproducible problems - it's a long-standing and extremely annoying problem, but fortunately one that has an easy workaround. Check: *) Server: Your exports file (only IP or wildcard exports) *) Clients: Your fstab (use server IP or name in hosts file) *) Clients: Is network started before NFS mount? Please write to the list about your progress :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Andrew.Cannon at nnc.co.uk Fri Oct 3 05:30:07 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Fri, 3 Oct 2003 10:30:07 +0100 Subject: Filesystem question (sort of newbie) Message-ID: Hi All, I am going to be setting up a 16 node cluster in the near future. I have only set up a 4 node cluster before and I am a little unsure about how to sort out the disk space. Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, any advice is still appreciated), and I was wondering how to best organise the disks on each node. I am thinking (only started wondering about this today) of installing the cluster software on the master node (pvm, MPI and the actual calculation software, MCNP) and mounting the disk on each of the other nodes, so that all they have on their hard drives is the minimal install of RH. The question I am asking is, will this work and what sort of performance hit will there be? Would I be better installing the software on each computer? TIA (sorry for being so stoopid, I'm still very much a learner at linux and clustering) Andy Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 05:32:34 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 05:32:34 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D4232.3070900@lmco.com> Andrew, Let me recommend Warewulf (warewulf-cluster.org). It boots the nodes using RH 7.3 (although it should work with 8 or but I haven't tested it), but it boots into a small Ram Disk (about 70 megs depending upon what you need on the nodes). It's very easy to setup, configure and use, plus you don't need to install RH on each node. Warewulf will use a hard disk in the nodes if available for swap and local scratch space. However, it will also work with diskless nodes (although you don't get swap or scratch space). Warewulf will also take /home from the master node and NFS mount it throughout the cluster. So you can install your code on /home for all of the nodes. Good Luck! Jeff > Hi All, > > I am going to be setting up a 16 node cluster in the near future. 
I have > only set up a 4 node cluster before and I am a little unsure about how to > sort out the disk space. > > Each computer will be running Red Hat (either 8 or 9 I haven't decided > yet, > any advice is still appreciated), and I was wondering how to best > organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each > computer? > > TIA (sorry for being so stoopid, I'm still very much a learner at > linux and > clustering) > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC > Limited (no. 1120437), National Nuclear Corporation Limited (no. > 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited > (no. 235856). The registered office of each company is at Booths > Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for > Technica-NNC Limited whose registered office is at 6 Union Row, > Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by > the relevant UK operating company and are confidential and intended > for the use of the individual or entity to whom they are addressed. > If you have received this e-mail in error please notify the NNC system > manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 08:59:52 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 08:59:52 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D72C8.3050409@lmco.com> Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > While I don't prefer the nfs-root approach, Warewulf can do that as well (haven't tried it personally). What kind of network do you use for the 48-node cluster? 
Anybody else use the nfs-root approach? The 70 megs used in the ram disk is pretty well thought out. There are some basic things to boot the node, but it also includes glibc and you can easily add MPICH, LAM, Ganglia, SGE, etc. The developer has thought out these packages very well so that only the pieces of each of these packages that needs to be on the nodes actually gets installed on the nodes. Very well thought out. Oh, one other thing. The image that goes to the nodes via TFTP (over PXE) is compressed so it's about half the size of the final ram disk. This really helps cut down on network traffic (even works over my poor rtl8139 network). One of the things I'd like to experiment with is using squasfs to reduce the size of the ram disk. IMHO, 70 megs is not very big, but reducing it to 30-40 Megs might be worth the effort. > regards, mark hahn. > Thanks! Jeff -- Dr. Jeff Layton Senior Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Oct 3 09:34:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 3 Oct 2003 09:34:30 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: <3F7D4232.3070900@lmco.com> Message-ID: > 8 or but I haven't tested it), but it boots into a small Ram > Disk (about 70 megs depending upon what you need on the alternately, it's almost trivial to PXE boot nodes, mount a simple root FS from a server/master, and use the local disk, if any, for swap and/or tmp. one nice thing about this is that you can do it with any distribution you like - mine's RH8, for instance. personally, I prefer the nfs-root approach, probably because once you boot, you won't be wasting any ram with boot-only files. for a cluster of 48 nodes, there seems to be no drawback; for a much larger cluster, I expect all the boot-time traffic would be crippling, and you might want to use some kind of multicast to distribute a ramdisk image just once... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 3 11:24:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 3 Oct 2003 11:24:48 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Cannon, Andrew wrote: > Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, > any advice is still appreciated), and I was wondering how to best organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each computer? 
> > TIA (sorry for being so stoopid, I'm still very much a learner at linux and > clustering) If the nodes have lots of memory, most of their access to non-data disk (programs and libraries) will come out of caches after the systems have been up for a while, so they won't take a HUGE performance hit, but things like loading a big program for the first time may take longer. However, if you work to master PXE and kickstart (which go together like ham and eggs) and have adequate disk, in the long run your maintenance will be minimized by putting energy into developing a node kickstart script. Then you just boot the nodes into kickstart over the network, wait a few minutes for the install and boot into production. This will take you some time to learn (there are HOWTO-like resource online, so it isn't a LOT of time) and if you got nodes with NICs that don't support PXE you'll likely want to replace them or add ones that do, but once you invest these capital costs the payback is that your marginal cost for installing additional nodes after the first node you get to install "perfectly" is so close to zero as to make no nevermind. Make a dhcp table entry. Boot node into install. Boot node. Reinstalling is exactly the same process and can be done in minutes if a hard disk crashes. It gets to be so easy that we almost routinely do a reinstall after working on a system for any reason, including ones where it probably isn't necessary. You can reinstall a system from anywhere on the internet (if your hardware is accessible and preconfigured for this to work). Finally, if you include yum on the nodes, you can automagically update the nodes from a master repository image on your server, and mirror your server image from one of the Red hat mirrors, and actually maintain a stream of updates onto the nodes with no further action on your part. At this point, if you aren't doing Scyld or one of the preconfigured cluster packages and want to roll your own cluster out of a base install plus selected RPMs (and why not?) PXE+kickstart/RH+yum forms a pretty solid low-energy paradigm for installation and maintenance once you've learned how to make it work. rgb > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Sat Oct 4 12:00:51 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Sat, 4 Oct 2003 12:00:51 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: Message-ID: Hi, All: I tried to compile a Fortran 90 MPI program by the Intel Frotran Compiler in the OSCAR cluster. I run the command: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF external function WINDOWFUNCTION external function SIGMA external function GETISTART external function GETIEND external subroutine COM_EYZ external subroutine COM_EYX external subroutine COM_EZX external subroutine COM_EZY external subroutine COM_HYZ external subroutine COM_HYX external subroutine COM_HZX external subroutine COM_HZY 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' /tmp/ifcVao851.o(.text+0x6e): In function `main': : undefined reference to `mpi_comm_rank_' /tmp/ifcVao851.o(.text+0x82): In function `main': : undefined reference to `mpi_comm_size_' /tmp/ifcVao851.o(.text+0xab): In function `main': : undefined reference to `mpi_wtime_' /tmp/ifcVao851.o(.text+0x422): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x448): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x47b): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x49e): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4c1): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' follow /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': : undefined reference to `mpi_recv_' /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': : undefined reference to `mpi_send_' " At the same time, I tried the same program in the other scyld cluster, using NAG compiler. I use command: " f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 " It works fine. So that means my fortran program in fine. Both of the cluster use the MPICH implementation. But because I have to work on that OSCAR cluster with Intel compiler, I wonder 1. why the errors happen? 2. Is the problem of cluster or the Intel compiler? 3. How I can solve it. I know there are a lot of guy with experience and experts of cluster and MPI in this mailing list. I appreciate your suggestion and advice from you. Thanks. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sun Oct 5 00:52:45 2003 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 4 Oct 2003 23:52:45 -0500 (CDT) Subject: Upper bound on no. of sockets In-Reply-To: Message-ID: Thanks a billion for all the responses. Here is another question: Is there a way to send some data to the listener when I do a connect()? I tried using sin_zero field of the sockaddr_in structure, but quite unsuccessfully. 
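For reference, sin_zero is just padding that makes sockaddr_in the same size as sockaddr; it is never delivered to the peer, which is why that attempt could not work. The usual pattern is the one mentioned below -- a small fixed-size identifier written immediately after connect(). A rough C sketch, with the address handling, port, and 32-bit ID width made up for illustration:

/* Sketch: identify the connecting process by sending a small fixed-size
 * ID immediately after connect().  The ID width and helper name are
 * invented for illustration.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int connect_with_id(const char *server_ip, int port, uint32_t my_id)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));     /* also zeroes sin_zero -- it is only padding */
    sa.sin_family = AF_INET;
    sa.sin_port   = htons(port);
    inet_pton(AF_INET, server_ip, &sa.sin_addr);

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
        close(fd);
        return -1;
    }

    my_id = htonl(my_id);           /* fixed byte order on the wire */
    if (write(fd, &my_id, sizeof(my_id)) != (ssize_t)sizeof(my_id)) {
        close(fd);
        return -1;
    }
    return fd;                      /* connected and identified */
}

Since the client need not wait for any reply, the extra write does not add a round trip; it simply becomes the connection's first data.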
The problem is I want to uniquely identify the actively connecting process (IP address and port number information wont suffice). I can send() the identifier value to the listener after the connect(), but I want to cut down the cost of an additional send. Any suggestions are greatly appreciated. Thanks, Balaji. PS: I am not sure if it is appropriate to send this question to this mailing list. My sincere apologies for those who find this question annoyingly incongruous. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sat Oct 4 15:15:08 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sat, 4 Oct 2003 12:15:08 -0700 (PDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: <20031004191508.27391.qmail@web60306.mail.yahoo.com> This is by far my favorite approach however I tend to tweak it with a very large initrd and custom kernel. I am using older hardware with its max ram so I use it as best I can. with no local harddisk I am always looking at the best method of network file access and have gone so far as to try wget with http. --- Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 04:21:50 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 10:21:50 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Sat, 4 Oct 2003, Ao Jiang wrote: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 Try with : ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 Btw, a cleaner way to compile mpi programs is to use the mpif90 (mpif77 for fortran77) command (which is a wrapper for the real compiler). You should be able to make it use the ifc by setting the MPICH_F90 (MPICH_F77 for fortran77) and MPICH_F90LINKER environment variables to choose which compiler to use, e.g. 
let's say you want to use the ifc compiler, and you're using bash, you would have to do: export MPICH_F90=ifc export MPICH_F90LINKER=ifc and then, in order to compile your MPI program, you should issue the command: mpif90 -o p_wg3 p_fdtd3dwg3_pml.f90 > 2. Is the problem of cluster or the Intel compiler? Neither. Intel works fine with OSCAR. Have a good day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +390250317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 07:22:35 2003 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Mon, 06 Oct 2003 12:22:35 +0100 Subject: Copy files between Nodes via NFS In-Reply-To: References: <003f01c37d29$47d7ec10$0e01010a@hpcncd.cpe.ku.ac.th> Message-ID: <5.0.2.1.0.20031006121907.03a25120@hermes.cam.ac.uk> Morning, I have a basic node PC that NFS-mounts directories from the master node. When I try to copy files using 'cp' from the node to an NFS-mounted directory, the node PC just hangs. Has anyone come across this problem? How best to move/copy files across nodes? Regards Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 6 07:21:02 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Mon, 06 Oct 2003 13:21:02 +0200 Subject: Intel compilers and libraries Message-ID: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Hello: We are thinking about purchasing the Intel C++ compiler for Linux, mainly to get the most out of our hardware (Xeon 2.4GHz processors); we are also interested in the Intel MKL (Math Kernel Library). I would like to know if the performance gain using the Intel compiler+libraries, which exploit SSE2 and make other optimizations for P4/Xeon, is as good as Intel claims. Is anyone on the list using those products? On the other hand, isn't MKL just as good as any other good math library compiled with Xeon/P4 optimizations and extensions (using the Intel C++ compiler, for example)? Another question: the only differences I can see reading the Intel docs between the P4 and the Xeon are more cache on the Xeon and HyperThreading (below the 3GHz P4/Xeon). Do they really make a big difference, considering how much more expensive Xeons are? Anyone have experience with both platforms? Greetings: José M. Pérez. Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 10:05:07 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 10:05:07 -0400 Subject: undefined references to pthread related calls Message-ID: <3F817693.9040001@larc.nasa.gov> Hi group, I have a user of my software (an f90-based CFD code using mpich) that is having trouble installing my code on their system. They are using mpich and the Intel version 7.1 ifc compiler. The problem occurs at the link step.
They are getting undefined references to what appear to be system calls to pthread related functions such as pthread_self, pthread_equal, pthread_mutex_lock. Does any one else encountered and know how to fix this problem? Thanks, Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Mon Oct 6 03:23:32 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Mon, 6 Oct 2003 08:23:32 +0100 Subject: Help: About Intel Fortran Compiler calling mpich In-Reply-To: References: Message-ID: <200310060823.32239.daniel.kidger@quadrics.com> Tom, this is the standard old chestnut about Fortran and trailing underscores on function names. if you do say ' nm -a /opt/mpich-1.2.5/lib/libmpi.a |grep -i mpi_comm_rank' I expect you will see 2 trailing underscores. Different Fortran vendors add a different number of underscores - some add 2 by default (eg g77), some one (eg ifc), and some none. Sometimes there is a a compiler option to change this. There are three solutions to this issue: 1/ (Lazy option) recompile mpich several times; once with each Fortran compiler you have. 2/ Compile your application with the option that matches your prebuilt mpich (presumably 2 underscores - but note that ifc doesn't have an option for this) 3/ rebuild mpich with '-fno-second-underscore' (using say g77) . This is the common ground. You can link code to this with all current Fortran compilers. You may also meet the 'mpi_getarg, x_argc' issue - this too is easy to fix. -- Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- On Saturday 04 October 2003 5:00 pm, Ao Jiang wrote: > Hi, All: > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. 
> I run the command: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > [...] > : undefined reference to `mpi_init_' > [...] > > But because I have to work on that OSCAR cluster with Intel compiler, > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? > 3. How I can solve it. > > Thanks. > > Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Mon Oct 6 10:54:43 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Mon, 6 Oct 2003 14:54:43 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: Jose, Pardon me for some advertising here, but our OptimaNumerics Linear Algebra Library can very significantly outperform Intel MKL. Depending on the particular routine and platform, we have seen performance advantage of almost 32x (yes, that's 32 times!) using OptimaNumerics Linear Algebra Library!
I can send you one of our white papers which shows performance benchmark details off-line. If anyone else is interested, please do send me an e-mail also. Best wishes, Kenneth Tan ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- On Mon, 6 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Date: Mon, 06 Oct 2003 13:21:02 +0200 > From: "[iso-8859-1] Jos? M. P?rez S?nchez" > To: beowulf at beowulf.org > Subject: Intel compilers and libraries > > Hello: > > We are thinking about purchasing the Intel C++ compiler for linux, > mainly for getting the most of our harware (Xeon 2.4Gz processors), we > are also interested in the Intel MKL (Math Kernel Library), I would like > to know if the performance gain using Intel compiler+libraries, which exploit > SSE2 and make other optimizations for P4/Xeon, are as good as Intel > claims, anyone in the list using those products? > > On the other hand, isn't MKL just as good as any other good math library compiled > with Xeon/P4 optimization and extensions (using Intel C++ compiler for > example). > > Another question, the only difference I can see reading Intel docs between > P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), > does it really makes a big difference taking into account the much more > expensive Xeons are. Any one having experience with both platforms. > > Greetings: > > Jose M. P?rez. > Madrid. Spain. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sun Oct 5 21:59:34 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Mon, 06 Oct 2003 09:59:34 +0800 Subject: Cluster2003: Call for Participation (Preliminary) Message-ID: <3F80CC86.FAFBFAD2@csis.hku.hk> ---------------------------------------------------------------------- CALL FOR PARTICIPATION 2003 IEEE International Conference on Cluster Computing 1 - 4 December 2003 Sheraton Hong Kong Hotel & Towers, Tsim Sha Tsui, Kowloon, Hong Kong URL: http://www.csis.hku.hk/cluster2003/ Cosponsored by IEEE Computer Society IEEE Computer Society Task Force on Cluster Computing IEEE Hong Kong Section Computer Chapter The University of Hong Kong Industrial Sponsors : Hewlett-Packard, Microsoft, IBM, Extreme Networks, Sun Microsystems, Intel, Dawning, and Dell. 
----------------------------------------------------------------------- Dear Friends, You are cordially invited to participate the annual international cluster computing conference to be held on Dec. 1-4, 2003 in Hong Kong, the most dynamic city in the Orient. The Cluster series of conferences is one of the flagship events sponsored by the IEEE Task Force on Cluster Computing (TFCC) since its inception in 1999. The competition among refereed papers was particularly strong this year, with 48 papers being selected as full papers from the 164 papers that were submitted, for a 29% acceptance rate. An additional 19 papers were selected for poster presentation. Besides the technical paper presentation, there will be three keynotes, four tutorials, one panel, a Grid live demo session, and a number of invited talks and exhibits to be arranged during the conference period. A preliminary program schedule is attached below. Please share this Call for Participation information with your colleagues working in the area of cluster computing. For registration, please visit our registration web page at: http://www.csis.hku.hk/cluster2003/registration.htm (The deadline for advance registration is October 22, 2003.) TCPP Awards will be granted to students members, and will partially cover the registration and travel cost to attend the conference. See : http://www.caip.rutgers.edu/~parashar/TCPP/TCPP-Awards.htm We look forward meeting you in Hong Kong! Cho-Li Wang and Daniel Katz Cluster2003, Program Co-chairs ------------------------------------------------------------------ ***************************************** Cluster 2003 Preliminary Program Schedule ***************************************** Monday, December 1 ------------------ 8:00-5:00 - Conference/Tutorial Registration 8:30-12:00: Morning Tutorials Designing Next Generation Clusters with Infiniband: Opportunities and Challenges D. Panda (Ohio State University) Using MPI-2: Advanced Features of the Message Passing Interface W. Gropp, E. Lusk, R. Ross, R. Thakur (Argonne National Lab.) 12:00-1:30 - Lunch 1:30-5:00 : Afternoon Tutorials The Gridbus Toolkit for Grid and Utility Computing R. Buyya (University of Melbourne) Building and Managing Clusters with NPACI Rocks G. Bruno, M. Katz, P. Papadopoulos, F. Sacerdoti, NPACI Rocks group at San Diego Supercomputer Center), L. Liew, N. Ninaba (Singapore Computing Systems) ************************ Tuesday, December 2 ************************ 7:00-5:00 Conference Registration 9:00-9:15 Welcome and Opening Remarks 9:15-10:15 Keynote 1 (TBA) 10:45-12:15 : Session 1A, 1B, 1C Session 1A (Room A) : Scheduling I Dynamic Scheduling of Parallel Real-time Jobs by Modelling Spare Capabilities in Heterogeneous Clusters Ligang He, Stephen A. Jarvis, Graham R. Nudd, Daniel P. Spooner (University of Warwick, UK) Parallel Job Scheduling on Multi-Cluster Computing Systems Jemal Abawajy and S. P. Dandamudi (Carleton University, Canada) Interstitial Computing: Utilizing Spare Cycles on Supercomputers Stephen Kleban and Scott Clearwater (Sandia National Laboratories, USA) Session 1B (Room B) : Applications A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, Guang R. 
Gao (University of Delaware, USA) Computing Large-scale Alignments on a Multi-cluster Chunxi Chen and Bertil Schmidt (Nanyang Technological University, Singapore) Auto-CFD: Efficiently Parallelizing CFD Applications on Clusters Li Xiao (Michigan State University, USA), Xiaodong Zhang (College of WIlliam and Mary, NSF, USA), Zhengqian Kuang, Baiming Feng, Jichang Kang (Northwestern Polytechnic University, China) Session 1C (Room C) : Performance Analysis Performance Analysis of a Large-Scale Cosmology Application on Three Cluster Systems Zhiling Lan and Prathibha Deshikachar (Illinois Institute of Technology, USA) A Performance Monitor based on Virtual Global Time for Clusters of PCs Michela Taufer (UC San Diego, USA), Thomas M. Stricker (ETH Zurich, Switzerland) A Distributed Performance Analysis Architecture for Clusters Holger Brunst, Wolfgang E. Nagel (Dresden University of Technology, Germany), Allen D. Malony (University of Oregon, USA) 12:15-2:00 Lunch 2:00-3:30 : Session 2A, 2B, 2C Session 2A (Room A) : Scheduling II Coordinated Co-scheduling in time-sharing Clusters through a Generic Framework Saurabh Agarwal (IBM India Research Labs, India), Gyu Sang Choi, Chita R. Das (Pennsylvania State University, USA), Andy B. Yoo (Lawrence Livermore National Laboratory, USA), Shailabh Nagar (IBM T.J. Watson Research Center, USA) A Robust Scheduling Strategy for Moldable Jobs Sudha Srinivasan, Savitha Krishnamoorthy, P. Sadayappan (Ohio State University, USA) Towards Load Balancing Support for I/O-Intensive Parallel Jobs in a Cluster of Workstations Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson (University of Nebraska-Lincoln, USA) Session 2B (Room B) : Java JavaSplit: A Runtime for Execution of Monolithic Java Programs on Heterogeneous Collections of Commodity Workstations Michael Factor (IBM Research Lab in Haifa, Israel), Assaf Schuster, Konstantin Shagin (Israel Institute of Technology, Israel) Performance Analysis of Java Message-Passing Libraries on Fast Ethernet, Myrinet and SCI Clusters Guillermo L. Taboada, Juan Touri?o, Ramon Doallo (University of A Coruna, Spain) Compiler Optimized Remote Method Invocation Ronald Veldema and Michael Philippsen (University of Erlangen-Nuremberg, Germany) Session 2C (Room C) : Communication I Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters Jarek Nieplocha , V. Tipparaju, M. Krishnan (Pacific Northwest National Laboratory, USA), G. Santhanaraman, D.K. 
Panda (Ohio State University, USA) Impact of Computational Resource Reservation to the Communication Performance in the Hypercluster Environment Kai Wing Tse and P.K Lun (The Hong Kong Polytechnic University, Hong Kong) Kernel Implementations of Locality-Aware Dispatching Techniques for Web Server Clusters Michele Di Santo, Nadia Ranaldo, Eugenio Zimeo (University of Sannio, Italy) 3:30-4:00 Coffee Break 4:00-4:30 Invited Talk 1 (Room C) : TBA 4:30-5:00 Invited Talk 2 (Room C) : TBA 5:30-7:30 Poster Session (Details Attached at the End) 6:00-7:30 Reception ****************************** Wednesday, December 3 ******************************* 8:30-5:00 Conference Registration 9:00-10:00 Keynote 2 (Room C) TBA 10:00-10:30 Coffee Break 10:30-12:00 Session 3A, 3B, 3C Session 3A (Room A): Middleware OptimalGrid: Middleware for Automatic Deployment of Distributed FEM Problems on an Internet-Based Computing Grid Tobin Lehman and James Kaufman (IBM Almaden Research Center, USA) Adaptive Grid Resource Brokering Abdulla Othman, Peter Dew, Karim Djemame, Iain Gourlay (University of Leeds, UK) HPCM: A Pre-compiler Aided Middleware for the Mobility of Legacy Code Cong Du, Xian-He Sun, Kasidit Chanchio (Illinois Institute of Technology, USA) Session 3B (Room B) : Cluster/Job Management I The Process Management Component of a Scalable Systems Software Environment Ralph Butler (Middle Tennessee State University, USA), Narayan Desai, Andrew Lusk, Ewing Lusk (Argonne National Laboratory,USA) Load Distribution for Heterogeneous and Non-Dedicated Clusters Based on Dynamic Monitoring and Differentiated Services Liria Sato (University of Sao Paulo, Brazil), Hermes Senger(Catholic University of Santos, Brazil) GridRM: An Extensible Resource Monitoring System Mark Baker and Garry Smith (University of Portsmouth, UK) Session 3C (Room C) : I/O I A High Performance Redundancy Scheme for Cluster File Systems Manoj Pillai and Mario Lauria (Ohio State University, USA) VegaFS: A Prototype for File-sharing Crossing Multiple Administrative Domains Wei Li, Jianmin Liang, Zhiwei Xu (Chinese Academy of Sciences, China) Design and Performance of the Dawning Cluster File System Jin Xiong, Sining Wu, Dan Men, Ninghui Sun, Guojie Li (Chinese Academy of Sciences, China) 12:00-1:30 Lunch 1:30-3:00 Session 4A, 4B, Vender Talk 1 Session 4A (Room A) Novel Systems Coordinated Checkpoint versus Message Log for Fault Tolerant MPI Aur?lien Bouteiller, Lemarinier, Krawezik, Cappello (Universit? de Paris Sud, France) A Performance Comparison of Linux and a Lightweight Kernel Ron Brightwell, Rolf Riesen, Keith Underwood (Sandia National Laboratories, USA), Trammell B. Hudson (Operating Systems Research, Inc.), Patrick Bridges, Arthur B. Maccabe (University of New Mexico, USA) Implications of a PIM Architectural Model for MPI Arun Rodrigues, Richard Murphy, Peter Kogge, Jay Brockman (University of Notre Dame, USA), Ron Brightwell, Keith Underwood (Sandia National Laboratories, USA) Session 4B (Room B) Cluster/Job Management II Reusable Mobile Agents for Cluster Computing Ichiro Satoh (National Institute of Informatics, Japan) High Service Reliability For Cluster Server Systems M. Mat Deris, M.Rabiei, A. Noraziah, H.M. Suzuri (University College of Science and Technology, Malaysia) Wide Area Cluster Monitoring with Ganglia Federico D. Sacerdoti, Mason J. Katz (San Diego Supercomputing Center, USA), Matthew L. Massie, David E. 
Culler (UC Berkeley, USA) Vender Talk 1 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 Panel Discussion 6:30-8:30 Banquet Dinner (Ballroom, Conference Hotel) **************************** Thursday, December 4 **************************** 8:30-5:00 Conference Registration Special Technical Session : Dec. 4 (9am - 4:30pm) Grid Demo - Life Demonstrations of Grid Technologies and Applications Session Chairs: Peter Kacsuk (MTA SZTAKI Research Institute, Hungary), Rajkumar Buyya (University of Melbourne, Australia) 9:00-10:00 Keynote 3 (Room C) 10:00-10:30 Coffee Break 10:30-12:00 Vender Talk 2, 5B, 5C Vender Talk 2 (Room A) Session 5B (Room B) : Novel Software Efficient Parallel Out-of-core Matrix Transposition Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P Sadayappan (Ohio State University, USA) A Case Study of Parallel I/O for Biological Sequence Search on Linux Clusters Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson (University of Nebraska-Lincoln, USA) CTFS: A New Light-weight, Cooperative Temporary File System for Cluster-based Web Server Jun Wang (University of Nebraska-Lincoln, USA) Session 5C (Room C) I/O II Efficient Structured Data Access in Parallel File Systems Avery Ching, Alok Choudhary, Wei-keng Liao (Northwestern University, USA), Robert Ross, William Gropp (Argonne National Laboratory, USA) View I/O: Improving the Performance of Non-contiguous I/O Florin Isaila and Walter F. Tichy (University of Karlsruhe, Germany) Supporting Efficient Noncontiguous Access in PVFS over InfiniBand Jiesheng Wu (Ohio State University), Pete Wyckoff (Ohio Supercomputer Center, USA), D.K. Panda (Ohio State University, USA) 12:00-2:00 Lunch 2:00-2:30 Invited Talk 3 (Room C) 2:30-3:00 Invited Talk 4 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 : Session 6A, 6B, 6C Session 6A (Room A) : Scheduling III A General Self-adaptive Task Scheduling System for Non-dedicated Heterogeneous Computing Ming Wu and Xian-He Sun (Illinois Institute of Technology, USA) Adding Memory Resource Consideration into Workload Distribution for Software DSM Systems Yen-Tso Liu, Ce-Kuen Shieh (National Chung Kung University, Taiwan), Tyng-Yeu Liang (National Kaohsiung University of Applied Sciences, Taiwan) An Energy-Based Implicit Co-scheduling Model for Beowulf Cluster Somsak Sriprayoonsakul and Putchong Uthayopas (Kasetsart University, Thailand) Session 6B (Room B) : High Availability Availability Prediction and Modeling of High Availability OSCAR Cluster Lixin Shen, Chokchai Leangsuksun, Tong Liu, Hertong Song (Louisiana Tech University, USA), Stephen L. Scott (Oak Ridge National Laboratory, USA) A System Recovery Benchmark for Clusters Ira Pramanick, James Mauro, Ji Zhu (Sun Microsystems, Inc., USA) Performance Evaluation of Routing Algorithms in RHiNET-2 Cluster Michihiro Koibuchi, Konosuke Watanabe, Kenichi Kono, Akiya Jouraku, Hideharu Amano (Keio University, Japan) Session 6C (Room C) : Communications II Application-Bypass Reduction for Large-Scale Clusters Adam Wagner, Darius Buntias, D.K. Panda (Ohio State University, USA), Ron Brightwell (Sandia National Laboratories, USA) Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost Surendra Byna (Illinois Institute of Technology, USA), William Gropp (Argonne National Laboratory, USA), Xian-He Sun (Illinois Institute of Technology, USA), Rajeev Thakur (Argonne National Laboratory, USA) Shared Memory Mirroring for Reducing Communication Overhead on Commodity Networks Jarek Nieplocha, B. Palmer, E. 
Apra (Pacific Northwest National Laboratory, USA) ************************************* 5:00 : End of the Conference ************************************* ------------------------------------------------------------------- Poster Session/Short Papers "Plug-and-Play" Cluster Computing using Mac OS X Dean Dauger (Dauger Research, Inc.) and Viktor K. Decyk (UC Los Angeles, USA) Improving Performance of a Dynamic Load Balancing System by Using Number of Effective Tasks Min Choi, Jung-Lok Yu, Seung-Ryoul Maeng (Korea Advanced Institute of Science and Technology, Korea) Dynamic Self-Adaptive Replica Location Method in Data Grids Dongsheng Li, Nong Xiao, Xicheng Lu, Kai Lu, Yijie Wang (National University of Defense Technology, China) Efficient I/O Caching in Data Grid and Cluster Management Song Jiang (College of William and Mary, USA), Xiaodong Zhang (National Science Foundation, USA) Optimized Implementation of Extendible Hashing to Support Large File System Directory Rongfeng Tang, Dan Mend, Sining Wu (Chinese Academy of Sciences, China) Parallel Design Pattern for Computational Biology and Scientific Computing Applications Weiguo Liu and Bertil Schmidt (Nanyang Technological University, Singapore) FJM: A High Performance Java Message Library Tsun-Yu Hsiao, Ming-Chun Cheng, Hsin-Ta Chiao, Shyan-Ming Yuan (National Chiao Tung University, Taiwan) Cluster Architecture with Lightweighted Redundant TCP Stacks Hai Jin and Zhiyuan Shao (Huazhong University of Science and Technology, China) >From Clusters to the Fabric: The Job Management Perspective Thomas R?oblitz, Florian Schintke, Alexander Reinefeld (Zuse Institute Berlin, Germany) Towards an Efficient Cluster-based E-Commerce Server Victoria Ungureanu, Benjamin Melamed, Michael Katehakis (Rutgers University, USA) A Kernel Running in a DSM - Design Aspects of a Distributed Operating System Ralph Goeckelmann, Michael Schoettner, Stefan Frenz, Peter Schulthess (University of Ulm, Germany) Distributed Recursive Sets: Programmability and Effectiveness for Data Intensive Applications Roxana Diaconescu (UC Irvine, USA) Run-Time Prediction of Parallel Applications on Shared Environment Byoung-Dai Lee (University of Minnesota, USA), Jennifer M. Schopf (Argonne National Laboratory, USA) An Instance-Oriented Security Mechanism in Grid-based Mobile Agent System Tianchi Ma and Shanping Li (Zhejiang University, China) A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms Soumya Sanyal, Amit Jain, Sajal Das (University of Texas at Arlington, USA), Rupak Biswas (NASA Ames Research Center, USA) BCFG: A Configuration Management Tool for Heterogeneous Clusters Narayan Desai, Andrew Lusk, Rick Bradshaw, Remy Evard (Argonne National Laboratory, USA) Communication Middleware Systems for Heterogenous Clusters: A Comparative Study Daniel Balkanski, Mario Trams, Wolfgang Rehm (Technische Universita Chemnitz, Germany) QoS-Aware Adaptive Resource Management in Distributed Multimedia System Using Server Clusters Mohammad Riaz Moghal, Mohammad Saleem Mian (University of Engineering and Technology, Pakistan) On the InfiniBand Subnet Discovery Process Aurelio Berm?dez, Rafael Casado, Francisco J. Quiles (Universidad de Castilla-La Mancha, Spain), Timothy M. Pinkston (University of Southern California, USA), Jos? 
Duato (Universidad Polit?cnica de Valencia, Spain) -------------------------------------------------------------- Chairs/Committees General Co-Chairs Jack Dongarra (University of Tennessee) Lionel Ni (Hong Kong University of Science and Technology) General Vice Chair Francis C.M. Lau (The University of Hong Kong) Program Co-Chairs Daniel S. Katz (Jet Propulsion Laboratory) Cho-Li Wang (The University of Hong Kong) Program Vice Chairs Bill Gropp (Argonne National Laboratory) -- Middleware Wolfgang Rehm (Technische Universit?t Chemnitz) -- Hardware Zhiwei Xu (Chinese Academy of Sciences, China) -- Applications Tutorials Chair Ira Pramanick (Sun Microsystems) Workshops Chair Jiannong Cao (Hong Kong Polytechnic University) Exhibits/Sponsors Chairs Jim Ang (Sandia National Lab) Nam Ng (The University of Hong Kong) Publications Chair Rajkumar Buyya (The University of Melbourne) Publicity Chair Arthur B. Maccabe (The University of New Mexico) Poster Chair Putchong Uthayopas (Kasetsart University) Finance/Registration Chair Alvin Chan (Hong Kong Polytechnic University) Local Arrangements Chair Anthony T.C. Tam (The University of Hong Kong) Programme Committee David Abramson (Monash U., Australia) Gabrielle Allen (Albert Einstein Institute, Germany) David A. Bader (U. of New Mexico, USA) Mark Baker (U. of Portsmouth, UK) Ron Brightwell (Sandia National Laboratory USA) Rajkumar Buyya (U. of Melbourne, Australia) Giovanni Chiola (Universita' di Genova Genova, Italy) Sang-Hwa Chung (Pusan National U., Korea) Toni Cortes (Universitat Politecnica de Catalunya, Spain) Al Geist (Oak Ridge National Laboratory, USA) Patrick Geoffray (Myricom Inc., USA) Yutaka Ishikawa (U. of Tokyo, Japan) Chung-Ta King (National Tsing Hua U., Taiwan) Tomohiro Kudoh (AIST, Japan) Ewing Lusk (Argonne National Laboratory, USA) Jens Mache (Lewis and Clark College, USA) Phillip Merkey (Michigan Tech U., USA) Matt Mutka (Michigan State U., USA) Charles D. Norton (JPL, California Institute of Technology, USA) D.K. Panda (Ohio State U., USA) Philip Papadopoulos (UC San Diego, USA) Myong-Soon Park (Korea U., Korea) Neil Pundit (Sandia National Laboratory, USA) Thomas Rauber (U. Bayreuth, Germany) Alexander Reinefeld (ZIB, Germany) Rob Ross (Argonne National Laboratory, USA) Gudula Ruenger (Chemnitz U. of Technology, Germany) Jennifer Schopf (Argonne National Laboratory, USA) Peter Sloot (U. of Amsterdam, Netherlands) Thomas Stricker (Institut fur Computersysteme, Switzerland) Ninghui Sun (Chinese Academy of Sciences, China) Xian-He Sun (Illinois Institute of Technology, USA) Rajeev Thakur (Argonne National Laboratory, USA) Putchong Uthayopas (Kasetsart U., Thailand) David Walker (U. of Wales Cardiff, UK) Xiaodong Zhang (NSF, USA) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Florent.Calvayrac at univ-lemans.fr Mon Oct 6 11:54:09 2003 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Mon, 06 Oct 2003 17:54:09 +0200 Subject: undefined references to pthread related calls In-Reply-To: <3F817693.9040001@larc.nasa.gov> References: <3F817693.9040001@larc.nasa.gov> Message-ID: <3F819021.2050605@univ-lemans.fr> Jeffery A. White wrote: > Hi group, > > I have a user of my software (a f90 based CFD code using mpich) that is > haveing trouble installing my code on > their system. They are using mpich and the Intel version 7.1 ifc > compiler. 
The problem occurs at the link step. > They are getting undefined references to what appear to be system calls > to pthread related functions such as > pthread_self, pthread_equal, pthread_mutex_lock. Does any one else > encountered and know how to fix this problem? > > Thanks, > > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > is the compiler installed on a Redhat 8.0 ? Besides, maybe they use OpenMP/HPF directives and options which can mess up things and are usually useless on a cluster with one CPU per node. -- Florent Calvayrac | Tel : 02 43 83 26 26 Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18 UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 12:56:37 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 12:56:37 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: In-Reply-To: Message-ID: On Mon, 6 Oct 2003, Franz Marini wrote: > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > Try with : > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > (mpif77 for fortran77) command (which is a wrapper for the real > compiler). Acckkk!! This is one of the horribly broken things about most MPI implementations. It's not reasonable to say "to use this library you must use our magic compile script" A MPI library should be just that -- a library conforming to system standards. You should be able to link it with just "-lmpi". Most of the Fortran underscore issues may be hidden from the user with weak linker aliases. Similarly, it's not reasonable to say "to run this program, you must use our magic script" You should be able to just run the program, by name, in the usual way. Our BeoMPI implementation demonstrated how to do it right many years ago, and we provided the code back to the community. Many people on this list seem to take the attitude "I've already learned the crufty way, therefore the improvements don't matter." One element of a high-quality library is ease of use, and in the long run that matters more than a few percent faster for a specific function call. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Mon Oct 6 13:24:57 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Mon, 06 Oct 2003 10:24:57 -0700 Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <3F81A569.80805@cert.ucr.edu> Jos? M. 
P?rez S?nchez wrote: >Hello: > >We are thinking about purchasing the Intel C++ compiler for linux, >mainly for getting the most of our harware (Xeon 2.4Gz processors), we >are also interested in the Intel MKL (Math Kernel Library), I would like >to know if the performance gain using Intel compiler+libraries, which exploit >SSE2 and make other optimizations for P4/Xeon, are as good as Intel >claims, anyone in the list using those products? > You realize that there's a free version of the Intel compiler for Linux right? Anyway, our experience with their Fortran compiler has been that it's roughly on par with the Portland Group's compiler. However, if Pentium 4 optimizations are turned on, the code produced by the Intel compiler runs just a little bit faster. Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Wolfgang.Dobler at kis.uni-freiburg.de Mon Oct 6 12:50:06 2003 From: Wolfgang.Dobler at kis.uni-freiburg.de (Wolfgang Dobler) Date: Mon, 6 Oct 2003 18:50:06 +0200 Subject: Beowulf digest, Vol 1 #1482 - 2 msgs In-Reply-To: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> Message-ID: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Hi Ao, > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. > I run the command: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > The system failed to compile it and gave me the following information: > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' [...] > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? Looks like the infamous underscore problem. You have a library (libmpi.so or libmpi.a) that has been built using the GNU F77 compiler without the option `-fno-second-underscore' and accordingly the MPI symbols are called `mpi_init__', not `mpi_init_', etc. But the Intel compiler (and all other non-G77 compilers) expects a symbol with only one underscore appended ( `mpi_init_'), but that one is not in the library. > 3. How I can solve it. The way out is to either rebuild the library, compiling with `g77 -fno-second-underscore' or with the Intel compiler, or (the less elegant choice) to refer to the MPI functions with one underscore in you F90 code: call MPI_INIT_(ierr) There is one related question I want to ask the ld-specialists on the list: On some machines libraries like MPICH contain all symbol names with both underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same time. Does anybody know whether there are easy ways of building such a library? Is there something like `symbol aliases' and how would one create these when generating the library? W o l f g a n g _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 12:56:56 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 12:56:56 -0400 Subject: undefined references to pthread related calls Message-ID: <3F819ED8.9020002@larc.nasa.gov> Group, Thanks for your responses. 
Turns out that the problem appears to be an incompatibility between ifc 7.1 and the glibc version in the version of RH 8.0 being used. The RH 8.0 being used had some patches that updated glibc. I was able to fix it by removing the -static option when compiling with ifc. I have tested this with a patch-free version of 8.0 and I don't see the problem with or without the -static option specified. At runtime my code does not use any calls that seem to access pthread-related system routines. I am guessing that by deferring resolution of the link until runtime I have bypassed the problem. Obviously if I did use routines that needed pthread-related code I would still have a problem, so this isn't a general fix. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 13:29:09 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 13:29:09 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the For the Scyld Beowulf system we developed a more sophisticated "diskless administrative" approach that has better scaling and more predictable performance. We cache executable objects (libraries and executables), using a method that works unchanged with either Ramdisk (==tmpfs) or local disk cache. Keep in mind that this is just one element of making a cluster system scalable and easy to manage. Using a workstation-oriented distribution as the software base for compute nodes means generating many different kinds of configuration files, and dealing with the scheduling impact of the various daemons. > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. The obvious problems are configuration, scaling and update consistency issues. > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. They are trivial to get rid of, either by explicitly erasing them or by switching to a new ramdisk (e.g. our old stage 3) when initialization completes. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... Multicast bulk data transfer was a good idea back when we had Ethernet repeaters. Today it should only be used for service discovery and low-rate status updates.
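To make the "low-rate status updates" case concrete, here is a minimal sketch of the kind of multicast status beacon that stays cheap at scale; the group address 239.0.0.1, the port 5007, and the payload format are arbitrary choices for illustration and are not taken from Scyld or any other package discussed here:

/* status_beacon.c - multicast one small load-average report per node
 * every 30 seconds.  Group, port, and message format are made up for
 * this example. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char ttl = 1;                           /* stay on the local segment */
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    struct sockaddr_in group;
    memset(&group, 0, sizeof(group));
    group.sin_family = AF_INET;
    group.sin_addr.s_addr = inet_addr("239.0.0.1");  /* assumed group */
    group.sin_port = htons(5007);                    /* assumed port  */

    char host[64], msg[128];
    gethostname(host, sizeof(host));

    for (;;) {
        double load1 = 0.0;
        FILE *p = fopen("/proc/loadavg", "r");       /* 1-minute load average */
        if (p) { if (fscanf(p, "%lf", &load1) != 1) load1 = 0.0; fclose(p); }
        int len = snprintf(msg, sizeof(msg), "%s load1=%.2f", host, load1);
        sendto(fd, msg, len, 0, (struct sockaddr *)&group, sizeof(group));
        sleep(30);
    }
}

A listener on the head node just joins the group with IP_ADD_MEMBERSHIP and aggregates whatever arrives; nothing here needs reliable delivery, which is why multicast is a reasonable fit for this job and a poor one for shipping whole ramdisk images.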
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Oct 6 14:02:39 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 06 Oct 2003 14:02:39 -0400 Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: <3F81AE3F.9050202@lmco.com> Donald Becker wrote: > On Mon, 6 Oct 2003, Franz Marini wrote: > > > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 > p_fdtd3dwg3_pml.f90 > > > > Try with : > > > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > > (mpif77 for fortran77) command (which is a wrapper for the real > > compiler). > > Acckkk!! > > This is one of the horribly broken things about most MPI > implementations. It's not reasonable to say > "to use this library you must use our magic compile script" > A MPI library should be just that -- a library conforming to system > standards. You should be able to link it with just "-lmpi". > I don't like the mpi compiler helper scripts much either. I just want a simple makefile or a list of the libraries to link in in the correct order. I usually end up reading the helper scripts and pulling out the library order and putting it in my makefiles anyway (no offense to anyone). However, in defense of the different MPI implementations, they have somewhat different philosophies on how to get the best performance and ease of use. Sometimes this involves other libraries. Just telling the user to add '-lmpi' to the end of their link command may not tell them everything (e.g. they may need to add the pthreads library, or libdl or whatever). > One element of a high-quality library is ease of use, and in the long > run that matters more than a few percent faster for a specific function > call. > One piece of data. While we haven't looked at specific MPI calls, we have noticed up to about a 30% difference in wall clock time with our codes between the various MPI implementations using the same system (same nodes, same code, same input, same network, same nodes, etc.). I'm all for that kind of performance boost even if it's a little more cumbersome to compile/link/run (although one's mileage may vary depending upon the code) Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 14:55:57 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 20:55:57 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Mon, 6 Oct 2003, Donald Becker wrote: > Acckkk!! Ok, I shouldn't have said "a cleaner way". I don't usually use mpif77 (or f90) to compile programs requiring mpi libs, in fact. I prefer to explicitly tell the compiler which library I want, and where to find them. Btw, this is much simpler and faster if you have multiple versions/releases of the same library. 
Anyway, clean or not, elegant or not, mpif77 should (and I say should) work. Btw, I still can't understand why the hell each fortran compiler uses a different way to treat underscores. This, and another thousands of reasons make me hate fortran. (erm, please, this is a *personal* pov, let's not start another flame/discussion on the fortran vs issue ;)). Have a nice day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +39 02 50317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 6 17:18:07 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 6 Oct 2003 14:18:07 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> On Mon, Oct 06, 2003 at 02:54:43PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Pardon me for some advertising here, but our OptimaNumerics Linear > Algebra Library can very significantly outperform Intel MKL. Kenneth, Welcome to the beowulf mailing list. Here are some helpful suggestions: 1) Don't top post. Answer postings like I do here, by quoting the relevant part of the posting you're replying to. 2) Don't include an 8-line confidentiality notice in a posting to a public, archived mailing list, distributed all over the world. 3) Marketing slogans and paragraphs with several !s don't work so well here. More sophisticated customers aren't drawn by a claim of a 32x performance advantage without knowing what is being measured. Is it a 100x100 matrix LU decomposition? Well, no, because Intel's MKL and the free ATLAS library run at a respectable % of peak. Is it on a 1000 point FFT? Well, no, because the free FFTW library runs at a respectable % of peak on that. 4) Put your performance whitepapers on your website, or it looks fishy. I looked and didn't see a single performance claim there. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 18:41:34 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 06 Oct 2003 23:41:34 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening I'm getting error 13 when my diskless client try to mount file system. Hoe best to resolved this error 13? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 6 23:55:13 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 6 Oct 2003 23:55:13 -0400 (EDT) Subject: Root-nfs error 13 while mounting In-Reply-To: Message-ID: > I'm getting error 13 when my diskless client try to mount file system. Hoe > best to resolved this error 13? it's best resolved by translating it to text: EACCESS or "permission denied". I'm guessing you should look at the logs on your fileserver, since it seems to be rejecting your clients. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Oct 7 05:17:36 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 7 Oct 2003 11:17:36 +0200 Subject: weak symbols [Re: Beowulf digest, Vol 1 #1482 - 2 msgs] In-Reply-To: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Message-ID: <200310071117.36507.joachim@ccrl-nece.de> Wolfgang Dobler: > On some machines libraries like MPICH contain all symbol names with both > underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same > time. Does anybody know whether there are easy ways of building such a > library? Is there something like `symbol aliases' and how would one create > these when generating the library? Yes, most linkers support "weak symbols" in one way or another (there is no common way, usually a pragma or "function attributes" (gcc) are used) which supply all required API symbols for the one real implemented function. Just take a look at a source file like mpich/src/pt2pt/send.c to see how this can be done (some preprocessing "magic"). It can also be done w/o weak symbols at the cost of a slightly bigger library. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Oct 7 07:07:19 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 07 Oct 2003 12:07:19 +0100 Subject: more on structural models for clusters Message-ID: <1A6pgV-16F-00@etnus.com> Jim, > Your average Dell isn't suited to inclusion as a MCU core in an ASIC > at each node and would cost more than $10/node... I'm looking at > Z80/6502/low end DSP kinds of computational capability in a mesh > containing, say, 100,000 nodes. Have you seen this gizmo ? (It's just so cute I had to pass it on :-) http://www.lantronix.com/products/eds/xport/ It's a 48MHz x86 with 256KB of SRAM and 512KB of flash, a 10/100Mb ethernet interface an RS232 and three bits of digital I/O and it all fits _inside_ an RJ45 socket. It comes loaded up with a web server and so on. It's on sale here in the UK for GBP 39 + VAT one off, so should come down somewhere near the price you mention above for your 100,000 off in the US. (It might also be useful to the folks who want to build their own environmental monitoring. Couple one of these up to the serial interconnect on a temperature monitoring button and you'd immediately be able to access it from the net). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Tue Oct 7 08:18:16 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Tue, 7 Oct 2003 09:18:16 -0300 (ART) Subject: Tools for debuging Message-ID: <20031007121816.67790.qmail@web12208.mail.yahoo.com> I'm having problems with a prograa, and i really need a tool for debug it. 
There's specific debugers for mpi programas, if have more than one, what is the best choice? Thanks ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ci?ncias Exatas e Tecnol?gicas Estudante do Curso de Ci?ncia da Computa??o Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nican at nsc.liu.se Tue Oct 7 07:50:26 2003 From: nican at nsc.liu.se (Niclas Andersson) Date: Tue, 7 Oct 2003 13:50:26 +0200 (CEST) Subject: CALL FOR PARTICIPATION: Workshop on Linux Clusters for Super Computing Message-ID: CALL FOR PARTICIPATION ================================================================ 4th Annual Workshop on Linux Clusters For Super Computing (LCSC) Clusters for High Performance Computing and Grid Solutions 22-24 October, 2003 Hosted by National Supercomputer Centre (NSC) Link?ping University, SWEDEN ================================================================ The programme is in its final state. The workshop is brimful of knowledgeable speakers giving exciting talks about Linux clusters, grids and distributed applications requiring vast computational resources. Just a few samples: - Keynote: Andrew Grimshaw, University of Virginia and CTO of Avaki Inc. - Comparisons of Linux clusters with the Red Storm MPP William J. Camp, Project Leader of Red Storm, Sandia National Laboratories - The EGEE project: building a grid infrastructure for Europe Bob Jones, EGEE Technical Director, CERN - Linux on modern NUMA architectures Jes Sorensen, Wild Open Source Inc. - The AMANDA Neutrino Telescope Stephan Hundertmark, Stockholm University and many more. In addition to invited speakers there will be vendor presentations, exhibitions and tutorials. Last date for registration: October 10. For more information and registration: http://www.nsc.liu.se/lcsc _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From keith.murphy at attglobal.net Tue Oct 7 11:28:50 2003 From: keith.murphy at attglobal.net (Keith Murphy) Date: Tue, 7 Oct 2003 08:28:50 -0700 Subject: Tools for debuging References: <20031007121816.67790.qmail@web12208.mail.yahoo.com> Message-ID: <025701c38ce7$b5b64060$02fea8c0@oemcomputer> Check out Etnus's Totalview parallel debugger www.etnus.com Keith Murphy Dolphin Interconnect C: 818-292-5100 T: 818-597-2114 F: 818-597-2119 www.dolphinics.com ----- Original Message ----- From: "Mathias Brito" To: Sent: Tuesday, October 07, 2003 5:18 AM Subject: Tools for debuging > I'm having problems with a prograa, and i really need > a tool for debug it. There's specific debugers for mpi > programas, if have more than one, what is the best > choice? > > Thanks > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci?ncias Exatas e Tecnol?gicas > Estudante do Curso de Ci?ncia da Computa??o > > Yahoo! 
Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Tue Oct 7 13:30:19 2003 From: rokrau at yahoo.com (Roland Krause) Date: Tue, 7 Oct 2003 10:30:19 -0700 (PDT) Subject: undefined references to pthread related calls In-Reply-To: <200310071605.h97G5FV13188@NewBlue.Scyld.com.scyld.com> Message-ID: <20031007173019.8519.qmail@web40010.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 2. undefined references to pthread related calls (Jeffery A. > White) FYI, Intel has released a version of their compiler that fixes the link problem for applications that use OpenMP. Intel Fortran now supports glibc-2.3.2 which is used in RH-9 and Suse-8.2. The old compatibility hacks have become obsolete at least. I hear Intel-8 is in beta, anyone have experience with it? Roland > Subject: undefined references to pthread related calls > > Group, > > Thanks for your responses. Turns out that the problem appears to > be > an incompatiblilty between ifc 7.1 and the glibc version > in the version of RH 8.0 being used. The RH 8.0 being used had some > patches that updated glibc. I was able to fix it by removing > the -static option when compling with ifc. I have tested this with a > patch free version of 8.0 and I don't see the problem wit or without > the -static option specified. At runtime my code does not use any > calls > that seem to access pthread related system routines. I am > guessing that by deferring reolution of the link until runtime I > have > bypassed the problem. Obviously if I did use routines that > needed pthread related code I would still have a problem so this > isn't a > general fix. > > Jeff > > __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 15:28:42 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 15:28:42 -0400 Subject: updated cluster finishing script system Message-ID: <1065554922.32374.47.camel@protein.scalableinformatics.com> Folks: I updated my cluster finishing script package. This package allows you to perform post-installation configuration changes (e.g. finishing) for an RPM based cluster which maintains image state on local disks. It used to be specialized to the ROCKS distribution, but it has evolved significantly and should work with generalized RPM based distributions. Major changes: 1) No RPMs are distributed (this is a good thing, read on) 2) a build script generates customized RPMs for you after asking you 4 questions. (please, no jokes about unladen swallows, neither European nor African...) These RPMs allow you to customize the finishing server and the finishing script client as you require for your task. 
This includes choosing the server's IP address (used to be hard-coded to 10.1.1.1), the server's export directory (used to be hard-coded to /opt/finishing), the cluster's network (used to be hard-coded to 10.0.0.0), and the cluster's netmask (used to be hard-coded to 255.0.0.0). 3) Documentation (see below) Have a look at http://scalableinformatics.com/finishing/ for more details, including new/better instructions. It is licensed under the GPL for end users. Contact us offline if you want to talk about redistribution licenses. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 19:50:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 09:50:38 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065554922.32374.47.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> Message-ID: <200310080950.41343.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > It is licensed under the GPL for end users. Contact us offline if you want > to talk about redistribution licenses. Err, if it's licensed under the GPL then the "end users" who receive it under that license can redistribute it themselves under the GPL. Part 6 of the GPL v2 says: [quote] 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. [...] [/quote] Of course as the copyright holder you could also do dual licensing, so I guess this is what you mean - correct ? But whichever it is, once you have released something under the GPL you cannot prevent others from redistributing it under the GPL themselves. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g1FOO2KABBYQAh8RAu1oAJ0fLlcljVYwXj7xgnkjGFyNaoWOFwCfWM/r IC1/xPLO2ePGM2zlJF2ZHK8= =HOnr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Tue Oct 7 21:58:21 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Tue, 7 Oct 2003 21:58:21 -0400 (EDT) Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: Message-ID: Hi, First, I want to thank all of you for the answers and suggestions for my question last time. ( Last time, I tried: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' . . . 
) Most friends suggested that I use '-lmpi' instead of '-Lmpi'. I tried it, and the system gave me the following error: " ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 module EHFIELD program FDTD3DPML external function RISEF external subroutine COM_HZY 3228 Lines Compiled ld: cannot find -lmpi " Neither does '-lmpif' work, although 'mpif.h' and 'mpi.h' exist in the directory '/opt/mpich-1.2.5/include'. I also tried the command: " /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 " The system gave the error: " 3228 Lines Compiled /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In function `f_f77ioerr': : undefined reference to `__ctype_b' " In fact, I don't know what this error means. Of course, I don't know how to solve it either. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 22:23:04 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 12:23:04 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065579065.32368.134.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> <1065579065.32368.134.camel@protein.scalableinformatics.com> Message-ID: <200310081223.05966.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 12:11 pm, Joseph Landman wrote: > Thanks for catching the wording error. No worries, I wasn't intending to be pedantic, just curious. :-) - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g3UIO2KABBYQAh8RAgSkAJ48X7RY3ABNnYa2DlQ0z0vHfinaxACfdsMk hIZqsuVLevZqp2OBtfAafEs= =2vpF -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 22:11:05 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 22:11:05 -0400 Subject: updated cluster finishing script system In-Reply-To: <200310080950.41343.csamuel@vpac.org> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> Message-ID: <1065579065.32368.134.camel@protein.scalableinformatics.com> On Tue, 2003-10-07 at 19:50, Chris Samuel wrote: > On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > > > It is licensed under the GPL for end users. Contact us offline if you want > > to talk about redistribution licenses. > > Err, if it's licensed under the GPL then the "end users" who receive it under > that license can redistribute it themselves under the GPL. Part 6 of the GPL > v2 says: ... > Of course as the copyright holder you could also do dual licensing, so I guess > this is what you mean - correct ? Commercial redistribution, a la the MySQL form of license. You are correct, it was a mis-wording on my part. Basically if someone decides to turn this into a commercial product (ok, stop laughing...), or wants support, or a warranty, then they need to speak with us first.
As the package is mostly source code, make files and scripts, it seems odd to consider distributing it any other way. More to the point, there are some things that should be free (Libre and beer, though some keep asking me where the free beer is). Stuff like this should be free (as in Libre). RGB and I had a conversation about this I think... . I leave it to others to supply the beer. > But whichever it is, once you have released something under the GPL you cannot > prevent others from redistributing it under the GPL themselves. ... which I don't want to hinder (redistribution under GPL), rather I want to encourage ... Thanks for catching the wording error. -- Joseph Landman _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 8 13:16:03 2003 From: ctierney at hpti.com (Craig Tierney) Date: 08 Oct 2003 11:16:03 -0600 Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: References: Message-ID: <1065633362.22256.8.camel@woody> On Tue, 2003-10-07 at 19:58, Ao Jiang wrote: > Hi, > First, I want to thank all of you for the answers and suggestions > for my question last time. > ( > Last time, I tried: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' The option -L specifies the path for libraries. The option -l specifies the library to link. Your command should be: ifc -I/opt/mpich-1.2.5/include -L/opt/mpich-1.2.5/lib -lmpi -w -lm -o p_wg3 p_fdtd3dwg3_pml.f90 Craig > . > . > . > ) > > Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried > it, the system gave me the following error: > " > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > module EHFIELD > program FDTD3DPML > external function RISEF > external subroutine COM_HZY > > 3228 Lines Compiled > ld: cannot find -lmpi > " > either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the > directory '/opt/mpich-1.2.5/include'. > > I also tried the command: > " > /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 > " > > The system gave the error: > " > 3228 Lines Compiled > /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In > function `f_f77ioerr': > : undefined reference to `__ctype_b' > " > > In fact, I don't know what this error means. Of course, I don't know > how to slove it either. > > Tom > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 14:02:17 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 19:02:17 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening Have resolved the problem. It was due to setting in dhcpd.conf it require option root-path pointing to root path of the node. I get another error. 
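(For reference, a root-path setup of the kind being described usually looks something like the sketch below. The node name, MAC and IP addresses, and paths are illustrative assumptions only, not taken from this cluster; the matching /etc/exports line is the sort of entry whose absence typically produces the permission-denied (error 13) mount failures discussed earlier in the thread.)

  # dhcpd.conf fragment for one diskless node (all names/addresses assumed)
  host node1 {
      hardware ethernet 00:11:22:33:44:55;
      fixed-address 10.0.0.11;
      option root-path "10.0.0.1:/tftpboot/node1";
  }

  # matching /etc/exports entry on the NFS server, allowing root access from the node
  /tftpboot/node1    10.0.0.11(rw,no_root_squash,sync)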
When the diskless node boots up it cannot find the init file. Also, what is the minimum set of files to transfer to /tftpboot/node/? Dan On Oct 7 2003, Mark Hahn wrote: > > I'm getting error 13 when my diskless client try to mount file system. > > Hoe best to resolved this error 13? > > it's best resolved by translating it to text: EACCESS or "permission > denied". I'm guessing you should look at the logs on your fileserver, > since it seems to be rejecting your clients. > > > _______________________________________________ Beowulf mailing list, > Beowulf at beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 8 16:50:34 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 8 Oct 2003 13:50:34 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: The 3ware 6000, 7000 and 7006 cards are all gone from the marketplace; the cards you want to look at are the 3ware 7506 (parallel ATA) or the 3ware 8506 (serial ATA). The 2400 was never seriously in the running for us because it only supports 4 drives. joelja On Wed, 8 Oct 2003, Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID.
The first would give the > > best results but on the other hand becomes really expensive so we have > > in mind two ATA RAID controllers: > > > > Adaptec 2400A > > 3Ware 6000/7000 series controllers > > > > Any one of these has its strong and weak points, after seeing various > > benchmarks/comparisons/reviews these are the only candidates that > > deserve our attention. good points about ata raid - large disks storage ( 300GB drives at $300 each +/- ) - get those drives w/ 8MB buffer disk cache - cheap ... can do with software raid or $40 ata-133 ide controller - $300 more for making ata drives appear like scsi drives with 3ware raid controllers - slower rpm disks ... usually it tops out at 7200rpm - it supposedly can sustain 133MB/sec transfers - if you use software raid, you can monitor the raid status - if you use hardware raid, you are limited to the tools the hw vendor gives you tomonitor the raid status of pending failures or dead drives good points about scsi .. - some say scsi disks are faster ... - super expensive .. $200 for 36 GB .. at 15000rpm - it supposedly can sustain 320MB/sec transfers if the disks does transfer at its full speed ... 320MB/sec or 133MB/sec does the rest of the system get to keep up with processing the data spewing off and onto the disks independent of which raid system is built, you wil need 2 or 3 more backup systems to backup your Terabyte sized raid systems more raid fun http://www.1U-Raid5.net c ya alvin > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > harddisk, all home directories will be stored under the main server, > > main workload (compilation and edition) would be done on the local > > machines tough, server only takes care of file sharing. > > > > Also parallel MPI executions will be done between the clients. > > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? good p > > > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 8 15:46:59 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 08 Oct 2003 21:46:59 +0200 Subject: building a RAID system Message-ID: <1065642419.9483.55.camel@qeldroma.cttc.org> Hi, I would like to know some advice about what kind of technology apply into a RAID file server ( through NFS ) . We started choosing hardware RAID to reduce cpu usage. We have two options , SCSI RAID and ATA RAID. The first would give the best results but on the other hand becomes really expensive so we have in mind two ATA RAID controllers: Adaptec 2400A 3Ware 6000/7000 series controllers Any one of these has its strong and weak points, after seeing various benchmarks/comparisons/reviews these are the only candidates that deserve our attention. 
The server has a dozen client workstations connected through a switched 100Mbit LAN, each equipped with its own OS and hard disk. All home directories will be stored on the main server; the main workload (compilation and editing) will be done on the local machines though, and the server only takes care of file sharing. Also, parallel MPI executions will be done between the clients. Considering that not all the workstations would be working full time, and with cost in mind, is it worth an ATA RAID solution? -- Daniel Fernandez Laboratori de Termotècnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Oct 8 06:33:13 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 08 Oct 2003 06:33:13 -0400 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <1065609193.28674.32.camel@squash.scalableinformatics.com> On Wed, 2003-10-08 at 18:17, D. Scott wrote: > Hi > > On a diskless cluster, why does NFS hang when doing a 'cp' of 6MB between nodes? Lots of possibilities, though I am not sure you have supplied enough information to hazard a guess (unless someone ran into this before and already knows the answer). An operation on an NFS mounted file system can hang when: 1) the nfs server becomes unresponsive (crash, overload, file system full, ...) 2) the client becomes unresponsive ... 3) the network becomes unresponsive ... ... If you could indicate more details, it is likely someone might be able to tell you where to look next. > > > Dan -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 18:17:02 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 23:17:02 +0100 Subject: Why NFS hang when copying files of 6MB? Message-ID: Hi On a diskless cluster, why does NFS hang when doing a 'cp' of 6MB between nodes? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 19:39:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 19:39:41 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. that's unfortunate, since the main way HW raid saves CPU usage is by running slower ;) seriously, CPU usage is NOT a problem with any normal HW raid, simply because a modern CPU and memory system is *so* much better suited to performing raid5 operations than the piddly little controller in a HW raid card. the master/fileserver for my cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and it can *easily* saturate its gigabit connection. after all, ram runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s!
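To make the software-RAID alternative concrete, here is a minimal md RAID5 sketch. It assumes mdadm is installed; the IDE device names, the four-disk layout, the ext3 filesystem and the mount point are illustrative assumptions, not a recommendation for any particular box:

  # create a 4-disk software RAID5 array (one drive per IDE channel assumed)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1

  # put a journalled filesystem on it, mount it, and watch the parity sync
  mke2fs -j /dev/md0
  mkdir -p /export/home
  mount /dev/md0 /export/home
  cat /proc/mdstat

The parity arithmetic runs on the host CPU, which is exactly the trade-off being argued here.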
concern for PCI congestion is a much more serious issue. finally, why do you care at all? are you fileserving through a fast (>300 MB/s) network like quadrics/myrinet/IB? most people limp along at a measly gigabit, which even a two-ide-disk raid0 can saturate... > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and jeez, since your limited to 10 MB/s, you could do raid5 on a 486 and still saturate the net. seriously, CPU consumption is NOT an issue at 10 MB/s. > machines tough, server only takes care of file sharing. so excess cycles on the fileserver will be wasted unless used. > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? you should buy a single promise sata150 tx4 and four big sata disks (7200 RPM 3-year models, please). regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 19:28:37 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 16:28:37 -0700 (PDT) Subject: Why NFS hang when copying files of 6MB? In-Reply-To: <1065609193.28674.32.camel@squash.scalableinformatics.com> Message-ID: On Wed, 8 Oct 2003, Joe Landman wrote: > On Wed, 2003-10-08 at 18:17, D. Scott wrote: > > Hi > > > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? > > Lots of possibilities, though I am not sure you have supplied enough > information to hazard a guess (unless someone ran into this before and > already knows the answer). > > An operation on an NFS mounted file system can hang when: > > 1) the nfs server becomes unresponsive (crash, overload, file system > full, ...) not enough memory, too much swap spce > 2) the client becomes unresponsive ... > 3) the network becomes unresponsive ... > ... - bad hub, bad switch, bad cables - bad nic cards, bad motherboard, - bad kernel, bad drivers - bad dhcp config, waiting for machines that went offline c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:41:12 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:41:12 +1000 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <200310090941.13302.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 08:17 am, D. Scott wrote: > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? You need to give a lot more detail on that, try having a quick read of: http://www.catb.org/~esr/faqs/smart-questions.html#beprecise Basically there are all sorts of possible problems from kernel bugs, node hardware problems through to various network problems... Useful information would be things like: /etc/fstab from the nodes output of the mount command the output of strace when you try and do the 'cp': strace -o cp.log -e trace=file cp /path/to/file /path/to/destination good luck! 
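A few more client- and server-side checks in the same spirit can help narrow down where an NFS copy is stalling; the host name 'fileserver' below is just a placeholder:

  # is the server's portmapper/NFS service registered and answering?
  rpcinfo -p fileserver
  showmount -e fileserver

  # client-side view: the mount options in use (rsize/wsize, hard/soft, udp/tcp) and NFS statistics
  mount | grep nfs
  nfsstat -c

  # recent kernel messages often show "server not responding" style complaints
  dmesg | tail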
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKCYO2KABBYQAh8RAqltAJ4/R91yD0KKVA6wB3+UDZxZcAOsFwCbBZn1 DeaCjkFO8bwGLhhSkxB20yE= =d7Gz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:27:35 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:27:35 -0700 (PDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: hi ya On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. tell them to break the drawing up into itty-bitty pieces and work on a real autocad drawing .. :-) - separate the item into separate pieces so it can be bent, welded, drilled, etc or get a 3Ghz cpu and load up 4GB or 8GB of memory and nope ... beowulf or any other cluster will not help autocad c ya alvin - part time autocad me ..but i cant draw a line .. :-) - easier to contract out the 1u chassis design "drawings" :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:47:32 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:47:32 +1000 Subject: PocketPC Cluster Message-ID: <200310090947.33601.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) IrDA for the networking, 11 compute + 1 management, slower than "a mainstream Pentium II-class desktop PC" (they don't specify what spec). http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Twelve Pocket PC devices have been joined in a cluster to perform distributed calculations - the devices share the load of a complex calculation. The concept was to compare the performance of several Pocket PC devices linked into a cluster with the performance of a typical Pentium II-class desktop computer. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKIUO2KABBYQAh8RAvJvAJoDNqZ/2m8cIqo02Hbbwzpm2DWeMQCeOltt 3LuUp1Kkoc4jnmwVNgoDoFI= =+abL -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Wed Oct 8 20:03:57 2003 From: mg_india at sancharnet.in (Manoj Gupta) Date: Thu, 09 Oct 2003 05:33:57 +0530 Subject: CAD Message-ID: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Hello, One of my clients has asked me to provide a solution for his AutoCAD work. 
The minimum file size on which he works is nearly of 400 MB and it takes 15-20 minutes to load on his single system. Can Beowulf be used to solve this problem and minimize the time required so as to improve productivity? Sawan Gupta || mg_india at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 20:23:28 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 20:23:28 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: > - get those drives w/ 8MB buffer disk cache what reason do you have to regard 8M as other than a useless marketing feature? I mean, the kenel has a cache that's 100x bigger, and a lot faster. > - slower rpm disks ... usually it tops out at 7200rpm unless your workload is dominated by tiny, random seeks, the RPM of the disk isn't going to be noticable. > - it supposedly can sustain 133MB/sec transfers it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE disks in raid0. interestingly, the chipset controller is normally not competing for the same bandwidth as the PCI, so even with entry-level hardware, it's not hard to break 133. > - if you use software raid, you can monitor the raid status this is the main and VERY GOOD reason to use sw raid. > - some say scsi disks are faster ... usually lower-latency, often not higher bandwidth. interestingly, ide disks usually fall off to about half peak bandwidth on inner tracks. scsi disks fall off too, but usually less so - they don't push capacity quite as hard. > - it supposedly can sustain 320MB/sec transfers that's silly, of course. outer tracks of current disks run at between 50 and 100 MB/s, so that's the max sustained. you can even argue that's not really 'sustained', since you'll eventually get to slower inner tracks. > independent of which raid system is built, you wil need 2 or 3 > more backup systems to backup your Terabyte sized raid systems backup is hard. you can get 160 or 200G tapes, but they're almost as expensive as IDE disks, not to mention the little matter of a tape drive that costs as much as a server. raid5 makes backup less about robustness than about archiving or rogue-rm-protection. I think the next step is primarily a software one - some means of managing storage, versioning, archiving, etc... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Wed Oct 8 21:04:03 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed, 8 Oct 2003 21:04:03 -0400 Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> References: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: <20031008210403.F28876@www2> AutoCAD versions since R13 only run on Windows, and AFAIK no version of AutoCAD has ever been shipped for Linux. Beowulf is a Linux- (or, taken more liberally than most people intend, Unix-) specific thing. Thus, unless I misunderstand, no. --Bob Drzyzgula On Thu, Oct 09, 2003 at 05:33:57AM +0530, Manoj Gupta wrote: > > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. 
> > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 21:45:08 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 21:45:08 -0400 (EDT) Subject: building a RAID systemo In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). I totally agree with everything Mark said and second this. Although 3-year ata (lower) or scsi (higher) disks would be just fine too, depending on how much you care to spend and how much it costs you if things go down. e.g. md raid under linux works marvelously well, and one can even create a kickstart file so that it makes your raid for you on a fully automated install, which is very cool. It is also dirt cheap. My home (switched 100 Mbps, 8-9 hosts/nodes depending on what is on) has a 150 GB RAID-5 server (3x80 GB 3-year ATA 7200 RPM disks) on a 2.2 GHz Celeron server with an extra ATA controller so there is only one disk per channel. It cost about $800 total to build inside a full tower case with extra fans including one with leds in front so that it glows blue. You couldn't get the CASE of a HW raid for that price, I don't think (although I admit that it won't do hot swap and dual power supplies). The total RAID/NFS load since 9/19 is: root 11 0.0 0.0 0 0 ? SW Sep19 0:00 [mdrecoveryd] root 21 0.0 0.0 0 0 ? SW Sep19 0:00 [raid1d] root 22 0.0 0.0 0 0 ? SW Sep19 0:02 [raid5d] root 23 0.0 0.0 0 0 ? SW Sep19 5:03 [raid5d] ... root 4928 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] root 4929 0.0 0.0 0 0 ? SW Sep19 2:57 [nfsd] root 4930 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4931 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4932 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4933 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4934 0.0 0.0 0 0 ? SW Sep19 2:56 [nfsd] root 4935 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] (or less than 30 minutes of total CPU). At 1440 min/day, for 18 days (conservatively) that is about 0.1% load, on average. This is a home network load, sure (which includes gaming and a fair bit of data access, but no, we're not talking GB per day moving over the lines). In a more data-intensive environment this would increase, but there is a lot of head room. The point is that a 2.2 GHz system has a LOT of horsepower. We used to run entire departments of twenty or thirty workstations using $10-20,000 Sun servers at maybe 5 MEGAHertz on 10 Mbps thinwire networks with fair to middling satisfaction. My $800 home server has several thousand times the raw speed, about a thousand times the memory, a thousand times the disk, AND it is RAID 5 disk at that. The network has only increased in speed by a factor of maybe 10-20 (allowing for switched vs hub). Mucho headroom indeed. BTW, our current department primary server is a 1 GHz PIII, although we're adding a second CPU shortly as load dictates. 
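To make the kickstart remark above concrete, a fragment along the following lines builds the md array at install time. The disk names, partition sizes and mount point are assumptions for illustration, not the layout of any machine described in this thread:

  # kickstart fragment: assemble a three-disk software RAID5 for /home during install
  part raid.01 --size=80000 --ondisk=hda
  part raid.02 --size=80000 --ondisk=hdc
  part raid.03 --size=80000 --ondisk=hde
  raid /home --fstype=ext3 --level=5 --device=md0 raid.01 raid.02 raid.03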
And if you are planning your server to handle something other than a small cluster or LAN where downtime isn't too "expensive" you may want to look at higher quality (rackmount) servers and disk arrays in enclosures that permit e.g. hot swap and that have redundant power. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 23:12:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 13:12:41 +1000 Subject: building a RAID system In-Reply-To: References: Message-ID: <200310091312.42544.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 10:23 am, Mark Hahn wrote: > raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... For those who haven't seen it, this is a very interesting way of doing snapshot style backups: http://www.mikerubel.org/computers/rsync_snapshots/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hNIpO2KABBYQAh8RAvXaAJ0ecv77jUJe3DWpsinqBFgs4W4JlQCfRz/z HfXF/JkFSszlvX10/JXjisM= =7lAy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 8 22:58:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 8 Oct 2003 22:58:17 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. The larger cache does provide some benefit. Disks now read and cache up to a whole track/cylinder at once, starting from when the head settles from a seek up to when the desired sector is read. You can't do that type of caching in the kernel. As disks become more dense, more memory is needed to save a cylinder's worth of data, so we should expect the cache size to increase. But you point is likely "disk cache is mostly legacy superstition". MS-Windows 98 and earlier had such horrible caching behavior that a few MB of on-disk cache could triple the performance. This was also why MS-Windows would run much faster under Linux+VMWare than on the raw hardware. > > - it supposedly can sustain 133MB/sec transfers Normal disks top out at 70MB/sec read, 50MB/sec write on the outer tracks. These numbers drop significantly on the inner tracks. You might get 10MB/sec better with 10K or 15K RPM SCSI drives, but it's certainly not linear with the speed. BTW, 2.5" laptop drives are _far_ worse. Typical for a modern fast drive is 20MB/sec read and 10MB/sec write. Older drivers were worse. > > - some say scsi disks are faster ... 
> > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. Look at the shape of the transfer performance curve -- the shape is sometimes the same as the similar IDE drive, but sometimes has a much different curve. Wider tracks mean faster seek settling but lower density. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:33:49 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:33:49 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: hi ya mark On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. for those squeezing the last 1MB/sec transfer out of their disks ... 8MB did seem to make a difference ( streaming video apps - encoding/decoding/xmit ) > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. usually a side affect of partitioning too > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. super easy to overflow the disks and pci .. depending on apps > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. yup > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. scsi capacity doesnt seem to be an issue for them ... they're falling behind by several generations ( scsi disks used to be the highest capacity drives .. not any more ) > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. yup ... those are just marketing numbers... all averages ... and bigg differences between inner tracks and outer tracks > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost to me ... backup of terabyte sized systems is trivial ... - just give me lots of software raid subsystems ( 2 backups for each "main" system ) - lot cheaper than tape drives and 1000x faster than tapes for live backups - will never touch a tape backup again ... 
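Disk-to-disk schemes of this sort are commonly scripted with rsync plus hard links, in the style of the snapshot article linked earlier in the thread; a rough sketch, with purely illustrative host name and paths:

  # rotate snapshots; unchanged files are hard-linked, so each snapshot
  # only costs the space of the files that actually changed
  rm -rf /backup/daily.2
  mv /backup/daily.1 /backup/daily.2
  cp -al /backup/daily.0 /backup/daily.1
  rsync -a --delete -e ssh fileserver:/export/home/ /backup/daily.0/

Restores are then just a copy back out of whichever snapshot is wanted.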
too sloow and too unreliable no matter how clean the tape heads are ( too slow being the key problem for restoring ) c ya alvin > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 22:31:50 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 22:31:50 -0400 (EDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. Load from what into what? It is hard for me to see how a 400 MB file could take this long to load into memory over any modern channel, as this is less than 0.5 MB/sec. This is roughly the bandwidth one achieves throwing floppies across a room one at a time by hand. That is, I can't imagine how this is bandwidth limited, unless the client has primitive hardware. From a local disk (even a bad one) this should take ballpark of a few seconds to load into memory. From NFS order of a minute or three (in most configurations, less on faster networks). If the load is so slow because the program is crunching the file as it loads it (reading a bit, thinking a bit, reading a bit more) then nothing can speed this up unless AutoCAD has a parallel version of their program. > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? I don't know for sure (although somebody else on the list might). I doubt it, though, unless autocad has a parallel version that can use a linux cluster to speed things up. However, your first step in answering it for yourself is going to be doing measurements to determine what the bottleneck is. If it is I/O then invest in better I/O (perhaps a better network). So measure e.g. the network load if it is getting the file from a network file server. If the problem is that the file is coming from a winXX server with too little memory on an antique CPU and with creaky old disks on a 10 Mbps hub, well, FIRST replace the winxx with linux, the old server with a new server, the old disks with new disks, the 10 BT with 1000 BT. At that point you won't have a bandwidth problem, as the server should be able to deliver files at some tens of MB/sec pretty easily. If the problem persists, try to figure out what autocad is doing when it loads. rgb > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 02:00:33 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed, 8 Oct 2003 23:00:33 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. I found a comparison of 8MB vs 2MB drives in a raid, though it's windows based and not that great: http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 Seems like the 8MB didn't really make much of a difference. > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a 100GB LTO tapes can be had for $36, that's less than half the price of the cheapest 200 GB drives. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From maurice at harddata.com Thu Oct 9 00:58:27 2003 From: maurice at harddata.com (Maurice Hilarius) Date: Wed, 08 Oct 2003 22:58:27 -0600 Subject: building a RAID system In-Reply-To: <200310090112.h991CPb24907@NewBlue.scyld.com> Message-ID: <5.1.1.6.2.20031008225509.04259800@mail.harddata.com> Where you said: >I would like to know some advice about what kind of technology apply >into a RAID file server ( through NFS ) . We started choosing hardware >RAID to reduce cpu usage. > >We have two options , SCSI RAID and ATA RAID. The first would give the >best results but on the other hand becomes really expensive so we have >in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers I would suggest using the 3Ware (current models are 7506 ( parallel ATA) and 8506 ( Serial ATA)). Use mdamd to create software RAID devices. It will yield better performance, and is much more flexible. If you are building a large array, use multiple controllers to increase throughput. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 03:52:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 00:52:39 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: hi ya On Wed, 8 Oct 2003, Trent Piepho wrote: > On Wed, 8 Oct 2003, Mark Hahn wrote: > > > - get those drives w/ 8MB buffer disk cache > > > > what reason do you have to regard 8M as other than a useless > > marketing feature? I mean, the kenel has a cache that's 100x > > bigger, and a lot faster. 
> > I found a comparison of 8MB vs 2MB drives in a raid, though it's windows > based and not that great: > http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 i dont have much data between 2MB and 8MB ... just various people's feedback ... - releasable data i do have is at http://www.Linux-1U.net/Disks/Tests/ - testing for 2MB and 8MB should be done on the same system of the same sized disks and exact same partition, distro, patchlevel and "test programs to amplify the differences" - lots of disk writes and reads ... that overflow the memory so that disk access is forced ... > Seems like the 8MB didn't really make much of a difference. > > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems -- emphasizing .. "Terabyte" sized disk subsystems > > backup is hard. you can get 160 or 200G tapes, but they're almost > > as expensive as IDE disks, not to mention the little matter of a > > 100GB LTO tapes can be had for $36, that's less than half the price of the > cheapest 200 GB drives. we/i like to build systems that backup 1TB or 2TB per 1U server ... - tapes doesn't come close ... different ballpark - a rack of 1U servers is a minimum of 40TB - 80TB of data .. - and than to turn around and simulate a disk crash and restore from backups from bare metal or how fast to get a replacement system back online ( hot swap - live backups) - i think those 200GB tape drives is something to also add into the costs of backup media .. as are restore from tape considerations before deciding on tape vs disk backup media ( all depends on the purpose of the server and data ) - last i played with tape drives was those $3K - $4K exabyte tape drives ... nice and fast (writing) .. but very slow for restore and unreliable ... and time consuming and NOT automated - people costs the mosts for doing proper backups ... ( someone has to write the backup methodology ro swap the tapes etc ) fries ( a local pc store here ) had 160GB disks 8MB buffers for $80 after rebates ... otherwise general rule is $1 per GB of raw disk storage per disk fun stuff .. have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 05:25:18 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 02:25:18 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > as expensive as IDE disks, not to mention the little matter of a > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > cheapest 200 GB drives. > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > - tapes doesn't come close ... different ballpark How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, and the largest IDE drive I've seen is 250 GB. I've got two 4U rackmount systems sitting side by side on the same shelf. One is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive server with 200GB SATA drives and two 8 port 3ware cards. The tape library has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand new, while the ADIC is around a year old. 
If the tape library were bought today, it would have an LTO-2 drive with double the capacity and could store 4.8 TB. So tapes seem to come pretty close to me. It is also quite a bit more practical to change tapes with the library than to be swapping hard drives around. The library's built-in barcode reader keeps track of them all for me. I can type a command and have it spit out all the tapes that a certain set of backups are on. They fit nicely in a box in their plastic cases and if I drop one it will be ok. I can stick them on a shelf for five years and still expect to read them. And the tapes don't take up any rackspace or power or need any cooling. I've never had a tape go bad on me either, even though I've been through a lot more of them than IDE drives. Of course the tape library was expensive. A new LTO-2 model can be had for around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware controllers were much less. But the cost of the media is a lot less for tapes than for SATA hard drives. Especially if you get models with 3 year warranties. Once you buy enough drives/tapes you'll break even on a $/GB comparison. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 06:04:20 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 10:04:20 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: Greg, > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > MKL and the free ATLAS library run at a respectable % of peak. Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. Have you tried DPPSV or DPOSV on Itanium, for example? I would be interested in the percentage of peak that you achieve with MKL and ATLAS, for up to 10000x10000 matrices. ATLAS does not have a full LAPACK implementation. > 4) Put your performance whitepapers on your website, or it looks > fishy. Our white papers are not on the Web because they contain performance data, and particularly performance data comparing against our competitors. It may expose us to legal issues over libel. Putting the legitimacy of any legal issues aside, it is not good for any business to be engulfed in legal squabbles. We are in the process of clearing this with our legal department at the moment. As I have noted in my previous e-mail, anyone who wants to get hold of the white papers is welcome to send me an e-mail. > I looked and didn't see a single performance claim there. There is one on the front page! Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd.
E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 06:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 03:13:21 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Trent Piepho wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > > as expensive as IDE disks, not to mention the little matter of a > > > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > > cheapest 200 GB drives. > > > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > > - tapes doesn't come close ... different ballpark > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, > and the largest IDE drive I've seen is 250 GB. 8 drives ... 250GB or 300GB each .. > I've got two 4U rackmount systems sitting side by side on the same shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand > new, while the ADIC is around a year old. If the tape library were bought > today, it would have a LTO-2 drive with double the capacity and could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a bit more > practical changes tapes with the library than to be swapping hard drives nobody swaps disks around ... unless one is using those 5.25" drive bay thingies in which case ... thats a different ball game i/we claim that if the drives fail, something is wrong ... its not necessary for the disks to be removable > around. The libraries built in barcode reader keeps track of them all for me. > I can type a command and have it spit out all the tapes that a certain set of > backups are on. They fit nicely in a box in their plastic cases and if I drop > one it will be ok. I can stick them on a shelf for five years and still i prefer hands off backups and restore .... esp if the machine is not within your hands reach ... > expect to read them. And the tapes don't take up any rackspace or power or > need any cooling. I've never had a tape go bad on me either, even though I've > been though a lot more of them than IDE drives. > > Of course the tape library was expensive. A new LTO-2 model can be had for > around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware for $11.6K ... i can build two 2TB servers or more ... 8 * $400 --> $3200 in drives ... for 2.4TB each ... + $700 for misc cpu/mem/1u case and it'd be 2 live backups of the primary 2TB system or about 2-3 months of weekly full backups depending ondata > controllers were much less. But the cost of the media is a lot less for tapes > than for SATA hard drives. Especially if you get models with 3 year > warranties. Once you buy enough drives/tapes you'll break even on a $/GB > comparison. i dont want to be baby sitting tapes ... 
on a daily basis and cleaning its heads or assume that someone else did c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Thu Oct 9 06:38:54 2003 From: seth at hogg.org (Simon Hogg) Date: Thu, 09 Oct 2003 11:38:54 +0100 Subject: Intel compilers and libraries In-Reply-To: References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> At 10:04 09/10/03 +0000, C J Kenneth Tan -- Heuchera Technologies wrote: >Our white papers are not on the Web they contain performance data, and >particularly, performance data comparing against our competitors. It >may expose us to libel legal issues. Putting legitimacy of any legal >issues aside, it is not good for any business to be engulf in legal >squabbles. We are in the process of clearing this with our legal >department at the moment. > >As I have noted in my previous e-mail, anyone who wants to get a hold >of the white papers are welcome to please send me an e-mail. I would just like to comment that if you are releasing the white papers by email, what difference is there between that and putting them on the web? They are both still publishing. Although IANAL, I would doubt that these figures expose you legally, as long as you are correct and truthful in the figures you claim (and probably the methodology would be pretty handy, too). Simon Hogg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 06:31:00 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 06:31:00 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F8538E4.9020400@lmco.com> > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > IDE bays, > > and the largest IDE drive I've seen is 250 GB. > > 8 drives ... 250GB or 300GB each .. > Cool. Do you have pictures? How do you get the other 4 drives out? I assume they're not accessible from the front so do you have to pull the unit out, pop the cover and replace the drive? > > I've got two 4U rackmount systems sitting side by side on the same > shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is > a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape > library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server > is brand > new, while the ADIC is around a year old. If the tape library were > bought > today, it would have a LTO-2 drive with double the capacity and > could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a > bit more > practical changes tapes with the library than to be swapping hard > drives > > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game > > i/we claim that if the drives fail, something is wrong ... its not > necessary for the disks to be removable > Are you saying that it's not necessary to have hot-swappable drives? (I'm just trying to understand your point).
Does everyone remember this: http://www.tomshardware.com/storage/20030425/index.html My only problem with this approach is off-site storage of backups. Do you pull a huge number of drives and move them off-site? (I still love the idea of using inexpensive drives for backup instead of tape though). Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 07:07:26 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 11:07:26 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> Message-ID: Simon, > I would just like to comment that if you are releasing the white papers by > email, what difference is that to putting it on the web? They are both > still publishing. I am not a lawyer, so I cannot comment on the legal aspects of things. What if an e-mail and its attachments have a confidentiality clause attached? Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 07:26:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 07:26:56 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - it supposedly can sustain 320MB/sec transfers > > > > that's silly, of course. outer tracks of current disks run at > > between 50 and 100 MB/s, so that's the max sustained. you can even > > argue that's not really 'sustained', since you'll eventually get > > to slower inner tracks. > > yup ... those are just marketing numbers... all averages ... It probably refers to burst delivery out of its 8 MB cache. The actual sustained bit rate is purely a matter of N*2*pi*R*f/S, where N = number of heads, 2*pi*R*f is the linear/tangential speed of the platter at the read radius R, and S is the linear length per bit. This is an upper bound. Similarly, the average rotational latency is something like 1/(2f), the time the platter requires to move half a rotation. > and bigg differences between inner tracks and outer tracks Well, proportional to R, at any rate. Given the physical geometry of the platters (which I get to look at when I rip open old drives to salvage their magnets) about a factor of two. > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > to me ... backup of terabyte sized systems is trivial ...
> - just give me lots of software raid subsystems > ( 2 backups for each "main" system ) > > - lot cheaper than tape drives and 1000x faster than tapes > for live backups > > - will never touch a tape backup again ... too sloow > and too unreliable no matter how clean the tape heads are > ( too slow being the key problem for restoring ) C'mon, Alvin. Sometimes this is a workable solution, sometimes it just plain is not. What about archival storage? What about offsite storage? What about just plain moving certain data around (where networks of ANY sort might be held to be untrustworthy). What about due diligence if you were are corporate IT exec held responsible for protecting client data against loss where the data was worth real money (as in millions to billions) compared to the cost of archival media and mechanism? "never touch a tape backup again" is romantic and passionate, but not necessarily sane or good advice for the vast range of humans out there. To backup a terabyte scale system, one needs a good automated tape changer and a pile of tapes. These days, this will (as Mark noted) cost more than your original RAID, in all probability, although this depends on how gold-plated your RAID is and whether or not you install two of them and use one to backup the other. I certainly don't have a tape changer in my house as it would cost more than my server by a factor of two or three to set up. I backup key data by spreading it around on some of the massive amounts of leftover disk that accumulates in any LAN of systems in days where the smallest drives one can purchase are 40-60 GB but install images take at most a generous allotment of 5 GB including swap. In the physics department, though, we are in the midst of a perpetual backup crisis, because it IS so much more expensive than storage and our budget is limited. Our primary department servers are all RAID and total (IIRC) over a TB and growing. We do actually back up to disk several times a day so that most file restores for dropped files take at most a few seconds to retrieve (well, more honestly a few minutes of FTE labor between finding the file and putting it back in a user's home directory). However, we ALSO very definitely make tape backups using a couple of changers, keep offsite copies and long term archives, and give users tapes of special areas or data on request. The tape system is expensive, but a tiny fraction of the cost of the loss of data due to (say) a server room fire, or a monopole storm, or a lightning strike on the primary room feed that fries all the servers to toast. I should also point out that since we've been using the RAIDs we have experienced multidisk failures that required restoring from backup on more than one occasion. The book value probability for even one occasion is ludicrously low, but the book value assumes event independence and lies. Disks are often bought in batches, and batches of disk often fail (if they fail at all) en masse. Failures are often due to e.g. overheating or electrical problems, and these are often common to either all the disks in an enclosure or all the enclosures in a server room. I don't think a sysadmin is ever properly paranoid about data loss until they screw up and drop somebody's data for which they were responsible because of inadequate backups. 
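To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python; the failure rate, array size and rebuild window are made-up illustrative values, not measurements from any real array:

# Illustrative only: chance of losing a second drive during a RAID-5
# rebuild, with independent failures vs. crudely correlated ones
# (same batch of disks, same enclosure heat or power event).
afr = 0.03              # assumed annual failure rate per drive (3%)
n_drives = 8            # surviving drives in the degraded array
rebuild_hours = 10.0    # assumed time to rebuild onto the spare
p_window = afr * rebuild_hours / (24 * 365)      # per-drive chance during the rebuild
p_indep = 1 - (1 - p_window) ** n_drives         # the "book value", independent drives
p_batch = 1 - (1 - 50 * p_window) ** n_drives    # assume a 50x hazard once a sibling has died
print("independent: %.1e   correlated batch: %.1e" % (p_indep, p_batch))

With these toy numbers the naive figure is a few in ten thousand, while the correlated case is already over a percent -- which is the point: the quoted probability is only as good as the independence assumption behind it.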
Our campus OIT just dropped a big chunk of courseware developed for active courses this fall because they changed the storage system for the courseware without verifying their backup, experienced a crash during the copy over, and discovered that the backup was corrupt. That's real money, people's effort, down the drain. Pants AND suspenders. Superglue around the waistband, actually. Who wants to be caught with their pants down in this way? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:16:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > 4) Put your performance whitepapers on your website, or it looks > > fishy. > > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Putting legitimacy of any legal Expose you to libel suits? Say what? Only if you lie about your competitor's numbers (or "cook" them so that they aren't an accurate reflection of their capabilities, as is often done in the industry) does it expose you to libel charges or more likely to the ridicule of the potential consumers (who tend to be quite knowledgeable, like Greg). One essential element to win those crafty consumers over is to compare apples to apples, not apples to apples that have been picked green, bruised, left on the ground for a while in the company of some handy worms, and then picked up so you can say "look how big and shiny and red and worm-free our apple is and how green and tiny and worm-ridden our competitor's apple is". A wise consumer is going to eschew BOTH of your "display apples" (as your competitor will often have an equally shiny and red apple to parade about and curiously bruised and sour apples from YOUR orchard) and instead insist on wandering into the various orchards to pick REAL apples from your trees for their OWN comparison. What exactly prevents you from putting your own raw numbers up, without any listing of your competitor's numbers? You can claim anything you like for your own product and it isn't libel. False advertising, possibly, but not libel. Or put the numbers up with your competitor's numbers up "anonymized" as A, B, C. And nobody will sue you for beating ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody "owns" them to sue you or cares in the slightest if you beat them. The most that might happen is that if you manipulate(d) ATLAS numbers so they aren't what real humans get on real systems, people might laugh at you or more likely just ignore you thereafter. What makes you any LESS liable to libel if you distribute the white papers to (potential) customers individually? Libel is against the law no matter how, and to who, you distribute libelous material; it is against the law even if shrouded in NDA. It is against the law if you whisper it in your somebody's ears -- it is just harder to prove. 
Benchmark comparisons, by the way, are such a common marketing tool (and so easily bent to your own needs) that I honestly think that there is a tacit agreement among vendors not to challenge competitors' claims in court unless they are openly egregious, only to put up their own competing claims. After all, no sane company WOULD actually lie, right -- they would have a testbed system on which they could run the comparisons listed right there in court and everybody knows it. Whether the parameters, the compiler, the system architecture, the tests run etc. were carefully selected so your product wins is moot -- if it ain't a lie it ain't libel, and it is caveat emptor for the rest (and the rest is near universal practice -- show your best side, compare to their worst). > issues aside, it is not good for any business to be engulf in legal > squabbles. We are in the process of clearing this with our legal > department at the moment. > > As I have noted in my previous e-mail, anyone who wants to get a hold > of the white papers are welcome to please send me an e-mail. As if your distributing them on a person by person basis is somehow less libelous? Or so that you can ask me to sign an NDA so that your competitors never learn that you are libelling them? I rather think that an NDA that was written to protect illegal activity be it libel or drug dealing or IP theft would not stand up in court. Finally, product comparisons via publically available benchmarks of products that are openly for sale don't sound like trade secrets to me as I could easily duplicate the results at home (or not) and freely publish them. Your company's apparent desire to conceal this comes across remarkably poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a watch? Come right down this alley so I can show you my watches where none of the bulls can see" compared to an open storefront with your watches on display to anyone, consumer or competitor. This is simply my own viewpoint, of course. I've simply never heard of a company shrinking away from making the statement "we are better than our competitors and here's why" as early and often as they possibly could. AMD routinely claims to be faster than Intel and vice versa, each has numbers that "prove" it -- for certain tests that just happen to be the tests that they tout in their claims, which they can easily back up. For all the rest of us humans, our mileage may vary and we know it, and so we mistrust BOTH claims and test the performance of our OWN programs on both platforms to see who wins. I'm certain that the same will prove true for your own product. I don't care about your benchmarks except as a hook to "interest" me. Perhaps they will convince me to get you to loan me access to your libraries etc to link them into my own code to see if MY code speeds up relative to the way I have it linked now, or relative to linking with a variety of libraries and compilers. Then I can do a real price/performance comparison and decide if I'm better off buying your product (and buying fewer nodes) or using an open source solution that is free (and buying more nodes). Which depends on the scaling properties of MY application, costs, and so forth, and cannot be predicted on the basis of ANY paper benchmark. Finally, don't assume that this audience is naive about benchmarking or algorithms, or at all gullible about performance numbers and vendor claims. 
A lot of people on the list (such as Greg) almost certainly have far more experience with benchmarks than your development staff; some are likely involved in WRITING benchmarks. If you want to be taken seriously, put up a full suite of benchmarks, by all means, and also carefully indicate how those benchmarks were run as people will be interested in duplicating them and irritated if they are unable to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Oct 9 09:02:11 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 9 Oct 2003 09:02:11 -0400 Subject: building a RAID system - 8 drives Message-ID: <812B16724C38EE45A802B03DD01FD5472A3BF4@exchange.concen.com> 300GB Maxtor ATA133 5400RPM drives are the largest currently available. 250GB is the largest SATA currently. You can achieve 2TB in a 1U by using a drive sled that will hold two drives. The drives are mounted opposing each other and share a backplane. This is a proprietary solution. Or, if you have a chassis with 4 external trays and a few internal 3.5" bays it could be done. I personally don't believe cramming this many drives in a 1U is a good idea. Increased heat due to lack of airflow would have to decrease the lifespan of the drives. ---------------------------------------------------- |Joey P. Sims 800.995.4274 x 242 |Sales Manager 770.442.5896 - Fax |HPC/Storage Division jsims at csiopen.com |Concentric Systems,Inc. www.csilabs.net ---------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 07:02:57 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 07:02:57 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F854061.3040208@lmco.com> Alvin Oga wrote: > > On Thu, 9 Oct 2003, Jeff Layton wrote: > > > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > > IDE bays, > > > > and the largest IDE drive I've seen is 250 GB. > > > > > > 8 drives ... 250GB or 300GB each .. > > > > > > > Cool. Do you have pictures? How do you get the other 4 drives > > out? I assume they're not accessible from the front so do you > > have to pull the unit out, pop the cover and replace the drive? > > yup.. pull the cover off and pop out the drive the hard way vs > "hot swap ide tray" > > autocad generated *.jpg file > http://linux-1u.net/Dwg/jpg.sm/c2500.jpg > > ( newer version has the mb and ps swapped for better cpu cooling) > http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > > > i/we claim that if the drives fail, something is wrong ... its not > > > necessary for the disks to be removable > > > > > > > Are you saying that it's not necessary to have hot-swappable > > drives? (I'm just trying to undertand your point). > > if the drive is dying .... 
> - find out which brand/model# it is and avoid it > - find out if others are having similar problems > - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps > > i'm not convinced that hotswap ide works w/o special ide controllers > - pull the ide disk out while its powered up > - pull the ide disk out while you're writing a 2GB file to it > > - or insert the disk while the rest of the systme is up and > running > > if you have to power down to take the ide disk out, you might as > well do a clean shutdown and replace the disk the hard way with > a screw driver instead of nice ($50 expensive) drive bay handle > $ 50 can be an extra 80GB of disk space when a good sale > is occuring at the local fries stores > We've got several NAS boxes with hot-swappable IDE drives and without it we'd be toast. Granted the controller is specialized, coming from one vendor, but it allows us to have a fail-over drive with auto-rebuild in the background. Then we just pull the bad drive, put in a new one, and designate it as the new hot spare. Works great! It's saved our bacon a few times. I've wanted to test hot-swap with 3ware controllers, but have never done it. Has anyone tested the hotswap capability of the 3ware controllers/cases? Another comment. If you have to pull the node to replace the drive, then you have to bring down the filesystem which might not be the best thing to do. Hot-swapping allows the filesystem to keep functioning, albeit at a lower performance level. > > Does everyone remember this: > > > > http://www.tomshardware.com/storage/20030425/index.html > > > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it > We can't do backups across the wire to an offsite storage facility. So we have to do backups, pull the tapes, and store them off-site. I'm just not sure how this would work with disks instead of tapes. Oh, you can full and incremental backups to disk - most backup software doesn't care what the media is anyway - but I'm just not sure if you pull a set of disks and store them. How does off-site backup recovery work? Do you pop them in, mount them as read-only, and copy them to a live filesystem? However, despite all of these questions, at some point soon, disk will be the only way to get backups of LARGE filesystems in a reasonable amount of time. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:36:40 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:36:40 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F8538E4.9020400@lmco.com> Message-ID: On Thu, 9 Oct 2003, Jeff Layton wrote: > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > IDE bays, > > > and the largest IDE drive I've seen is 250 GB. > > > > 8 drives ... 250GB or 300GB each .. > > > > Cool. Do you have pictures? How do you get the other 4 drives > out? 
I assume they're not accessible from the front so do you > have to pull the unit out, pop the cover and replace the drive? yup.. pull the cover off and pop out the drive the hard way vs "hot swap ide tray" autocad generated *.jpg file http://linux-1u.net/Dwg/jpg.sm/c2500.jpg ( newer version has the mb and ps swapped for better cpu cooling) http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > i/we claim that if the drives fail, something is wrong ... its not > > necessary for the disks to be removable > > > > Are you saying that it's not necessary to have hot-swappable > drives? (I'm just trying to undertand your point). if the drive is dying .... - find out which brand/model# it is and avoid it - find out if others are having similar problems - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps i'm not convinced that hotswap ide works w/o special ide controllers - pull the ide disk out while its powered up - pull the ide disk out while you're writing a 2GB file to it - or insert the disk while the rest of the systme is up and running if you have to power down to take the ide disk out, you might as well do a clean shutdown and replace the disk the hard way with a screw driver instead of nice ($50 expensive) drive bay handle $ 50 can be an extra 80GB of disk space when a good sale is occuring at the local fries stores > Does everyone remember this: > > http://www.tomshardware.com/storage/20030425/index.html > > My only problem with this approach is off-site storage of > backups. Do you pull a huge number of drives and move them > off-site? (I still love the idea of using inexpensive drives for > backup instead of tape though). i suppose you can do "incremental" backups across the wire ... and "inode" based backups too ... - it'd be crazy to xfer the entire 1MB file if only 1 line changed in it c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Oct 9 08:24:20 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 9 Oct 2003 14:24:20 +0200 (CEST) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. Yes, but the kernel might be dumb at times, like when splitting large requests into small pieces to be fed to the block subsystem just to be reassembled again before being sent to the disk :-) Another issue is how this memory is used by the drive firmware. I've seen tests that show some Fujitsu SCSI disks (MAN or MAP series, IIRC) perform much better than competitors in multi-user situations (lots of different files accessed by different users, supposedly scattered on the disk) while the competitors were better at streaming media (one big file used by a single user, supposedly contiguously placed on disk). > unless your workload is dominated by tiny, random seeks, Or your file-system becomes full and thus fragmented. Been there, done that! I've had a big storage device changed from ext3 to XFS because ext3 at about 50% fragmentation was horribly slow; XFS allows live (without unmounting or mounting "ro") defragmentation. 
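For anyone curious how scattered a given file has actually become, a rough fragment counter can be put together around the FIBMAP ioctl. This is only a sketch: it needs root, assumes the filesystem supports FIBMAP and that statvfs() reports the filesystem block size, and it will be slow on very large files; the filefrag utility, where available, does the same job properly.

import fcntl, os, struct

FIBMAP = 1  # from <linux/fs.h>: map a logical file block to a physical block

def fragments(path):
    # Count runs of physically contiguous blocks; 1 means perfectly contiguous.
    f = open(path, 'rb')
    try:
        blksz = os.statvfs(path).f_bsize
        nblocks = (os.fstat(f.fileno()).st_size + blksz - 1) // blksz
        runs, last = 0, None
        for logical in range(nblocks):
            buf = struct.pack('I', logical)
            phys = struct.unpack('I', fcntl.ioctl(f.fileno(), FIBMAP, buf))[0]
            if phys and (last is None or phys != last + 1):
                runs += 1
            if phys:
                last = phys
        return runs
    finally:
        f.close()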
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:42:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:42:46 -0400 (EDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > Pants AND suspenders. Superglue around the waistband, actually. Who > > wants to be caught with their pants down in this way? > > always got bit by tapes... somebody didnt change the tape on the 13th > a couple months ago ... and critical data is now found to be missing > - people do forget to change tapes ... or clean heads... > ( thats the part i dont like about tapes .. and is the most > ( common failure mode for tapes ... easily/trivially avoided by > ( disks-to-disk backups > > - people get sick .. people go on vacations .. people forget > > > - no (similar) problems since doing disk-to-disk backups > - and i usually have 3-6 months of full backups floating around > in compressed form All agreed. And tapes aren't that permanent a medium either -- they deteriorate on a timescale of years to decades, with data bleeding through the film, dropped bits due to cosmic ray strikes, depolymerization of the underlying tape itself. Even before the tape itself is unreadable, you are absolutely certain to be unable to find a working drive to read it with. I have a small pile of obsolete tapes in my office -- tapes made with drives that no longer "exist", and that is after dumping the most egregiously useless of them. Still, I'd argue that the best system for many environments is to use all three: RAID, real backup to (separate) disk, possibly a RAID as well, and tape for offsite and archival purposes. The first two layers protect you against the TIME required to handle users accidentally deleting files (the most common reason to access a backup) as retrieval is usually nearly instantaneous and not at all labor intensive. It also protects you agains the most common single-server failures that get past the protection of RAID itself (multidisk failures, blown controllers). The tape (with periodic offsite storage) protects you against server room fire, brownouts or spikes that cause immediate data corruption or disk loss on both original and backup servers, and tapes can be saved for years -- far longer than one typically can go back on a disk backup mechanism. Users not infrequently want to get at a file version they had LAST YEAR, especially if they don't use CVS. Finally, some research groups generate data that exceeds even TB-scale disk resources -- they constantly move data in and out of their space in GB-sized chunks. They often like to create their own tape library as a virtual extension of the active space. Tapes aren't only about backup. So you engineer according to what you can afford and what you need, making the usual compromises brought about by finite resources. BTW, one point that hasn't been made in the soft vs hard RAID argument is that with hard RAID you are subject to (proprietary) HARDWARE obsolescence, which typically is more difficult to control than software. 
You build a RAID, populate it, use it. After a few years, the RAID controller itself dies (but the disks are still good). Can you get another? One that can actually retrieve the data on your disks? There are no guarantees. Maybe the company that made your controller is still in business (or rather, still in the RAID business). Maybe they either still carry old models, or can do depot repair, or maybe new models can still handle the raid encoding they implemented with the old model. Maybe you can AFFORD a new model, or maybe it has all sorts of new features and costs 3x as much as the first one did (which may not have been cheap). Maybe it takes you weeks to find a replacement and restore access to your data. Soft RAID can have problems of its own (if the software for example evolves to where it is no longer backwards compatible) but it is a whole lot easier to cope with these problems and they are strictly under your control. You are very unlikely to have any "event" like the death of the RAID server that prevents you from retrieving what is on the disks (at a cost likely to be quite controllable and in a timely way) as long as the disks themselves are not corrupted. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Thu Oct 9 09:07:25 2003 From: michael.worsham at mci.com (Michael.Worsham) Date: Thu, 09 Oct 2003 09:07:25 -0400 Subject: CAD Message-ID: <000201c38e66$49b2aa40$94022aa6@Wcomnet.com> My wife works for a construction/architecture firm and handles AutoCad files like this all the time (some even larger at times, depending on the client). One thing we looked at first was what platform they were running the AutoCad on. Windows XP or 95/98 can't really handle Autocad as it is a highly intensive CPU application. We had a similar 'old' layout where CAD machines were more based as a word processing workstation than as a CAD station. Given the amount of work this firm produced in a single day, we went for a Dual Xeon P4 setup w/ 4 GB ram and 36 GB SCSI hard drives loaded with Windows 2000 Pro Workstation. When deciding the P4 hardware platform, look for boards that have PCI-X slots... esp for giganet NIC cards and if needed, Hardware RAID SCSI adapters. Refrain from using ATA, esp since CAD likes to really utilize the hard drives and ATA would most likely wear out faster. (Though some might look that using Xeon is overkill, lets just say there are many times it has come in handy when the customer shows up on-site unexpectedly and wants to see a progress report or has changes to be added. Pulling up the program and the data file in a couple of seconds rather than several minutes makes a beliver out of you in an instant.) If the file is being downloaded from a file server, using standard 10/100 via a cheap hub isn't going to cut it. Best to utilize something of a 10/100/1000 switch (ie. copper giganet) and 10/100/1000 NICs in each of the machines. Make sure the card is set for FULL-DUPLEX to fully utilize the bandwidth needed esp for downloading large files from the file server. Based on the file server specs, its is similar to that of the workstations however it is running Windows 2000 Advanced Server w/ Veritas Backup... 
can't be too careful for DR measures, esp with CAD files of this caliber. -- M Michael Worsham MCI/Intermedia Communications System Administrator & Applications Engineer Phone: 813-829-6845 Vnet: 838-6845 E-mail: michael.worsham at mci.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:47:55 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:47:55 -0700 (PDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - will never touch a tape backup again ... too sloow > > and too unreliable no matter how clean the tape heads are > > ( too slow being the key problem for restoring ) > > C'mon, Alvin. Sometimes this is a workable solution, sometimes it just > plain is not. What about archival storage? What about offsite storage? > What about just plain moving certain data around (where networks of ANY > sort might be held to be untrustworthy). What about due diligence if > you were are corporate IT exec held responsible for protecting client > data against loss where the data was worth real money (as in millions to > billions) compared to the cost of archival media and mechanism? "never > touch a tape backup again" is romantic and passionate, but not > necessarily sane or good advice for the vast range of humans out there. yup .. maybe an oversimplied statement ... tapes are my (distant) 2nd choice for backups of xx-Terabyte sized servers.. disk-to-disk being my first choice ( preferrably to 2 other similar sized machines ) ( it's obviously not across a network :-) i randomly restore from backups and do a diff w/ the current servers before it dies .. > Pants AND suspenders. Superglue around the waistband, actually. Who > wants to be caught with their pants down in this way? always got bit by tapes... somebody didnt change the tape on the 13th a couple months ago ... and critical data is now found to be missing - people do forget to change tapes ... or clean heads... ( thats the part i dont like about tapes .. and is the most ( common failure mode for tapes ... easily/trivially avoided by ( disks-to-disk backups - people get sick .. people go on vacations .. people forget - no (similar) problems since doing disk-to-disk backups - and i usually have 3-6 months of full backups floating around in compressed form c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Thu Oct 9 09:26:45 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Oct 2003 09:26:45 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F854061.3040208@lmco.com> Message-ID: On Thu, 9 Oct 2003 at 7:02am, Jeff Layton wrote > spare. Works great! It's saved our bacon a few times. I've > wanted to test hot-swap with 3ware controllers, but have > never done it. Has anyone tested the hotswap capability of > the 3ware controllers/cases? Yes, and it works just as advertised. To add my $.05 to the discussion, I'm a pretty big fan of the 3wares -- I currently have 5TB of formatted space (with about 2TB of data) on them. 
I have two servers with 2 cards and 16 drives in them, and one with 1 card and 8 drives. On the two board servers, I run the 3wares in hardware RAID mode (R5 with a hot spare), and then do a software stripe across the two hardware arrays. With the boards on separate PCI busses, this lets the stripe go faster than the 266MB/s that the boards are limited to (these are 7500 boards, which are 64/33). 3ware's 3DM also lets you monitor the status of your arrays (it's almost too verbose, actually), and do all sorts of online maintenance. Not having used mdadm much, I can't really compare the functionality of the two. A couple of nice features of 3DM is that it lets you schedule array verification and background disk scanning, which can find problems before they affect the array. I'm not sure what cases or backplane these systems use (I bought 'em from Silicon Mechanics, who I highly recommend), but the hot swap has always just worked. If anyone's interested, I have benchmarks (bonnie++ and tiobench) of one of the 2 board systems using pure software RAID as well as the setup above. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:09:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:09:56 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it http://rdiff-backup.stanford.edu/ The name says it all. I believe it is built on top of rsync -- at any rate it is distributed in an rpm named librsync. Awesome tool -- creates a mirror, then saves incremental compressed diffs. It is the way we can restore so quickly and yet maintain a decent archival/historical backup where a user CAN request file X from last friday (or even the version between the hours of midnight and noon on last friday). Efficient enough to run several times a day on the most active part of your space and not eat a hell of a lot of either disk or network BW. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 09:48:10 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 09:48:10 -0400 Subject: building a RAID system - yup In-Reply-To: References: Message-ID: <1065707290.4708.28.camel@protein.scalableinformatics.com> On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > users tapes of special areas or data on request. 
The tape system is > expensive, but a tiny fraction of the cost of the loss of data due to > (say) a server room fire, or a monopole storm, or a lightning strike on > the primary room feed that fries all the servers to toast. Monopole storm... (smile) I seem to remember (old bad and likely wrong memory) that Max Dresden had predicted one monopole per universe as a consequence of the standard model. Not my area of (former) expertise, so reality may vary from my memory ... [...] > I don't think a sysadmin is ever properly paranoid about data loss until > they screw up and drop somebody's data for which they were responsible > because of inadequate backups. Our campus OIT just dropped a big chunk I always ask my customers a simple question: What is the cost to you to recreate all the data you lost when your disk/tape dies? That is I tend to recommend multiple redundant systems for backup. I also like to point out that you can build a single point of failure into any system, and the cost of recovering from that failure needs to be considered when designing systems to back up the possibly failing systems. If you backup all your systems over the network, and your network dies, are you in a bad way when you need to restore? What about, if you back up everything to a single tape drive, and the drive dies (and you need your backup). Single points of failure are critical to identify. They are also critical to estimate impact from. Most folks have a backup solution of some sort. Some of them are even reasonable, though few of them are about to withstand a single failure in a critical component. My old research group has a tape changer robot and drive from a well known manufacturer. Said well known manufacturer recently told them that since the unit was EOLed about 2 years ago, there would be no more fixes available for it. They (the research group) told me that they were having trouble with it... One tape drive, one point of failure. Tape drive company is happy because you now have to drop a chunk of change on their new units, or scour eBay for old ones. -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 08:30:45 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 12:30:45 GMT Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <20031009123045.7582.qmail@houston.wolf.com> Alvin Oga writes: > > On Thu, 9 Oct 2003, Trent Piepho wrote: > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game No quite true. We use Rare drives (one box) to move up to a TB of data around w/o having to take the time to create tapes and then download them. That takes a lot of time, even w/ LTOs. 
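Just to put time scales on that, a trivial sketch (streaming rates only, ignoring tape changes, compression, and the copy back off the removable drives; the LTO figures are assumed nominal native speeds, the disk figure is the 50 MB/s ballpark quoted earlier in the thread):

# Rough hours to move 1 TB at assumed sustained streaming rates.
rates_mb_per_s = {
    "LTO-1 tape (~15 MB/s native, assumed)": 15.0,
    "LTO-2 tape (~35 MB/s native, assumed)": 35.0,
    "single IDE/SATA disk (~50 MB/s, per earlier post)": 50.0,
}
tb_in_mb = 1000.0 * 1000.0
for name, rate in rates_mb_per_s.items():
    print("%-52s %5.1f hours" % (name, tb_in_mb / rate / 3600.0))

Even at nominal speed a terabyte is the better part of a day on LTO-1, so carting a box of drives across town is hard to beat.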
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rtomek at cis.com.pl Thu Oct 9 10:22:53 2003 From: rtomek at cis.com.pl (Tomasz Rola) Date: Thu, 9 Oct 2003 16:22:53 +0200 (CEST) Subject: PocketPC Cluster In-Reply-To: <200310090947.33601.csamuel@vpac.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003, Chris Samuel wrote: > Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) > > IrDA for the networking, 11 compute + 1 management, slower than "a mainstream > Pentium II-class desktop PC" (they don't specify what spec). > > http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Yes, it's nice of course. One can also build such cluster with Linux-based devices: http://www.handhelds.org/ I myself would like to see if the performance changes after switching to Linux. One thing that should be considered is cooling. On my iPAQ, when cpu load gets too high for too long, the joy button warms itself. This means, cpu is even more heated. The other issue is power consumption. If I understand what SBP did, they run the cluster on electricity from the wall, not from the battery. My own observetion suggests, that running high load on battery consumes about 2-3 times more power than things like reading html files. - From the performance side, I wonder how this compares to the following page: http://www.applieddata.net/design_Benchmark.asp which suggests StrongARM SA 1100 @200 is 3x faster than Pentium @166? I was interested myself, so I ran the quick test on my own iPAQ 3630 (SA 1110 @206) and on AMD-k6-2 @475. On iPAQ: - -bash-2.05b# `which time` -p python /tmp/erasieve.py --limit 1000 --quiet real 0.94 user 0.91 sys 0.04 On K6: => (1020 29): /usr/bin/time -p erasieve.py --limit 1000 --quiet real 0.51 user 0.49 sys 0.02 So, how can 12 PocketPCs be slower than 1 p2 (with no clock given at all, but if I remember they were about 500MHz at best)? If I haven't misunderstood something, they probably didn't tuned their experiment too well. BTW, most PDA cpus lack fpu. So, while such claster may be nice to ad-hoc password breaking, with nanoscale simulation it will be rather the opposite, I think. bye T. - -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home ** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_rola at bigfoot.com ** -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 5.0i for non-commercial use Charset: noconv iQA/AwUBP4VvRBETUsyL9vbiEQJfvwCeLU3/270BajC74e+r2HEKs27QoXgAn0fP C8FHl6mDchvmMBr04oWioqg0 =wFOr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:32:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:32:12 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > Robert, > > You covered some of the issues that we are addressing with our lawyers > right now. It's a process which, as knowledgeable as you are, I am > sure you can understand we have to go through. The comparison, sure, go through the process. 
Putting your own numbers up, no, I cannot see why you need lawyers to tell you you can do this. How can somebody sue you for putting up the results of your own good-faith tests of your own product? There wouldn't be a manufacturer in existence not bogged down in court if you could (successfully) sue Tide for claiming that it gets clothes cleaner and removes stains when the first time you wash a shirt with it the shirt remains dirty and stains don't come out, for example. Why, I myself would quit work and live on the proceeds of my many suits, if every product out there had to strictly live up to its claims. The most recourse the consumer has is to not buy Tide (or whatever other detergent offendeth thee, nothing against Tide but there are plenty of stains NO detergent removes except maybe xylene or fuming nitric acid based ones:-). Or, if they are really irritated -- it is a GRASS stain and the Tide ad on TV last night shows Tide succeeding against GRASS stains in particular -- they can take the box back to the store and likely get their money back. But sue Tide? Only in Ralph Nader's dreams... Caveat emptor is more than a latin phrase, it is a principle of law. You have to look at the horse's teeth yourself, or don't blame the vendor for claiming that the old nag they sold you was really a young and vibrant horse. To them perhaps it was -- it is a question of just what an old nag is (opinion) vs the age of the horse as indicated by its teeth (fact). Only if the claims are egregious (this here snake oil will cause hair to grow on your head, cure erectile dysfunction, and make you smell nice all for the reasonable price of a dollar a bottle) is there any likelihood of grievance that might be addressed. Surely your claims aren't egregious. Your product doesn't slice, dice, and even eat your meatloaf for you...does it?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 11:48:02 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 11:48:02 -0400 Subject: [Fwd: [Bioclusters] 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA] Message-ID: <1065714482.4713.73.camel@protein.scalableinformatics.com> -----Forwarded Message----- > ======================================================================= > MEETING ANNOUNCEMENT / CALL FOR PRESENTERS > ======================================================================= > BIOCLUSTERS 2004 Workshop > March 30, 2004 > Hynes Convention Center, Boston MA USA > ======================================================================= > > * Speakers Wanted - Please Distribute Where Appropriate * > > Organized by several members of the bioclusters at bioinformatics.org > mailing list, the Bioclusters 2004 Workshop is a networking and > educational forum for people involved in all aspects of cluster and > grid computing within the life sciences. 
> > The motivation for organizers of this event was the cancellation of the > O'Reilly Bioinformatics Technology Conference series and the general > lack of forums for researchers and professionals involved with the > applied use of high performance IT and distributed computing techniques > in the life sciences. > > The primary focus of the workshop will be technical presentations from > experienced IT professionals and scientific researchers discussing real > world systems, solutions, use-cases and best practices. > > This event is being held onsite at the Hynes Convention Center on the > first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World > Magazine is generously providing space and logistical support for the > meeting and workshop attendees will have access to the expo floor and > keynote addresses. Registration & fees will be finalized in short > order. > > Presentations will be broken down among a few general content areas: > > 1. Researcher, Application & End user Issues > 2. Builder, Scaling & Integration Issues > 3. Future Directions > > The organizing committee is actively soliciting presentation proposals > from members of the life science and technical computing communities. > Interested parties should contact the committee at bioclusters04 at open- > bio.org. > > > Bioclusters 2004 Workshop Committee Members > > J.W Bizzaro - Bioinformatics Organization Inc. > James Cuff - MIT/Harvard Broad Institute > Chris Dwan - The University of Minnesota > Chris Dagdigian - Open Bioinformatics Foundation & BioTeam Inc. > Joe Landman - Scalable Informatics LLC > > The committee can be reached at: bioclusters04 at open-bio.org > > > About the Bioclusters Mailing List Community > > The bioclusters at bioinformatics.org mailing list is a 600+ member forum > for users, builders and programmers of distributed systems used in life > science research and bioinformatics. For more information about the > list including the public archives and subscription information please > visit http://bioinformatics.org/mailman/listinfo/bioclusters > -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 10:35:16 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 10:35:16 -0400 Subject: building a RAID system - yup - superglue In-Reply-To: References: Message-ID: <3F857224.9040801@lmco.com> Robert G. Brown wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > Pants AND suspenders. Superglue around the waistband, actually. Who > > > wants to be caught with their pants down in this way? > > > > always got bit by tapes... somebody didnt change the tape on the 13th > > a couple months ago ... and critical data is now found to be missing > > - people do forget to change tapes ... or clean heads... > > ( thats the part i dont like about tapes .. and is the most > > ( common failure mode for tapes ... easily/trivially avoided by > > ( disks-to-disk backups > > > > - people get sick .. people go on vacations .. people forget > > > > > > - no (similar) problems since doing disk-to-disk backups > > - and i usually have 3-6 months of full backups floating around > > in compressed form > > All agreed.
And tapes aren't that permanent a medium either -- they > deteriorate on a timescale of years to decades, with data bleeding > through the film, dropped bits due to cosmic ray strikes, > depolymerization of the underlying tape itself. Even before the tape > itself is unreadable, you are absolutely certain to be unable to find a > working drive to read it with. I have a small pile of obsolete tapes in > my office -- tapes made with drives that no longer "exist", and that is > after dumping the most egregiously useless of them. > > Still, I'd argue that the best system for many environments is to use > all three: RAID, real backup to (separate) disk, possibly a RAID as > well, and tape for offsite and archival purposes. > I can say with some authority that this is what we at Lockheed Aeronautics do. And rather than extend this email by quoting Bob below, we also have an HSM system that we use for data we may need in the next couple of years. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 07:59:54 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 07:59:54 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: I would also echo most of Mark's points aside from the 8 MB cache issue. I have seen some noticeable speed improvements using 2 MB vs 8 MB drives. I would also offer one other point. No matter whether you use SCSI or IDE drives, be absolutely certain that you keep the drives cool. The "internal" 3.5 bays in most cases are normally useless because they place several drives in almost direct contact. The drive(s) sandwiched in the middle have only their edges exposed to air and have to dissipate the bulk of their heat through the neighboring drives. I like mount the drives in 5.25 bays. This at least provides an air gap for some cooling. For large raid servers, I like to use the cheap fan coolers. They can be had for $5 - $8 each and include 2 or 3 small fans that fill in the 5.25 opening and the 5.25-to-3.5 mounting brackets. Of course, that makes for a lot of fan noise. We typically build 2 identical raid servers connected by a dedicated gigabit link to do nightly backups, both to protect from raid failure and user error. I would like to ask if anyone has investigated Benjamin LaHaise netmd application yet? http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf I think there was some discussion of it a few months ago, but I haven't seen anything lately. Thanks, Mike Prinkey Aeolus Research, Inc. On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. > > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. > > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. 
interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. > > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. > > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. > > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. > > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 09:34:56 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 13:34:56 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: References: Message-ID: Robert, You covered some of the issues that we are addressing with our lawyers right now. It's a process which, as knowledgeable as you are, I am sure you can understand we have to go through. Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- On Thu, 9 Oct 2003, Robert G. Brown wrote: > Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) > From: Robert G. Brown > To: C J Kenneth Tan -- Heuchera Technologies > Cc: Greg Lindahl , beowulf at beowulf.org > Subject: Re: Intel compilers and libraries > > On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > > > 4) Put your performance whitepapers on your website, or it looks > > > fishy. > > > > Our white papers are not on the Web they contain performance data, and > > particularly, performance data comparing against our competitors. It > > may expose us to libel legal issues. Putting legitimacy of any legal > > Expose you to libel suits? Say what? 
> > Only if you lie about your competitor's numbers (or "cook" them so that > they aren't an accurate reflection of their capabilities, as is often > done in the industry) does it expose you to libel charges or more likely > to the ridicule of the potential consumers (who tend to be quite > knowledgeable, like Greg). > > One essential element to win those crafty consumers over is to compare > apples to apples, not apples to apples that have been picked green, > bruised, left on the ground for a while in the company of some handy > worms, and then picked up so you can say "look how big and shiny and red > and worm-free our apple is and how green and tiny and worm-ridden our > competitor's apple is". A wise consumer is going to eschew BOTH of your > "display apples" (as your competitor will often have an equally shiny > and red apple to parade about and curiously bruised and sour apples from > YOUR orchard) and instead insist on wandering into the various orchards > to pick REAL apples from your trees for their OWN comparison. > > What exactly prevents you from putting your own raw numbers up, without > any listing of your competitor's numbers? You can claim anything you > like for your own product and it isn't libel. False advertising, > possibly, but not libel. Or put the numbers up with your competitor's > numbers up "anonymized" as A, B, C. And nobody will sue you for beating > ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody > "owns" them to sue you or cares in the slightest if you beat them. The > most that might happen is that if you manipulate(d) ATLAS numbers so > they aren't what real humans get on real systems, people might laugh at > you or more likely just ignore you thereafter. > > What makes you any LESS liable to libel if you distribute the white > papers to (potential) customers individually? Libel is against the law > no matter how, and to who, you distribute libelous material; it is > against the law even if shrouded in NDA. It is against the law if you > whisper it in your somebody's ears -- it is just harder to prove. > Benchmark comparisons, by the way, are such a common marketing tool (and > so easily bent to your own needs) that I honestly think that there is a > tacit agreement among vendors not to challenge competitors' claims in > court unless they are openly egregious, only to put up their own > competing claims. After all, no sane company WOULD actually lie, right > -- they would have a testbed system on which they could run the > comparisons listed right there in court and everybody knows it. Whether > the parameters, the compiler, the system architecture, the tests run > etc. were carefully selected so your product wins is moot -- if it ain't > a lie it ain't libel, and it is caveat emptor for the rest (and the rest > is near universal practice -- show your best side, compare to their > worst). > > > issues aside, it is not good for any business to be engulf in legal > > squabbles. We are in the process of clearing this with our legal > > department at the moment. > > > > As I have noted in my previous e-mail, anyone who wants to get a hold > > of the white papers are welcome to please send me an e-mail. > > As if your distributing them on a person by person basis is somehow less > libelous? Or so that you can ask me to sign an NDA so that your > competitors never learn that you are libelling them? I rather think > that an NDA that was written to protect illegal activity be it libel or > drug dealing or IP theft would not stand up in court. 
Finally, product > comparisons via publically available benchmarks of products that are > openly for sale don't sound like trade secrets to me as I could easily > duplicate the results at home (or not) and freely publish them. > > Your company's apparent desire to conceal this comes across remarkably > poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a > watch? Come right down this alley so I can show you my watches where > none of the bulls can see" compared to an open storefront with your > watches on display to anyone, consumer or competitor. This is simply my > own viewpoint, of course. I've simply never heard of a company > shrinking away from making the statement "we are better than our > competitors and here's why" as early and often as they possibly could. > AMD routinely claims to be faster than Intel and vice versa, each has > numbers that "prove" it -- for certain tests that just happen to be the > tests that they tout in their claims, which they can easily back up. > For all the rest of us humans, our mileage may vary and we know it, and > so we mistrust BOTH claims and test the performance of our OWN programs > on both platforms to see who wins. > > I'm certain that the same will prove true for your own product. I don't > care about your benchmarks except as a hook to "interest" me. Perhaps > they will convince me to get you to loan me access to your libraries etc > to link them into my own code to see if MY code speeds up relative to > the way I have it linked now, or relative to linking with a variety of > libraries and compilers. Then I can do a real price/performance > comparison and decide if I'm better off buying your product (and buying > fewer nodes) or using an open source solution that is free (and buying > more nodes). Which depends on the scaling properties of MY application, > costs, and so forth, and cannot be predicted on the basis of ANY paper > benchmark. > > Finally, don't assume that this audience is naive about benchmarking or > algorithms, or at all gullible about performance numbers and vendor > claims. A lot of people on the list (such as Greg) almost certainly > have far more experience with benchmarks than your development staff; > some are likely involved in WRITING benchmarks. If you want to be taken > seriously, put up a full suite of benchmarks, by all means, and also > carefully indicate how those benchmarks were run as people will be > interested in duplicating them and irritated if they are unable to. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Oct 9 10:57:21 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 09 Oct 2003 09:57:21 -0500 Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> References: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <3F857751.4090009@tamu.edu> I've recently built a 2TB (well, a little less really) ATA RAID using a pair of HighPoint 374 controlers and 10 250-GB Maxtor 8 MB cache drives (plus a 60 GB drive for the system). It's running as 2 1TB arrays, because of disparate applications, right now. 
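A quick sketch of where the "little less than 2 TB" goes with ten 250 GB drives split into two five-drive arrays; the post doesn't say which RAID level each array runs, so both striping and parity are shown as assumptions:

DRIVE_BYTES = 250 * 10**9      # a vendor "250 GB" drive
DRIVES_PER_ARRAY = 5
TIB = 1024**4

def usable_bytes(level):
    raw = DRIVES_PER_ARRAY * DRIVE_BYTES
    if level == "raid0":
        return raw                                                # pure striping
    if level == "raid5":
        return raw * (DRIVES_PER_ARRAY - 1) // DRIVES_PER_ARRAY   # one drive's worth of parity
    raise ValueError(level)

for level in ("raid0", "raid5"):
    b = usable_bytes(level)
    print("%s per 5-drive array: %4d vendor GB = %.2f TiB as the OS counts it"
          % (level, b // 10**9, b / TIB))

Even with no parity at all, the decimal-vs-binary gap alone turns the nominal 2.5 TB of raw disk into about 2.27 TiB.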
Initially, the drivers for RH9 were not available so we started with RH7.3 and all the updates; they're there now and and allow cross-card arrays. Down the pike we might re-install and span the controllers. I've also recently done a 2-drive striped array supporting a meteorology data application with a lot of data acquisition and database work. It's mounted to a number of other systems via NFS. Uses a Promise Technologies TX2000 and a pair of 80 GB Maxtors. Both RAID systems have worked very well. I suspect the next one I build will incorporate Serial ATA instead of parallel. I doubt I'll build another SCSI RAID for my applications. Gerry Creager Texas Mesonet Texas A&M University Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID. The first would give the > best results but on the other hand becomes really expensive so we have > in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers > > Any one of these has its strong and weak points, after seeing various > benchmarks/comparisons/reviews these are the only candidates that > deserve our attention. > > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and > harddisk, all home directories will be stored under the main server, > main workload (compilation and edition) would be done on the local > machines tough, server only takes care of file sharing. > > Also parallel MPI executions will be done between the clients. > > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:39:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:39:48 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: <1065707290.4708.28.camel@protein.scalableinformatics.com> Message-ID: On Thu, 9 Oct 2003, Joseph Landman wrote: > On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > > > users tapes of special areas or data on request. The tape system is > > expensive, but a tiny fraction of the cost of the loss of data due to > > (say) a server room fire, or a monopole storm, or a lightning strike on > > the primary room feed that fries all the servers to toast. > > Monopole storm... (smile) I seem to remember (old bad and likely wrong > memory) that Max Dresden had predicted one monopole per universe as a > consequence of the standard model. Not my area of (former) expertise, > so reality may vary from my memory ... Hell, there are more than that in California alone. So far monopoles have been discovered there at least twice; once on superconducting niobium balls in a Milliken experiement (but they went away when the balls were washed and never returned, go figure) and once in a superconduction flux trap although the events MIGHT have been caused by somebody flicking a light switch down the hall...:-) Seriously, this is theory vs experiment, and as a theorist I firmly defer to experiment. 
Until we find an (isolated) monopole, they are just a very attractive, compelling even, extension of Maxwell's equations and related field theories that (as a "defect") help us understand why certain quanties are quantized, or add a certain symmetry to the theory that is otherwise broken. However, it does amuse me to think of hard disks as being "experiments" like the flux loop experiment to measure the existence of monopoles. It would be interesting to determine a "signature" of disk penetration by a cosmic ray monopole and scan a small mountain of crashed disks for the signature, if such a signature is in any way unique. Such a mountain represents a lot more event phase space than a single loop or set of loops in a California laboratory. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Oct 9 12:08:20 2003 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 9 Oct 2003 09:08:20 -0700 (PDT) Subject: Raid Deffinitions Message-ID: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Discussing a client setup the other day a cohort and I came to a different opinion on what each raid level does. Is there a guide/standard to define how it should work. Also do any vendors stray from the beaten path and add there own levels? ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Oct 9 14:24:22 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 9 Oct 2003 11:24:22 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031009182422.GB1865@greglaptop.internal.keyresearch.com> On Thu, Oct 09, 2003 at 10:04:20AM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Welcome to the Internet. In the US, that's not an issue, so we're used to being able to get our performance data without having to ask a human. BTW, in the US, your lawyers would recommend that your "Up to 32X faster" claim would need a "results not typical" disclaimer. > > I looked and didn't see a single performance claim there. > > There is one on the front page! Sorry, I should have said "didn't see a single credible performance claim there". Bogus-looking claims do not help you sell to the HPC market, either in the US or Europe. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Oct 9 11:42:53 2003 From: dag at sonsorol.org (chris dagdigian) Date: Thu, 09 Oct 2003 11:42:53 -0400 Subject: 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA Message-ID: <3F8581FD.3080404@sonsorol.org> ======================================================================= MEETING ANNOUNCEMENT / CALL FOR PRESENTERS ======================================================================= BIOCLUSTERS 2004 Workshop March 30, 2004 Hynes Convention Center, Boston MA USA ======================================================================= * Speakers Wanted - Please Distribute Where Appropriate * Organized by several members of the bioclusters at bioinformatics.org mailing list, the Bioclusters 2004 Workshop is a networking and educational forum for people involved in all aspects of cluster and grid computing within the life sciences. The motivation for organizers of this event was the cancellation of the O'Reilly Bioinformatics Technology Conference series and the general lack of forums for researchers and professionals involved with the applied use of high performance IT and distributed computing techniques in the life sciences. The primary focus of the workshop will be technical presentations from experienced IT professionals and scientific researchers discussing real world systems, solutions, use-cases and best practices. This event is being held onsite at the Hynes Convention Center on the first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World Magazine is generously providing space and logistical support for the meeting and workshop attendees will have access to the expo floor and keynote addresses. Registration & fees will be finalized in short order. Presentations will be broken down among a few general content areas: 1. Researcher, Application & End user Issues 2. Builder, Scaling & Integration Issues 3. Future Directions The organizing committee is actively soliciting presentation proposals from members of the life science and technical computing communities. Interested parties should contact the committee at bioclusters04 at open- bio.org. Bioclusters 2004 Workshop Committee Members J.W Bizzaro ? Bioinformatics Organization Inc. James Cuff - MIT/Harvard Broad Institute Chris Dwan - The University of Minnesota Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. Joe Landman ? Scalable Informatics LLC The committee can be reached at: bioclusters04 at open-bio.org About the Bioclusters Mailing List Community The bioclusters at bioinformatics.org mailing list is a 600+ member forum for users, builders and programmers of distributed systems used in life science research and bioinformatics. For more information about the list including the public archives and subscription information please visit http://bioinformatics.org/mailman/listinfo/bioclusters _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 14:40:02 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 11:40:02 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. 
Brown wrote: > It probably refers to burst delivery out of its 8 MB cache. The actual > sustained bps speed is a pure matter of N*2*\pi*\R*f/S, where N = number A hard drive only reads from one head at a time. It's not possible to align every head with each other to such a degree that every track in a cylinder is readable at once. If you look at a given drive family of drives, each different sized drive is the same basic hardware with more discs/heads. For instance Seagate's Cheetah 15K.3 family (http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah15k.3.pdf) has the exact same internal transfer rate (609-891 megabits/sec) for the 18 GB model with 2 heads, the 36GB with 4 heads, and the 73GB with 8. > read radius, and S is the linear length per bit. This is an upper > bound. Similarly average latency (seek time) is something like 1/2f, > the time the platter requires to move half a rotation. The average latency is indeed 1/2 the rotational period. For a 7200 RPM drive it is 4.16 ms, for a 15k RPM drive it's 2 ms. Seek time is something completely different. It's how long it takes the head to move from one track to another. It does not included the latency. You might see track-to-track, full stroke, and average seek times in a datasheet. > I should also point out that since we've been using the RAIDs we have > experienced multidisk failures that required restoring from backup on > more than one occasion. The book value probability for even one I've had one multidisk failure in a RAID5 system. It was after moving into a new building, one array had three out of six disks fail to spin up. Of course I had anticipated this, and made a backup, to tape, just before the move. None of the tapes were damaged in transit. I've had several single drive failures. I've never seen anyone with significant number of drive-years of experience say they've never seen a drive fail. And no manufacture has a failure rate anywhere near 0%. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 13:47:43 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 13:47:43 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > > > On Thu, 9 Oct 2003, Trent Piepho wrote: > > > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. Jim Grey just recommends moving the whole computer: http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 JG It's a very convenient way of distributing data. DP Are you sending them a whole PC? JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks--all for about $3,000. 
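Two of the figures in this exchange are easy to sanity-check; a short sketch (the 24-hour courier time and the zero-overhead wire speeds are assumptions for illustration, the spindle speeds and disk count come from the posts):

# Average rotational latency is half a revolution.
for rpm in (7200, 10000, 15000):
    print("%5d rpm -> %.2f ms average rotational latency"
          % (rpm, 0.5 * 60.0 / rpm * 1000.0))

# Jim Gray's shipped box: seven 300 GB drives vs pulling the data over a
# wire, assuming the box spends 24 hours in transit.
payload_bits = 7 * 300e9 * 8
transit_s = 24 * 3600.0
print("shipped box: ~%.0f Mbit/s effective" % (payload_bits / transit_s / 1e6))
for name, bps in (("100 Mbit Ethernet", 100e6), ("gigabit Ethernet", 1e9)):
    print("%-18s at wire speed: %.1f hours for the same 2.1 TB"
          % (name, payload_bits / bps / 3600.0))

So the $3,000 box in the mail is roughly a 200 Mbit/s link, and it gets faster every time drives get bigger.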
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Thu Oct 9 14:30:46 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Thu, 09 Oct 2003 11:30:46 -0700 Subject: building a RAID system In-Reply-To: Message from Daniel Fernandez of "Wed, 08 Oct 2003 21:46:59 +0200." <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <200310091830.h99IUkNr014912@pookie.nersc.gov> Daniel, We have around 50 3ware boxes with a total formated space of around 50 TB. We run all of these in HW raid mode. I would avoid using software raid if you plan to have more than a dozen or so clients. Our experience is that while software raid works great, it scales poorly. This was very noticeable when the server processors were PIII class. It may be less of an issue with newer processors, but I would still recommend HW raid if the card supports it. Also, we like the 3ware cards because they have been supported by linux for ages now. Some of the other cards have been a little dicey. With our newest systems we've seen aggregate performance for a single server of around 70 MB/s and they appear to scale quite well (handle over 50 clients). This last batch of systems have 12 250 GB drives, a 12 port 3ware card, dual Xeon, on-board gigE and cost less than $7k. Also, the 3ware systems hot swap very well. We make use of it all the time. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:20:25 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:20:25 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > Discussing a client setup the other day a cohort and I came to a different > opinion on what each raid level does. Is there a guide/standard to define how > it should work. Also do any vendors stray from the beaten path and add there > own levels? http://www.1U-Raid5.net/Differences - definitions, and pretty pictures too c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gsheppar at gpc.edu Thu Oct 9 15:43:57 2003 From: gsheppar at gpc.edu (Gene Sheppard) Date: Thu, 09 Oct 2003 15:43:57 -0400 Subject: Inquiry small system S/W In-Reply-To: Message-ID: We here are Georgia Perimeter College are planning on putting together a 5 or 6 node Beowulf system. My question: Is there any software for a system like this? What applications have been tested on a small system? If there are none, what is the smallest system out there? Thank you for your help. 
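As a concrete illustration of the sort of thing that runs on a handful of nodes, a minimal MPI check; mpi4py is just one binding picked for the sketch, and any MPI implementation shipped with the cluster kits mentioned elsewhere in this thread would serve the same purpose:

"""Minimal MPI hello-world, shown with the mpi4py bindings purely as an
illustration (any MPI implementation/binding would do)."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                 # this process's id, 0..size-1
size = comm.Get_size()                 # total number of processes started
node = MPI.Get_processor_name()

print("hello from rank %d of %d on %s" % (rank, size, node))

# A trivial collective, to prove the nodes really talk to each other.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print("sum of all ranks:", total)

Launched with something like "mpirun -np 6 python hello.py", one process per node, it exercises the same plumbing a real MPI application will use.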
GEne ============================================== Gene Sheppard Georgia Perimeter College Computer Science 1000 University Center Lane Lawrenceville, GA 30043 678-407-5243 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 9 17:04:30 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 9 Oct 2003 14:04:30 -0700 Subject: building a RAID system - 8 drives In-Reply-To: References: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: <20031009210430.GD11051@maybe.org> On Thu, Oct 09, 2003 at 01:47:43PM -0400, Michael T. Prinkey elucidated: > > > > No quite true. We use Rare drives (one box) to move up to a TB of data > > around w/o having to take the time to create tapes and then download them. > > That takes a lot of time, even w/ LTOs. > > Jim Grey just recommends moving the whole computer: > > http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 > > > JG It's a very convenient way of distributing data. > > DP Are you sending them a whole PC? > > JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, > and seven 300-GB disks--all for about $3,000. > Kind of reminds me of a favorite fortune cookie quotes: "Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" -- Andrew S. Tannenbaum Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Thu Oct 9 14:50:17 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Thu, 09 Oct 2003 20:50:17 +0200 Subject: building a RAID system In-Reply-To: References: Message-ID: <1065725416.1136.59.camel@qeldroma.cttc.org> Hi again, Thanks for the advice, also it has started an interesting thread. On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > I would like to know some advice about what kind of technology apply > > into a RAID file server ( through NFS ) . We started choosing hardware > > RAID to reduce cpu usage. > > that's unfortunate, since the main way HW raid saves CPU usage is > by running slower ;) > I cannot get the point here, the dedicated processor should take all transfer commands and offload the CPU why it would run slower ? In some tests a raid system for a single workstation ( no networking ) it's a bit useless (slower) unless you want to transfer really big files. In a networked environment there could be a massive number of I/O commands so should be critical. > seriously, CPU usage is NOT a problem with any normal HW raid, > simply because a modern CPU and memory system is *so* much better > suited to performing raid5 opterations than the piddly little > controller in a HW raid card. the master/fileserver for my > cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and > it can *easily* saturate its gigabit connection. after all, ram > runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! > Agreed, our server would not be doing anything more than managing NFS so, there is power to spare, where talking about an Athlon XP2600+ processor. But, a really good Parallel ATA 100/133 controller is needed, and 4 channels at least... 4 HDs in 2 master/slave channels reduces drastically performance ? any controller recommended ? 
But must be noted that HW RAID offers better response time. HW raid offers hotswap capability and offload our work instead of maintaining a SW raid solution ...we'll see ;) > concern for PCI congestion is a much more serious issue. > We're limited at 32 bit PCI, we cannot get around this unless spend on a highly priced PCI 64 mainboard. > finally, why do you care at all? are you fileserving through > a fast (>300 MB/s) network like quadrics/myrinet/IB? most people > limp along at a measly gigabit, which even a two-ide-disk raid0 > can saturate... > > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > jeez, since your limited to 10 MB/s, you could do raid5 on a 486 > and still saturate the net. seriously, CPU consumption is NOT an issue > at 10 MB/s. There would not be noticeable difference between SW/HW mode here. The clients would be doing write bursts of 2-5Mb per second so there must not be any problem. > > machines tough, server only takes care of file sharing. > > so excess cycles on the fileserver will be wasted unless used. > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? > > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). > > regards, mark hahn. > In fact we have two choices: - Use an spare existing ( relatively obsolete ) computer and couple it with a HW RAID card. - Spend on a fast CPU computer and a good but cheap Parallel ATA controller. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 17:17:34 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 17:17:34 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some > tests a raid system for a single workstation ( no networking ) it's a > bit useless (slower) unless you want to transfer really big files. In a > networked environment there could be a massive number of I/O commands so > should be critical. Key word: "should" Benchmark results: "often does not" Your best bet is to try both and run your own benchmarks and do your own cost/benefit analysis. When you say things like "better response time" one is fairly naturally driven to ask "does the difference matter", for example. Given that we run over 100 workstations from a SW RAID with nearly instantaneous (entirely satisfactory) performance, you'd have to really be hammering it to perceive a difference. > In fact we have two choices: > > - Use an spare existing ( relatively obsolete ) computer and couple it > with a HW RAID card. > > - Spend on a fast CPU computer and a good but cheap Parallel ATA > controller. Or a cheap computer + PATA or SATA controller. 
Even a cheap computer has 2+ GHz CPUs and hundreds of MB of RAM these days. Spend more on what you put the disks in, power, cooling. If it is an old/obsolete computer, will it have enough power, enough cooling? Regardless, the disk cost itself will dominate your costs. rgb > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:06:44 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:06:44 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: hi ya angel On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. yes.. guess it makes sense to move disks around for moving tb of data like floppy-net or sneaker-net - done that ( moving disks around ) myself once in a while for a quickie fix c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 16:35:04 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 13:35:04 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > > I would like to know some advice about what kind of technology apply > > > into a RAID file server ( through NFS ) . We started choosing hardware > > > RAID to reduce cpu usage. > > > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some Easy, said dedicated processor and memory is quite a bit slower than the main CPU and memory. If you look at thoughput in MB/sec, the latest linux software RAID is usually much faster than hardware raid implimentations. Usually CPU usage is (stupidly) reported as just a % used during a benchmark. If you transfer fewer megabytes in second, obviously the number of CPU cycles used in that second go down as well. If CPU usage is correctly reported in units of % per MB/sec, then you get a real measure of hardware efficiency. > needed, and 4 channels at least... 
4 HDs in 2 master/slave channels > reduces drastically performance > ? any controller recommended ? It seems that most good 4-12 channel (NOT drive, channel!) IDE cards ARE hardware raid controllers. Lots of people use the 3ware RAID cards in JBOD mode with software raid, because their isn't a cheaper non-hardware raid card comparable to something like the 3ware 7508-8 or 7508-12. I know about cheaper 2 and 4 channel non-raid cards, but they're 32/33 PCI and not comparable to the 3ware. > > concern for PCI congestion is a much more serious issue. > > > We're limited at 32 bit PCI, we cannot get around this unless spend on a > highly priced PCI 64 mainboard. AMD 760MPX and Intel E7501 motherboards have high speed 64/66 PCI and PCI-X for the E7501. They're not that expensive really. An additional $100-$200 at most over a single PCI 32/33 motherboard. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Thu Oct 9 18:02:34 2003 From: rokrau at yahoo.com (Roland Krause) Date: Thu, 9 Oct 2003 15:02:34 -0700 (PDT) Subject: Experience with Omni anyone? Message-ID: <20031009220234.64852.qmail@web40010.mail.yahoo.com> Folks, I came across the Omni OpenMP compiler lately and I was wondering whether anyone here has used it and what the experience was. I.o.w., is it "industrial strength"? I know of and use Portland and Intel compilers but I am also curious. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at lathama.com Thu Oct 9 17:52:22 2003 From: lathama at lathama.com (Andrew Latham) Date: Thu, 9 Oct 2003 14:52:22 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: Message-ID: <20031009215222.68022.qmail@web60307.mail.yahoo.com> thanks. I know that all the different raid levels are here for a reason and raid5 is great but what are the benefits of the rest? --- Mark Hahn wrote: > > Discussing a client setup the other day a cohort and I came to a different > > opinion on what each raid level does. Is there a guide/standard to define > how > > it should work. Also do any vendors stray from the beaten path and add > there > > own levels? > > sure they do. IMO the only important levels are: > > raid0 - striping > raid1 - mirroring > raid5 - rotating parity-based array > > vendors who make a big deal of obvious extensions like raid 10 > (mirrored stripes or vice versa) are immediately hung up on by me... > ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Oct 9 16:24:36 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 9 Oct 2003 15:24:36 -0500 (CDT) Subject: Inquiry small system S/W In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? 
> What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne System or application software? For system software, any of the beowulf kits will work. http://warewulf-cluster.org/ http://www.scyld.com/ http://oscar.sourceforge.net/ http://rocks.npaci.edu/ http://clic.mandrakesoft.com/index-en.html and others. Most applications will run just fine on 5 or 6 nodes. To start with, i'd get HPL and PMB running to ensure everything is working fine. Then you can look at other applications to see what you might actually be able to benefit from. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 9 16:44:28 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 9 Oct 2003 16:44:28 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. Brown wrote: > Hell, there are more than that in California alone. So far monopoles Forgot to mention the California "megapoll" which just occurred on Tuesday. Sorry, I could not help myself. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Oct 9 18:08:51 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 9 Oct 2003 15:08:51 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009215222.68022.qmail@web60307.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > thanks. > > I know that all the different raid levels are here for a reason and raid5 is > great but what are the benefits of the rest? 0 is fast (interleaved chunks) but provides no redundancy. 1 is a a 1 + 1 mirror... can be faster on reads but is generally slower on writes depending on your controller/implementation... 0 + 1 or 1 + 0 striped mirror or mirrored stripe. less space efficient than raid 5 but faster in general. can survive multiple disk failures so long as both disks containing the same information don't fail at once. > --- Mark Hahn wrote: > > > Discussing a client setup the other day a cohort and I came to a different > > > opinion on what each raid level does. Is there a guide/standard to define > > how > > > it should work. Also do any vendors stray from the beaten path and add > > there > > > own levels? > > > > sure they do. IMO the only important levels are: > > > > raid0 - striping > > raid1 - mirroring > > raid5 - rotating parity-based array > > > > vendors who make a big deal of obvious extensions like raid 10 > > (mirrored stripes or vice versa) are immediately hung up on by me... > > > > > ===== > Andrew Latham > > Penguin loving, moralist agnostic. 
> > LathamA.com - (lay-th-ham-eh) > lathama at lathama.com - lathama at yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:19:16 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:19:16 GMT Subject: building a RAID system - 8 drives - drive-net In-Reply-To: References: Message-ID: <20031009231916.21008.qmail@houston.wolf.com> Alvin Oga writes: > > hi ya angel > > On Thu, 9 Oct 2003, Angel Rivera wrote: > >> Alvin Oga writes: >> >> > nobody swaps disks around ... unless one is using those 5.25" drive bay >> > thingies in which case ... thats a different ball game >> >> No quite true. We use Rare drives (one box) to move up to a TB of data >> around w/o having to take the time to create tapes and then download them. >> That takes a lot of time, even w/ LTOs. > > yes.. guess it makes sense to move disks around for moving tb of data > like floppy-net or sneaker-net > - done that ( moving disks around ) myself once in a while > for a quickie fix When you have that much data, it is easier and faster to load 8 drives into a box than tons of tapes. take out the old drives and place the new ones in, mount it, export it and voila-it is on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 19:36:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 16:36:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031009231916.21008.qmail@houston.wolf.com> Message-ID: hi ya On Thu, 9 Oct 2003, Angel Rivera wrote: .. > > yes.. guess it makes sense to move disks around for moving tb of data > > like floppy-net or sneaker-net > > - done that ( moving disks around ) myself once in a while > > for a quickie fix > > When you have that much data, it is easier and faster to load 8 drives into > a box than tons of tapes. take out the old drives and place the new ones > in, mount it, export it and voila-it is on-line. yes and a "bunch of disks" (raid5) survives the loss of one dropped disk and is relatively secure from prying eyes .... - ceo gets one disk - cfo gets one disk - hr gets one disk - eng gets one disk - sys admin gets one disk ( combine all[-1] disks together to recreate the (raid5) TB data ) - a single (raid5) disk by itself is basically worthless tape backups are insecure ... - lose a tape ( bad tape, lost tape ) and and all its data is lost - anybody can read the entire contents of the full backup ( one could tar up one disk per tape, instead of tar'ing the ( whole raid5 subsystem, to provide the ( same functionality as a raid5 offsite disk backup c ya alvin and hopefully .. the old disks are not MFM drives.. 
or ata-133 in a new sata system :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:51:05 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:51:05 GMT Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031009235105.23420.qmail@houston.wolf.com> Alvin Oga writes: >> yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... Well, let's see. We can backup the data to tapes or to disks-disks are faster. From the time the data is on the disk, 1/2-1.0 hours to get to us, a few minutes to install them and voila you are on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 21:31:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 21:31:13 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... > - ceo gets one disk > - cfo gets one disk > - hr gets one disk > - eng gets one disk > - sys admin gets one disk > ( combine all[-1] disks together to recreate the (raid5) TB data ) > > - a single (raid5) disk by itself is basically worthless Secure from prying eyes, maybe (as in casually secure). "Secure" as in your secret plans for world domination or the details of your flourishing cocaine business are safe from the feds, not at all, unless the information is encrypted. Each disk has about one fourth of the information. English is about 3:1 compressible (really more; this is using simple symbolic compression). A good cryptanalyst could probably recover "most" of what is on the disks from any one disk, depending on what kind of data is there. Numbers, possibly not, but written communications, quite possibly. Especially if it falls in the hands of somebody who really wants it and has LOTS of good cryptanalysts. > tape backups are insecure ... > - lose a tape ( bad tape, lost tape ) and and all its data is lost > - anybody can read the entire contents of the full backup Unless it is encrypted. Without strong encryption there is no data-level security. With it there is. Maybe. Depending on what is "strong" to you and what is strong to, say, the NSA, whether your systems and network is secure, depending on whether you have dual isolation power inside a faraday cage with dobermans at the door. However, there can be as much or as little physical security for the tape as you care to put there. Tape in a locked safe, tape in an armored car. Disks are far more fragile than tapes -- drop a disk one meter onto the ground and chances are quite good that it is toast and will at best cost hundreds of dollars and a trip to specialized facilities to remount and mostly recover. Drop a tape one meter onto the ground and chance are quite good that it is perfectly fine, and even if it isn't (because e.g. the case cracked) ordinary humans can generally remount the tape in a new case without needing a clean room and special tools. 
Tapes are cheap -- you can afford to send almost three tapes compared to one disk. I get the feeling that you just don't like tapes, Alvin...;-) rgb > > ( one could tar up one disk per tape, instead of tar'ing the > ( whole raid5 subsystem, to provide the > ( same functionality as a raid5 offsite disk backup > > c ya > alvin > > and hopefully .. the old disks are not MFM drives.. > or ata-133 in a new sata system :-) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Fri Oct 10 01:20:06 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Fri, 10 Oct 2003 07:20:06 +0200 Subject: Inquiry small system S/W In-Reply-To: References: Message-ID: <20031010072006.54dfd8a4.smuelas@mecanica.upm.es> I have put together a 8 node beowulf cluster to my greatest satisfaction and results. You don't need nothing special; if it is beowulf it must be Linux. If you use, for example, RedHat 9, what I do, you have everything needed in the standard 3 CD's distribution, that you can download at no cost. Apart from that, and in my particular case, I use fortran90 and the compiler from Intel, also free for non-comercial use. Perhaps, the only special hardware to buy is a simple, 8 nodes switch for your ethernet connections. Then, what is really important is to learn to make your software really able to use the cluster. So, some time to study MPI or similar, and work, work, work... :-) Before being an 8 nodes cluster, mine has been 4-nodes, then 6-nodes and at 8 I stopped. But there is no difference in the work to do. Just the possibilities and speed increase. Good luck!! On Thu, 09 Oct 2003 15:43:57 -0400 Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? > What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne > > ============================================== > Gene Sheppard > Georgia Perimeter College > Computer Science > 1000 University Center Lane > Lawrenceville, GA 30043 > 678-407-5243 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Fri Oct 10 01:43:57 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 9 Oct 2003 22:43:57 -0700 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010054357.GB13480@sphere.math.ucdavis.edu> On the hardware vs software RAID thread. A friend needed a few TB and bought a high end raid card (several $k), multiple channels, enclosure, and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. He needed the capacity and a minumum of 50MB/sec sequential write performance (on large sequential writes). He didn't get it. Call #1 to dell resulted in well it's your fault, it's our top of the line, it should be plenty fast, bleah, bleah, bleah. Call #2 lead to an escalation to someone with more of a clue, tune paramater X, tune Y, try a different raid setup, swap out X, etc. After more testing without helping call #3 was escalated again someone fairly clued answered. The conversation went along the lines of what, yeah, it's dead slow. Yeah most people only care about the reliability. Oh performance? We use linux + software raid on all the similar hardware we use internally at Dell. So the expensive controller was returned, and 39160's were used in it's place (dual channel U160) and performance went up by a factor of 4 or so. In my personal benchmarking on a 2 year old machine with 15 drives I managed 200-320 MB/sec sustained (large sequential read or write), depending on filesystem and strip size. I've not witnessed any "scaling problems", I've been quite impressed with linux software raid under all conditions and have had it run significantly faster then several expensive raid cards I've tried over the years. Surviving hotswap, over 500 day uptimes, and substantial performance advantages seem to be common. Anyone have numbers comparing hardware and software raid using bonnie++ for random access or maybe postmark (netapp's diskbenchmark) Failures so far: * 3ware 6800 (awful, evil, slow, unreliable, terrible tech support) * quad channel scsi card from Digital/storage works, rather slow, then started crashing * More recently (last 6 months) the top of the line dell raid card (PERSC?) * A few random others One alternative solution I figured I'd mention is the Apple 2.5 TB array for $10-$11k isnt' a bad solution for a mostly turnkey, hotswap, redundant powersupply setup with a warranty. Dual 2 Gigabit Fiber channels does make it easier to scale to 10's of TB's then some other solutions. I managed 70 MB/sec read/write to a 1/2 Xraid (on a single FC). Of course there are cheaper solutions. Oh, I also wanted to mention one gotcha for the DIY methods. I've had I think 4 machines now with 8-15 disks, and dual 400 watt powersupplies or 3x225 watt (n+1) boot just fine for 6 months, but start complaining at boot due to to high power consumption. This is of course especially bad with EIDEs since they all spin up at boot (SCSI can usually be spun up one at a time). I suspect a slight decrease in lubrication and or degradation in the powersupplies which were possibly running above 100% to be the cause. 
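Part of why the host CPU keeps up so easily in these software-RAID numbers is that RAID-5 parity is nothing more than a byte-wise XOR across the data chunks, so a lost disk is rebuilt by XOR-ing the survivors. A toy sketch (four 16-byte "disks"; nothing here reflects md's real chunk sizes or layout):

import os
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data_chunks = [os.urandom(16) for _ in range(3)]   # three data "disks"
parity = reduce(xor, data_chunks)                  # the parity "disk"

# Pretend the second data disk died; rebuild it from the survivors.
survivors = [data_chunks[0], data_chunks[2], parity]
rebuilt = reduce(xor, survivors)
print("rebuilt chunk matches the lost one:", rebuilt == data_chunks[1])

The real md driver adds parity-chunk rotation and a lot of bookkeeping, but the per-byte arithmetic is exactly this.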
In any case great thread, I've yet to see a performance or functionality benefit from hardware raid. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 03:24:15 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 09:24:15 +0200 Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> References: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: <20031010072415.GI17432@unthought.net> On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > Hi again, ... Others have already answered your other questions, I'll try to take one that went unanswered (as far as I can see). ... > > But must be noted that HW RAID offers better response time. In a HW RAID setup you *add* an extra layer: the dedicated CPU on the RAID card. Remember, this CPU also runs software - calling it 'hardware RAID' in itself is misleading, it could just as well be called 'offloaded SW RAID'. The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor. However, you added a layer with your offloaded-RAID. You added one extra CPU in the 'chain of command' - and an inferior CPU at that. That layer means latency even in the most expensive cards you can imagine (and bottleneck in cheap cards). No matter how you look at it, as long as the RAID code in the kernel is fairly simple and efficient (which it was, last I looked), then the extra layers needed to run the PCI commands thru the CPU and then to the actual IDE/SCSI controller *will* incur latency. And unless you pick a good controller, it may even be your bottleneck. Honestly I don't know how much latency is added - it's been years since I toyed with offload-RAID last ;) I don't mean to be handwaving and spreading FUD - I'm just trying to say that the people who advocate SW RAID here are not necessarily smoking crack - there are very good reasons why SW RAID will outperform HW RAID in many scenarios. > > HW raid offers hotswap capability and offload our work instead of > maintaining a SW raid solution ...we'll see ;) That, is probably the best reason I know of for choosing hardware RAID. And depending on who you will have administering your system, it can be a very important difference. There are certainly scenarios where you will be willing to trade a lot of performance for a blinking LED marking the failed disk - I am not kidding. Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 02:58:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 08:58:37 +0200 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010065837.GH17432@unthought.net> On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: ... > Each disk has about one fourth of the information. English is about 3:1 > compressible (really more; this is using simple symbolic compression). > A good cryptanalyst could probably recover "most" of what is on the > disks from any one disk, depending on what kind of data is there. You overlook the fact that data on a RAID-5 is distributed in 'chunks' of sizes around 4k-128k (depending...) So you would get the entire first 'Introduction to evil empire plans', but the entire 'Subverting existing banana government' chapter may be on one of the disks that you are missing. > Numbers, possibly not, but written communications, quite possibly. > Especially if it falls in the hands of somebody who really wants it and > has LOTS of good cryptanalysts. You'd probably need historians and psychologists rather than cryptographers - but of course the point remains the same. Just nit-picking here. > > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. I'm just thinking of distributing two tapes for each disk - one with 200G of random numbers, the other with 200G of data XOR'ed with the data from the first tape. Enter the one-time pad - unbreakable encryption (unless you get a hold of both tapes of course). You'd need to make sure you have good random numbers - as an extra measure of safety one should probably wear a tinfoil hat while working with the tapes, just in case... ;) Of course, if any tape is lost, everything is lost. But one bad KB on either tape will only result in one bad KB total. > > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. No no no no no! Think big! Think: cobalt bomb in own backyard - threaten anyone who steals your data, that you'll make the planet inhabitable for a few hundred decades unless they hand back your tapes. ;) (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) ... > I get the feeling that you just don't like tapes, Alvin...;-) Where did you get that idea? ;) Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Fri Oct 10 13:34:41 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Fri, 10 Oct 2003 10:34:41 -0700 Subject: building a RAID system References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> Message-ID: <3F86EDB1.6264A405@attglobal.net> You write: "The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor." Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the practical limit of SW RAID? Paul Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > Hi again, > ... > > Others have already answered your other questions, I'll try to take one > that went unanswered (as far as I can see). > > ... > > > > But must be noted that HW RAID offers better response time. > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > RAID card. Remember, this CPU also runs software - calling it > 'hardware RAID' in itself is misleading, it could just as well be called > 'offloaded SW RAID'. > > The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor. > > However, you added a layer with your offloaded-RAID. You added one extra > CPU in the 'chain of command' - and an inferior CPU at that. That layer > means latency even in the most expensive cards you can imagine (and > bottleneck in cheap cards). No matter how you look at it, as long as > the RAID code in the kernel is fairly simple and efficient (which it > was, last I looked), then the extra layers needed to run the PCI > commands thru the CPU and then to the actual IDE/SCSI controller *will* > incur latency. And unless you pick a good controller, it may even be > your bottleneck. > > Honestly I don't know how much latency is added - it's been years since > I toyed with offload-RAID last ;) > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > that the people who advocate SW RAID here are not necessarily smoking > crack - there are very good reasons why SW RAID will outperform HW RAID > in many scenarios. > > > > > HW raid offers hotswap capability and offload our work instead of > > maintaining a SW raid solution ...we'll see ;) > > That, is probably the best reason I know of for choosing hardware RAID. > And depending on who you will have administering your system, it can be > a very important difference. > > There are certainly scenarios where you will be willing to trade a lot > of performance for a blinking LED marking the failed disk - I am not > kidding. > > Cheers, > > -- > ................................................................ > : jakob at unthought.net : And I see the elder races, : > :.........................: putrid forms of man : > : Jakob ?stergaard : See him rise and claim the earth, : > : OZ9ABN : his downfall is at hand. 
: > :.........................:............{Konkhra}...............: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Oct 10 07:12:48 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 10 Oct 2003 04:12:48 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. just trying to protect the tapes ( backups ) against the casual "oops look what i found" and they go and look at the HR records or the salary records or employee reviews etc..etc.. not trying to protect the tapes against the [cr/h]ackers ( different ball game ) and even not protecting against the spies of nsa/kgb etc either ( whole new ballgame for those types of backup issues ) > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. dont forget to lock the car/safe too :-) and log who goes in and out of the "safe" area :-) > I get the feeling that you just don't like tapes, Alvin...;-) not my first choice for backups .. even offsite backups... but if "management" takes out the $$$ to do tape backups... so it shall be done ... ideally, everything works ... but unfortunately, tapes are highly prone to people's "oops i forgot to change it yesterday" or the weekly catridge have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 07:56:39 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 13:56:39 +0200 Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> <3F86EDB1.6264A405@attglobal.net> Message-ID: <20031010115639.GN17432@unthought.net> On Fri, Oct 10, 2003 at 10:34:41AM -0700, pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? In this forum, I run small storage only. Around 150G for the most busy server that I have. 
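For concreteness -- device names below are purely an example, and raidtools' raidtab does the same job -- a box like that is nothing more exotic than:

mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdg1 /dev/hdi1
mkfs -t ext3 /dev/md0
cat /proc/mdstat                                  # array state and resync progress
echo 10000 > /proc/sys/dev/raid/speed_limit_max   # cap the resync rate (KB/s) so the box stays usable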
Linux has problems with >2TB devices as far as I know, so that sort of puts an upper limit to whatever you can do with SW/HW RAID there. In between, it's just one order of magnitude :) More seriously - the SW RAID code is extremely simple, and it performs two different tasks: *) Reconstruction - which has time complexity T(n) for n bytes of data *) Read/write - which has time complexity T(1) for n bytes of data In other words - the more data you have, the longer a resync is going to take - HW or SW makes no difference (except for a factor, which tends to be rediculously large on cheap HW RAID cards but acceptable on more expensive ones). Reads and writes are not affected by the amount of data, in the SW RAID layer (and hopefully not in the HW RAID layer either). The scalability limits you will run into are: *) Number of disks you can attach to your box (HW RAID may hide this from you and may thus buy you some scalability there) *) Filesystem limits/performance problems. HW/SW RAID makes no difference *) Device size limits. HW/SW RAID makes no difference *) Reconstruction time after unclean shutdown - SW performs much better than crap/cheap HW solutions, but I don't know about the expensive ones. There are others on this list with much larger servers and less antique hardware - guys, speak up - where does it begin to hurt? :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 10 07:59:22 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 10 Oct 2003 04:59:22 -0700 (PDT) Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> Message-ID: On Fri, 10 Oct 2003 pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? size-wise software raid (I'm talking specifically about linux here) scales far better than most hardware raid controllers (san subsystems are another kettle of fish entirely), among other reasons because you can spread the disks out between multiple controllers. > Paul > > Jakob Oestergaard wrote: > > > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > > Hi again, > > ... > > > > Others have already answered your other questions, I'll try to take one > > that went unanswered (as far as I can see). > > > > ... > > > > > > But must be noted that HW RAID offers better response time. > > > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > > RAID card. Remember, this CPU also runs software - calling it > > 'hardware RAID' in itself is misleading, it could just as well be called > > 'offloaded SW RAID'. 
> > > > The problem with offloading is, that while it made great sense in the > > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > > load on your typical N GHz processor. > > > > However, you added a layer with your offloaded-RAID. You added one extra > > CPU in the 'chain of command' - and an inferior CPU at that. That layer > > means latency even in the most expensive cards you can imagine (and > > bottleneck in cheap cards). No matter how you look at it, as long as > > the RAID code in the kernel is fairly simple and efficient (which it > > was, last I looked), then the extra layers needed to run the PCI > > commands thru the CPU and then to the actual IDE/SCSI controller *will* > > incur latency. And unless you pick a good controller, it may even be > > your bottleneck. > > > > Honestly I don't know how much latency is added - it's been years since > > I toyed with offload-RAID last ;) > > > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > > that the people who advocate SW RAID here are not necessarily smoking > > crack - there are very good reasons why SW RAID will outperform HW RAID > > in many scenarios. > > > > > > > > HW raid offers hotswap capability and offload our work instead of > > > maintaining a SW raid solution ...we'll see ;) > > > > That, is probably the best reason I know of for choosing hardware RAID. > > And depending on who you will have administering your system, it can be > > a very important difference. > > > > There are certainly scenarios where you will be willing to trade a lot > > of performance for a blinking LED marking the failed disk - I am not > > kidding. > > > > Cheers, > > > > -- > > ................................................................ > > : jakob at unthought.net : And I see the elder races, : > > :.........................: putrid forms of man : > > : Jakob ?stergaard : See him rise and claim the earth, : > > : OZ9ABN : his downfall is at hand. : > > :.........................:............{Konkhra}...............: > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:35:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:35:35 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: On Fri, 10 Oct 2003, Alvin Oga wrote: > > hi ya robert > > On Thu, 9 Oct 2003, Robert G. Brown wrote: > > > > tape backups are insecure ... > > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > > - anybody can read the entire contents of the full backup > > > > Unless it is encrypted. Without strong encryption there is no > > data-level security. With it there is. Maybe. 
Depending on what is > > "strong" to you and what is strong to, say, the NSA, whether your > > systems and network is secure, depending on whether you have dual > > isolation power inside a faraday cage with dobermans at the door. > > just trying to protect the tapes ( backups ) against the casual > "oops look what i found" and they go and look at the HR records > or the salary records or employee reviews etc..etc.. > > not trying to protect the tapes against the [cr/h]ackers ( different > ball game ) and even not protecting against the spies of nsa/kgb etc > either ( whole new ballgame for those types of backup issues ) Hmmm, this is morphing offtopic, but data security is a sufficiently universal problem that I'll chance one more round. Pardon me while I light up my crack pipe here...:-7... so I can babble properly. Ah, ya, NOW I'm awake...:-) The point is that you cannot do this with precisely HR records or salary records or employee reviews. If anybody gets hold of the data (casually or not) be it on tape or disk or the network while it is in transit then you are liable to the extent that you failed to take adequate measures to ensure the data's security. By the numbers: 1) Tapes even more than disk are most unlikely to be viewed casually. Disks have street value, tapes (really) don't. There is a high entry investment required before one can even view the contents of an e.g. LTO tape, plus a fair degree of expertise. A disk can be pulled from a box and remounted in any system by a pimple-faced kid with a screwdriver and an attitude. The net can be snooped by anyone, with a surprisingly low entry level of expertise (or rather a high level expertise encapsulated in openly distributed rootkits and exploits so anybody can do it). 2) All three are clearly vulnerable to someone (e.g. a private investigator, an insurance company, a competitor, an identity thief, the government) seeking to snoop and violate the privacy of the individuals who have entrusted their data to you. HR records contain SSNs, bank numbers (to facilitate direct deposit), names addresses, health records, employment records, CVs and/or transcripts, disciplinary records: they are basically everything you never wanted the world to know in one compact and efficient package. Federal and state laws regulate the handling of this data in quite rigorous ways. 3) An IT officer who was responsible for holding sensitive data secure according to law and who failed to employ reasonable measures for maintaining it secure and who subsequently had it stolen (violating his trust) would be publically eviscerated. Career ruined, bankrupted by suits, tormented by guilt, possibly even put in jail, driven to suicide kind of stuff in the worst case. The company that employed that officer would be right behind -- suits, clean sweep firings of the entire management team in the chain of responsibility, plunging stock prices, public recriminations and humiliation. EVEN IF reasonable measures were employed there would likely be trouble and recrimination, but careers might survive, damages would be limited, jail might be avoided, and one wouldn't feel so irresponsibly guilty. 4) Strong encryption of the data to protect it in transit is an obvious, inexpensive, knee-jerk sort of reasonable measure (again, independent of the means of transport presuming only that the data passes out of your fortress keep where you keep the cobalt bomb and dobermans and make all of your staff wear tinfoil caps while looking at the data). 
It might even be mandated by law for certain forms of data -- the federal government just passed a sweeping right to privacy measure for health data, for example, that may well have highly explicit provisions for data transport and security. 5) Therefore... only someone with a death wish would send sensitive, valuable data for which they are responsible for security, through any transport layer not under their direct control and deemed secure of its own right, between secure sites, without encrypting it first (and otherwise complying with relevant federal and state laws, if any apply to the case at hand). Properly paranoid ITvolken would likely consider ALL transport layers including their own internal LAN not to be secure and would use ssh/ssl/vpn bidirectional encryption of all network traffic period. If it weren't for the fact that there is less motivation to encrypt the data on the physically secured actual server disks (so the only means of access are through dobermans and locked doors or by cracking the servers from outside, in which case you've already lost the game) one would extend the data encryption to the database itself, and I'm sure that there are sites that don't trust even their own staff or the moral character of their dobermans that so do. I don't want to THINK about what one has to endure to obtain access to e.g. NSA or certain military datasites -- probably body cavity searches in and out, shaved heads and paper suits, and metal detectors, that sort of thing...:-) > > However, there can be as much or as little physical security for the > > tape as you care to put there. Tape in a locked safe, tape in an > > armored car. > > dont forget to lock the car/safe too :-) > and log who goes in and out of the "safe" area :-) Ya, precisely. It is only partly a joke, you see. If my Duke HR or my medical records turn up on the street, with somebody purporting to be me cleaning out my bank account and maxing my visa, with my applications for health insurance denied because they've learned about my heavy drinking problem and all the crack that I smoke (I don't know where Jakob got the idea that I don't sit here fuming away all day:-) and the consequent liver failure and bouts of wild-eyed babbling (like this one, strangely enough:-), my plans for a fusion generator that you can build in your garage turning up being patented by Exxon and so forth Duke had DAMN WELL better be able to show my attorney and a court logs of who had access to this data, proofs that it was never left lying around in cars (locked or unlocked), proofs that it was transmitted in encrypted form, etc. Otherwise I'm detonating the cobalt bomb in my backyard and Duke will be a radioactive wasteland for a few kiloyears...(it is only a couple of miles away). This is the kind of thing that gives IT security officers ulcers. Duke's current SO is actually a former engineering school beowulfer (and good friend of mine) whose voice is scattered through the list archives (Chris Cramer). As a former 'wulfer (and EE), he is damn smart and computer-expert (and handsome and witty, just like everybody else on this list:-). However, he sweats bullets because Duke is a huge organization with lots of architectures scattered all over campus -- Windows here (any flavor), Macs there, Suns, Linux boxen, there are likely godforsaken nooks on campus that still have IBM mainframes and VAXes. Sensitive data is routinely served across the campus backbone and beyond (e.g. 
I can see my advisees' current transcripts where I sit at this very moment). Even with SSL, this data is vulnerable in fragments to any successful exploit on any client that belongs to any partially privileged person and that runs a vulnerable operating system. Hmmm, you say -- wasn't there recently an RPC exploit on a certain very common OS that permitted crackers to put anything they wanted including snoops on all cracked clients (not to mention a steady stream of lesser but equally troublesome invasions of the viral sort)? Didn't this cost institutions all over the world thousands of FTE hours to put right before somebody actually used it to steal access to valuable data? Why yes, I believe that there was! I believe it did! However, as one who got slammed (blush) a year ago on an unfortunately unpatched linux box and who has seen countless exploits succeed against all flavors of networked OS over many years, I avoid feeling too cocky about it. Nevertheless, Chris just keeps suckin' down the prozac and phillips cocktails dealing with crap like this and knowing that it is his butt on line should a malevolent attack succeed in compromising Duke's mountains of sensitive data (gulp) being served by minions whose primary systems expertise was developed back when knowing cobol was a part of the job description (gulp) running on servers with, um "interesting" base architectures (gulp)... > > I get the feeling that you just don't like tapes, Alvin...;-) > > not my first choice for backups .. even offsite backups... > > but if "management" takes out the $$$ to do tape backups... so it shall be > done ... > ideally, everything works ... but unfortunately, tapes > are highly prone to people's "oops i forgot to change it > yesterday" or the weekly catridge They are indeed (as the example I gave of a recent small-scale disaster at Duke clearly shows). A site run by a wise IT human would use a pretty rigorous protocol to regulate the process so that even if you have e.g. student labor doing the tape changes there is strict accountability and people checking the people checking the people who do the job, and so that tapes are randomly pulled every month and checked to be sure that the data is actually getting on the tapes in retrievable form. You can bet that Duke has such a process in place now, if they didn't before, although Universities tend to be a loose amalgamation of quasi-independent fiefdoms that accept control and adopt security measures for the common good and hire competent systems administrators and develop shared protocols for ensuring data integrity about as often and as easily as one would expect. (Sound of Chris in the background crunching another mylantin and washing it down with P&P:-) So in place or not, the risk remains. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:34:25 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:34:25 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010065837.GH17432@unthought.net> Message-ID: On Fri, 10 Oct 2003, Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: > ... 
> > Each disk has about one fourth of the information. English is about 3:1 > > compressible (really more; this is using simple symbolic compression). > > A good cryptanalyst could probably recover "most" of what is on the > > disks from any one disk, depending on what kind of data is there. > > You overlook the fact that data on a RAID-5 is distributed in 'chunks' > of sizes around 4k-128k (depending...) Overlook, hell. I'm using my usual strategy of feigning knowledge with the complete faith that my true state of ignorance will be publically displayed to the entire internet. This humiliation, in turn, will eventually cause such mental anguish that I'll be able to claim mental disability and retire to tending potted plants on a disability check for the rest of my life... You probably noticed that I used the same strategy quite recently regarding things like factors of N in disk read speed estimates, certain components in disk latency, and oh, too many other things to mention. Pardon me if I babble on a bit this morning, but my lawy... erm, "psychiatrist" insists that I need fairly clear evidence of disability to get away with this. I personally find that smoking crack cocaine induces a pleasant tendency to babble nonsense. And there is no place to babble for the record like the beowulf list archives, I always say...:-) > So you would get the entire first 'Introduction to evil empire plans', > but the entire 'Subverting existing banana government' chapter may be on > one of the disks that you are missing. ... > I'm just thinking of distributing two tapes for each disk - one with > 200G of random numbers, the other with 200G of data XOR'ed with the data > from the first tape. Or just one tape, xor'd with 200G worth of random numbers generated from a cryptographically strong generator via a relatively short key that you can (as you note) send or carry separately and which is smaller, easier to secure, and less susceptible to degradation or loss than a second tape. It's cheaper that way, and even if you use two tapes people are going to try cracking the master tape by trying to guess the key+algorithm you almost certainly used to generate it (see below), so the xor is no stronger than the key+algorithm combination.;-) > Enter the one-time pad - unbreakable encryption (unless you get a hold > of both tapes of course). Or determine the method and key you used for (oxymoronically) generating 200 Gigarands (which is NOT going to be a hardware generator, I don't think, unless you are a very patient person or build/buy a quantum generator or the like -- entropy based things like /dev/random are too slow, and even quantum generators I've looked into are barely fast enough:-). > You'd need to make sure you have good random numbers - as an extra Ah, that's the rub. "Good random numbers" isn't quite an oxymoron. Why, there is even a government standard measure for cryptographic strength in the US (which many/most generators fail, by the way). Entropy based generators tend to be very slow -- order of 10-100 kbps depending on the source of entropy, last I looked. Quantum generators IIRC that rely on e.g. single photon transmission events at half-silvered mirrors have to run at light intensities where single photon events are discernible (rare, that is) and STILL have to wait for an autocorrelation time or ten before opening a window for the next event because even quantum events like this have an associated correlation time due to the existence of extended correlated states in the radiating system. 
Photon emission from a single atom itself is antibunched, for example, as after an emission the system requires time for the single radiating atom to regain a degree of excitation sufficient to enable re-emission. I believe that they can achieve more like 1 mbps of randomness or at least unpredictability. As you'd need 1.6x10^12 bits to encode your tape, you'd have to wait around 1.6x10^6 seconds to generate the key. That is, hmmm, between two and three week, twenty to thirty weeks with an entropy generator, unless you used a beowulf of entropy generators to shorten the time:-). Not exactly in the category of "generate a one-time pad while I go have a cup of coffee". Using a truly oxymoronic but much faster (and cryptographically strong) random number generator, e.g. the mt19937 from the GSL one can generate a respectable ballpark of 16 MBps (note B, not b) of random bytes and be done in a mere four hours. Alas, mt19937 is seeded from a long int and the seed probably doesn't have enough bits to be secure against a brute force attack, so one would likely have to fall back on one of the actual algorithms that permit the use of long keys (1024 bits or even more). > No no no no no! Think big! > > Think: cobalt bomb in own backyard - threaten anyone who steals your > data, that you'll make the planet inhabitable for a few hundred > decades unless they hand back your tapes. ;) > > (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) Hmm, I'll have to mail you some of my lithium pills, Jakob. Your own prescription obviously ran out...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Fri Oct 10 11:28:03 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Fri, 10 Oct 2003 09:28:03 -0600 Subject: Intel compilers and libraries In-Reply-To: ; from cjtan@optimanumerics.com on Thu, Oct 09, 2003 at 10:04:20AM +0000 References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031010092803.A5136@lnxi.com> On Thu, Oct 09 2003 at 04:04, C J Kenneth Tan -- Heuchera Technologies wrote: > Greg, > > > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > > MKL and the free ATLAS library run at a respectable % of peak. > > Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, > xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. > > Have you tried DPPSV or DPOSV on Itanium, for example? I would be > interested in the percentage of peak that you achieve with MKL and > ATLAS, for up to 10000x10000 matrices. > > ATLAS does not have full LAPACK implementation. 
This gets ATLAS to provide its faster LAPACK routines to a full LAPACK library: http://math-atlas.sourceforge.net/errata.html#completelp Mike -- Mike Snitzer msnitzer at lnxi.com Linux Networx http://www.lnxi.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Fri Oct 10 13:55:43 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Fri, 10 Oct 2003 19:55:43 +0200 Subject: PVM errors at startup Message-ID: <3F86F29F.8A37AC5B@hmg.inpg.fr> Hi I'm new on this list so, just 2 lines about me: a small Linux Beowulf cluster (10 nodes) for computational fluid dynamics in the south-east of France (National Polytechnic Institute of Grenoble). I've just updated my cluster (from AMD 1500+ / 100BT Ethernet to P4 2.8G + Gigabit Ethernet) and I've updated my system to Red Hat 7.3, kernel 2.4.20-20-7. The current version of PVM is pvm-3.4.4-2 from Red Hat 7.3. The previous system was RH 7.1. Since this update I'm unable to start PVM from one node on another (with the add command). The console hangs for several tens of seconds and then says OK. The pvmd3 is started on the remote node, but the conf command does not show the additional node and I get these errors in the /tmp/pvml.xx file: [t80040000] 10/10 15:58:31 craya.hmg.inpg.fr (xxx.xxx.xxx.xxx:32772) LINUX 3.4.4 [t80040000] 10/10 15:58:31 ready Fri Oct 10 15:58:31 2003 [t80040000] 10/10 16:01:46 netoutput() timed out sending to craya02 after 14, 190.000000 [t80040000] 10/10 16:01:46 hd_dump() ref 1 t 0x80000 n "craya02" a "" ar "LINUX" dsig 0x408841 [t80040000] 10/10 16:01:46 lo "" so "" dx "" ep "" bx "" wd "" sp 1000 [t80040000] 10/10 16:01:46 sa 192.168.81.2:32770 mtu 4080 f 0x0 e 0 txq 1 [t80040000] 10/10 16:01:46 tx 2 rx 1 rtt 1.000000 id "(null)" rsh and rexec are working (from master to nodes, from nodes to master and from nodes to nodes). The transfer speed is near 600 Mbit/s on the network (binary ftp to /dev/null). The variables are set: PVM_ARCH=LINUX PVM_RSH=/usr/bin/rsh PVM_DPATH=/usr/local/pvm3/lib/LINUX/pvmd3 PVM_ROOT=/usr/local/pvm3 I've tried so many things during these last 3 days: - trying to compile and install pvm3.4.4.tgz from the source file - uninstalling iptables, ipchains and iplock - removing /etc/security (to test this with root authority) - adding .rhosts and hosts.equiv files - on the master, eth0 is 100 Mbit towards the internet and eth1 is GB towards the nodes; I've tried the opposite config: eth0 becomes GB and eth1 100BT. Always the same problem! The cluster is down and I do not know where to look for a solution now.... If someone could help me solve this problem... Thanks for your help Patrick -- =============================================================== | Equipe M.O.S.T.
| http://most.hmg.inpg.fr | | Patrick BEGOU | ------------ | | LEGI | mailto:Patrick.Begou at hmg.inpg.fr | | BP 53 X | Tel 04 76 82 51 35 | | 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71 | =============================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Oct 10 21:53:34 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 10 Oct 2003 21:53:34 -0400 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010054357.GB13480@sphere.math.ucdavis.edu> References: <20031010054357.GB13480@sphere.math.ucdavis.edu> Message-ID: <1065837212.18644.0.camel@QUIGLEY.LINIAC.UPENN.EDU> On Fri, 2003-10-10 at 01:43, Bill Broadley wrote: > On the hardware vs software RAID thread. A friend needed a few TB and > bought a high end raid card (several $k), multiple channels, enclosure, > and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. > > He needed the capacity and a minumum of 50MB/sec sequential write > performance (on large sequential writes). He didn't get it. Call #1 to > dell resulted in well it's your fault, it's our top of the line, it should > be plenty fast, bleah, bleah, bleah. Call #2 lead to an escalation to > someone with more of a clue, tune paramater X, tune Y, try a different > raid setup, swap out X, etc. After more testing without helping call #3 > was escalated again someone fairly clued answered. The conversation went > along the lines of what, yeah, it's dead slow. Yeah most people only > care about the reliability. Oh performance? We use linux + software > raid on all the similar hardware we use internally at Dell. > > So the expensive controller was returned, and 39160's were used in it's > place (dual channel U160) and performance went up by a factor of 4 or > so. Can you give more concrete pointers to the hardware that they ended up using ? -- specifically the enclosure. Thanks! Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Fri Oct 10 13:55:14 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Fri, 10 Oct 2003 17:55:14 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031010092803.A5136@lnxi.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031010092803.A5136@lnxi.com> Message-ID: Mike, > > Have you tried DPPSV or DPOSV on Itanium, for example? I would be > > interested in the percentage of peak that you achieve with MKL and > > ATLAS, for up to 10000x10000 matrices. > > > > ATLAS does not have full LAPACK implementation. > > This gets ATLAS to provide its faster LAPACK routines to a full LAPACK > library: > http://math-atlas.sourceforge.net/errata.html#completelp Inserting the LU factorization code from ATLAS to publicly available LAPACK will only get you faster LU code in the rest of the publicly available LAPACK library. You will not gain from QR factorization code, Cholesky factorization code, etc.. Ken ----------------------------------------------------------------------- C. 
J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Oct 11 13:01:17 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 11 Oct 2003 13:01:17 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: > Inserting the LU factorization code from ATLAS to publicly available > LAPACK will only get you faster LU code in the rest of the publicly > available LAPACK library. You will not gain from QR factorization > code, Cholesky factorization code, etc.. oh, sure, but LU is the only important one because of top500 ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Oct 11 16:16:12 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 11 Oct 2003 15:16:12 -0500 Subject: Help in rsh In-Reply-To: ; from diego_naruto@hotmail.com on Sat, Oct 11, 2003 at 07:14:13PM +0000 References: Message-ID: <20031011151612.A22568@mikee.ath.cx> On Sat, 11 Oct 2003, diego lisboa wrote: > Hi, > I'm having problems with a cluster that I've set up here; it's a small > cluster with 3 machines. I already have NIS and NFS installed and they are > working very well, with Red Hat 9.0. When I use the LAM 6.5.8 RPM it works > beautifully, but I need to install XMPI, which needs LAM 6.5.8.tar.gz (compiled) > and with trilliun. When I install it on the master it works, but on the slaves I have a > problem with rsh, and hboot doesn't find the "LAM schema" or something like > that. Can somebody help me? > Thanks Try something simpler first. What happens when you do $ rsh -l USER HOST uptime does that work?
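If it does, the next things I would check are along these lines ('lamhosts' and the host names are only examples, and 'hello' stands for whatever MPI test program you have compiled with mpicc):

$ rsh slave1 'echo $PATH'      # the LAM binaries must be on the PATH of non-interactive shells
$ rsh slave1 true              # must print nothing at all; chatty shell startup files break LAM
$ recon -v lamhosts            # LAM's own check that it can start daemons on every host
$ lamboot -v lamhosts
$ mpirun -np 3 hello

recon and lamboot ship with LAM 6.5.8, so if recon already fails the problem is rsh or PATH on the slaves, not XMPI.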
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diego_naruto at hotmail.com Sat Oct 11 15:14:13 2003 From: diego_naruto at hotmail.com (diego lisboa) Date: Sat, 11 Oct 2003 19:14:13 +0000 Subject: Help in rsh Message-ID: Hi, I'm having problems with a cluster that I've set up here; it's a small cluster with 3 machines. I already have NIS and NFS installed and they are working very well, with Red Hat 9.0. When I use the LAM 6.5.8 RPM it works beautifully, but I need to install XMPI, which needs LAM 6.5.8.tar.gz (compiled) and with trilliun. When I install it on the master it works, but on the slaves I have a problem with rsh, and hboot doesn't find the "LAM schema" or something like that. Can somebody help me? Thanks _________________________________________________________________ MSN Hotmail, the largest webmail in Brazil. http://www.hotmail.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Oct 11 19:10:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 11 Oct 2003 16:10:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Fri, 10 Oct 2003, Robert G. Brown wrote: > > not trying to protect the tapes against the [cr/h]ackers ( different > > ball game ) and even not protecting against the spies of nsa/kgb etc > > either ( whole new ballgame for those types of backup issues ) > > Hmmm, this is morphing offtopic, but data security is a sufficiently > universal problem that I'll chance one more round. Pardon me while I > light up my crack pipe here...:-7... so I can babble properly. Ah, ya, > NOW I'm awake...:-) humm .. gimme some of that :-) > The point is that you cannot do this with precisely HR records or salary > records or employee reviews. If anybody gets hold of the data (casually > or not) be it on tape or disk or the network while it is in transit then > you are liable to the extent that you failed to take adequate measures > to ensure the data's security. By the numbers: security of clusters vs security of normal compute environments and normal users from home and/or w/ laptops requires varying degrees of security policies - from looking at the various incoming Swen virus traffic (the fake MS update stuff) - about 75% of the incoming junk is coming from (mis-managed) clusters 80% of the security issues will be due to internal folks and not the outsiders.. and i'd hate to be the one responsible for security on a university network where there are tons of bright young and ambitious kids looking for a "trophy" my security rule: assume the hacker is already sitting inside the firewall .. w/ root passwds .. now protect your data - that is my model ... - if they have a keyboard sniffer installed .. game over .. ( there'd be no need to guess what the pass phrase was ) c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From victor_ms at brturbo.com Sun Oct 12 11:27:23 2003 From: victor_ms at brturbo.com (Victor Lima) Date: Sun, 12 Oct 2003 12:27:23 -0300 Subject: Benchmarks Message-ID: <3F8972DB.6080802@brturbo.com> Hi All. I'm new on the list. Well, I have a small Linux cluster with 18 P4 2.8 GHz nodes on Fast Ethernet (100 Mbit/s). I need some benchmark software for latency, throughput on Ethernet, etc. See you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Sun Oct 12 19:10:07 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Sun, 12 Oct 2003 19:10:07 -0400 Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> References: <3F8972DB.6080802@brturbo.com> Message-ID: <3F89DF4F.1070500@bellsouth.net> I'm surprised no one has jumped on this yet. There are several packages for testing basic network performance from one node to another.
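Both of the ones I use are easy to drive by hand between a pair of nodes; roughly (host names are made up, see each package's docs for the exact options):

node01$ NPtcp                      # NetPIPE receiver
node02$ NPtcp -h node01            # NetPIPE transmitter; prints latency/throughput vs message size

node01$ netserver                  # netperf daemon
node02$ netperf -H node01 -t TCP_STREAM   # bulk throughput
node02$ netperf -H node01 -t TCP_RR       # request/response, a decent latency proxy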
My personal favorite is netpipe: http://www.scl.ameslab.gov/netpipe/ The other one is netperf: http://www.netperf.org/netperf/NetperfPage.html The web pages are pretty good about explaining things. Good Luck! Jeff > Hi All. > I'm new on list. > Well I have a small linux clusters with 18 P4 2.8 Ghz with > FastEthernet 100Mbits > I need some benchmarks softwares for Latency, Thoughtput on Ethernet, > etc. > Ate. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Oct 13 03:34:33 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 13 Oct 2003 09:34:33 +0200 (CEST) Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> Message-ID: On Sun, 12 Oct 2003, Victor Lima wrote: > Hi All. > I'm new on list. > Well I have a small linux clusters with 18 P4 2.8 Ghz with FastEthernet > 100Mbits > I need some benchmarks softwares for Latency, Thoughtput on Ethernet, etc. Have a look at Pallas http://www.pallas.com/e/products/pmb/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 13 09:38:36 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Mon, 13 Oct 2003 15:38:36 +0200 Subject: Intel and GNU C++ compilers Message-ID: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Hello: I just wanna thank everybody for the responses to my last question about Intel compiler, I tried both 'gcc' and 'icc', and got the following results for one of our work files containing 10^6 steps of calculation: ************************** *** gcc version 2.95.4 *** ************************** flags bin-size elapsed-time ----- -------- ------------ none 9.5 KB 311 sec "-O3" 8.7 KB 192 sec "-O3 -ffast-math" 8.7 KB 165 sec ******************************************** *********************** *** icc version 7.1 *** *********************** flags bin-size elapsed-time ----- -------- ------------ none 597 KB 100 sec "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec **************************************************************** the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' and SSE2 extensions would improve 'gcc' results. I am running on a Dual Xeon 2.4 Ghz machine, with 2Gb of RAM. I use Debian Woody with a 2.4.22 kernel compiled by myself. HyperThreading is disabled at the BIOS level. The test were run on one processor only. Thanks, Jose M. Perez. Madrid, Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 13 12:04:47 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 13 Oct 2003 12:04:47 -0400 (EDT) Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: > *** gcc version 2.95.4 *** that's god-aweful ancient. 
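for calibration it's worth at least pinning down what you're actually running, and grabbing something newer if your distro carries it (package names below are Debian's, from memory):

gcc --version ; g++ --version
apt-get install gcc-3.3 g++-3.3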
> none 9.5 KB 311 sec > "-O3" 8.7 KB 192 sec > "-O3 -ffast-math" 8.7 KB 165 sec -fomit-frame-pointer usually helps, sometimes noticeably, since x86 is so short of registers. -O3 is often not better than -O2 or -Os, mainly because of interactions between unrolling, Intel's microscopic L1's, and the difficulty of scheduling onto a tiny reg set... I'd be surprised if 3.3 or 3.4 (pre-release) didn't perform noticeably better. > flags bin-size elapsed-time > ----- -------- ------------ > none 597 KB 100 sec > "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec isn't -tpp7 redundant if you have -xW? > the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions > respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' > and SSE2 extensions would improve 'gcc' results. yes. '-march=pentium4 -mfpmath=sse' seems to do it. gcc doesn't have an auto-vectorizer yet, unfortunately. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From indigoneptune at yahoo.com Mon Oct 13 13:37:47 2003 From: indigoneptune at yahoo.com (stanley george) Date: Mon, 13 Oct 2003 10:37:47 -0700 (PDT) Subject: benchmarks for performance Message-ID: <20031013173747.37343.qmail@web14912.mail.yahoo.com> Hi, I have a cluster of 8 P-III machines running Red Hat 8. I am trying to measure combined performance in MFLOPS. I have tried using linpackd and 1000d. It gives me an error with the 'Make.inc' file while compiling. How do I get rid of this? Which other benchmarking software could I use? Thank you very much Stanley George __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Oct 13 12:22:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 13 Oct 2003 18:22:17 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <200310131822.17717.joachim@ccrl-nece.de> José M. Pérez Sánchez: > I just wanna thank everybody for the responses to my last question about > Intel compiler, I tried both 'gcc' and 'icc', and got the following results > for one of our work files containing 10^6 steps of calculation: José, thanks for the information, but you really should (also) use the latest gcc (3.3x) for such a comparison. It will be interesting to see how it performs relative to the latest icc on the one hand, and to the old gcc on the other hand. And some information on the application (or libraries used) would be helpful, too. Like: is it memory-bound or compute-bound, etc.
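Just so the numbers are comparable, something along these lines would do (the file name is only an example; the flags are the ones already discussed in this thread):

g++-3.3 -O2 -ffast-math -march=pentium4 -mfpmath=sse -o sim-gcc33 sim.cc
icc -O2 -tpp7 -xW -o sim-icc sim.cc
/usr/bin/time ./sim-gcc33
/usr/bin/time ./sim-icc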
Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 13 15:26:55 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 13 Oct 2003 12:26:55 -0700 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031013192655.GC16033@greglaptop.internal.keyresearch.com> On Mon, Oct 13, 2003 at 12:04:47PM -0400, Mark Hahn wrote: > -fomit-frame-pointer usually helps, sometimes noticably, > since x86 is so short of registers. Actually it's a lot more of a tossup than it used to be: having a frame pointer means you have another 256 bytes accessible via a single-byte offset, and the SSE registers help relieve the register pressure problem. On the Opteron, which has more of both general purpose and SSE registers, the frame pointer is often a win. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Mon Oct 13 21:21:34 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Mon, 13 Oct 2003 18:21:34 -0700 (PDT) Subject: The Canadian Internetworked Scientific Supercomputer Message-ID: <20031014012134.21517.qmail@web11403.mail.yahoo.com> Just found an interesting paper written by Paul Lu (the auther of PBSWeb): http://hpcs2003.ccs.usherbrooke.ca/papers/Lu.pdf CISS homepage: http://www.cs.ualberta.ca/~ciss/ Rayson __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:19:11 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 16:19:11 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <200310131822.17717.joachim@ccrl-nece.de> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> <200310131822.17717.joachim@ccrl-nece.de> Message-ID: <20031014141911.GA995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 06:22:17PM +0200, Joachim Worringen wrote: > thanks for the information, but you really should (also) use the latest gcc > (3.3x) for such a comparision. It will be interesting to see how it performs > relative to the latest icc on the one hand, and to the old gcc on the other > hand. > > And some information on the application (or libraries used) would be helpful, > too. Like: is it memory-bound or compute-bound, etc.. 
> > Joachim I installed gcc-3.3.2 from the debian testing distribution, here it is the full report including gcc-3.3.2: ************************** *** gcc version 2.95.4 *** ************************** flags bin-size elapsed-time ----- -------- ------------ none 9.5 KB 311 sec "-O3" 8.7 KB 192 sec "-O3 -ffast-math" 8.7 KB 165 sec ******************************************** ************************* *** gcc version 3.3.2 *** ************************* flags bin-size elapsed-time ----- -------- ------------ none 9.1 KB 245 sec "-O3" 8.8 KB 161 sec "-O2" 8.7 KB 157 sec "-O2 -ffast-math -fomit-frame-pointer" 8.5 KB 127 sec "-O2 -ffast-math" 8.5 KB 125 sec "-O2 -ffast-math -march=pentium4" 8.5 KB 120 sec "-O2 -ffast-math -march=pentium4 -msse2" 8.5 KB 120 sec "-O3 -ffast-math -march=pentium4 -msse2" 8.5 KB 120 sec ******************************************** *********************** *** icc version 7.1 *** *********************** flags bin-size elapsed-time ----- -------- ------------ none 597 KB 100 sec "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec **************************************************************** For this test, we actually wrote a version of the program with many parameters hardcoded, so that we make it as compute bound as posible, we aimed at evaluating how the different compilers took advantage of the Xeon processors. I will repeat the tests with the full version, which includes more memory usage, maybe about 80Mb each process, but it will finally depend on how big we make the files we use to split the calculations. The main calculation is the phase of a particle, we use an implementation of the MersenneTwister algorithm: http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html and have to compute sqrt(-2*log(x)/x) and sin(C*x/y) (x and y are not position, they correspond to other variables in the program), C is a constant hardcoded in the code like sin(9.7438473847*x/y). I measured how much it it took to compute sqrt(-2*log(x)/x), and it was about 412 processor cycles (I used rdtscll() ). I will submit other results as soon as I get them, probably using another computing algorithm which runs quite faster. Regards, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:32:19 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 16:32:19 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031014143219.GB995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 04:40:31PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Jose, > > Can we benchmark our OptimaNumerics Linear Algebra Library with you on > the same machine? > > Thank you very much! > > > Best wishes, > Kenneth Tan > ----------------------------------------------------------------------- > C. J. Kenneth Tan, Ph.D. > Heuchera Technologies Ltd. > E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 > Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 > ----------------------------------------------------------------------- Hi Kenneth: Thank you very much for your message, unfortunately we have a pretty tied schedule here, and lot's of different things to do. 
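For reference, a user-space version of the cycle-count measurement described above is easy to put together: the rdtscll() macro is the kernel's wrapper around the same rdtsc instruction, which can be read directly with inline assembly. The loop below is a sketch, not the original code; only the sqrt(-2*log(x)/x) expression and the 10^6-step count are taken from the description, and the compile lines in the comment are simply the flag sets compared in this thread.

/*
 * User-space sketch of timing the phase expression quoted above.
 * Link with -lm, e.g.
 *   gcc -O2 -ffast-math -march=pentium4 -msse2 phase.c -lm    (gcc 3.3)
 *   icc -O2 -tpp7 -xW phase.c                                 (icc 7.1)
 * The file name and constants are invented for illustration.
 */
#include <math.h>
#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    const int n = 1000000;                /* 10^6 steps, as in the tests above */
    double x = 0.5, acc = 0.0;
    unsigned long long t0, t1;
    int i;

    t0 = rdtsc();
    for (i = 0; i < n; i++) {
        acc += sqrt(-2.0 * log(x) / x);   /* the expression quoted above */
        x += 1.0e-7;                      /* keep the argument varying   */
    }
    t1 = rdtsc();

    printf("%.1f cycles per evaluation (acc = %g)\n",
           (double)(t1 - t0) / n, acc);
    return 0;
}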
Right now I cannot spend time benchmarking your library on my system, and we cannot provide access to anyone from outside. On the other hand I don't know if the calculations I am running at this moment can exploit your libraries. Thanks again and best regards, Jose M. Perez Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.fitzmaurice at ngc.com Mon Oct 13 10:52:18 2003 From: michael.fitzmaurice at ngc.com (Fitzmaurice, Michael) Date: Mon, 13 Oct 2003 07:52:18 -0700 Subject: Beowulf Users Group meeting Message-ID: <03E95480F0B2D042A7598115FB3F5D9D49F3E4@XCGVA009> Please join us at the Baltimore-Washington Beowulf Users Group meeting this Tuesday the 14th at 2:45 at the Northrop Grumman building on 7575 Colshire Drive; McLean, VA 22102. For more details please go to Who should attend? Sales, marketing and Business Development people Pre sales engineers High Performance Computer professionals IT generalist Data Center Managers Program and Project Managers Beowulf Clusters installations are one of the fastest growing areas with in the IT market. Beowulf Clusters are replacing old slower SMP systems for half the cost and with twice the performance. Beowulf Clusters will grow even faster with the introduction of easier to use parallel programming tools. Engineered Intelligence is leading the revolution in break through parallel programming tools for the HPC market. So now application on older SMP machines can be easily moved to COTS cost effective Intel or AMD based servers, which have been clustered to improve performance and reduce costs. Come hear the folks from Engineered Intelligence how your projects can use C x C to make your applications ready to use Beowulf Clusters today. This will be one of our best topics regarding the Beowulf Cluster market. There is no cost for the briefing and you do not need to be a BWBUG member. As always there will be great door prizes and free parking. If you can not make it to the meeting pass the word to a colleague or business associate. T. Michael Fitzmaurice, Jr. Coordinator of the BWBUG 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 12:08:50 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 18:08:50 +0200 Subject: Pentium4 vs Xeon Message-ID: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Hi: We are going to buy a second machine! :-) It will be a diskless dual processor node. We are thinking about buying the same configuration: Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting them are so expensive, we have been thinking about dual normal Pentium4 instead. We don't have now any P4 comparable processor to run some tests, and after looking at the Intel docs, the only difference we see between Xeon and P4 is Xeon having more cache. Does anyone has any idea about the relative performance of these processors, what about the price/performance ratio? Is it worth paying for more Xeon? 
The other point I wanna ask about is the "host bus speed" reported by the kernel at boot time, it reports 133Mhz, and our memories are supposed to run at 266Mhz, is it normal, is it just the double rate thing? Thanks in advance, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Tue Oct 14 12:53:54 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Tue, 14 Oct 2003 18:53:54 +0200 Subject: PVM errors at startup References: <3F86F29F.8A37AC5B@hmg.inpg.fr> Message-ID: <3F8C2A22.83F751A3@hmg.inpg.fr> This email just to close the thread with the solution. The problem was not related to any PVM misconfiguration but to the ethernet driver. Looking at the ethernet communications between 2 nodes with tcpdump has shown that pvmd was started using tcp communications BUT that pvmd were trying to talk each other with UDP protocol (it is also detailed in the PVM doc) and this was the problem. The UDP communications was unsuccessfull between the nodes. Details: The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 (gigabit) controler. I was using the 3c2000 driver (from the cdrom). Kernel is 2.4.20-20.7bigmem from RedHat 7.3. rsh, rexec and rcp are working fine but this driver seems not to work with UDP protocol??? The solution was to download the sk68lin driver (v6.18) and run the shell script to patch the kernel sources for the current kernel. Then correct the module.conf file and set up the gigabit interface. Now PVM is working fine between the two first nodes and the measured throughput is the same as with 3c2000 asustek driver. I should now setup the other nodes! I would like to thanks Pr. Kenneth R. Koehler and Dr James Arthur Kohl for their great help in checking the full PVM configuration and leading me towards a network driver problem. Patrick -- =============================================================== | Equipe M.O.S.T. | http://most.hmg.inpg.fr | | Patrick BEGOU | ------------ | | LEGI | mailto:Patrick.Begou at hmg.inpg.fr | | BP 53 X | Tel 04 76 82 51 35 | | 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71 | =============================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Tue Oct 14 13:38:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Tue, 14 Oct 2003 11:38:35 -0600 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <3F8C349B.5040302@lanl.gov> Jos? M. P?rez S?nchez wrote: > [...] we have been thinking about dual normal Pentium4 [...] SMP operation and larger caches appear to be threshold features in Xeons. Old Pentium III could be used in duals, but Intel's marketing has changed. Normal Pentium4 is *not* dual processor enabled: http://www.intel.com/products/desktop/processors/pentium4/index.htm?iid=ipp_browse+dsktopprocess_p4p& http://www.intel.com/products/server/processors/server/xeon/index.htm?iid=ipp_browse+srvrprocess_xeon512& If you really want a fast dual CPU machine from Intel, you'll probably have to pay for a Xeon... 
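As an aside on the host-bus question asked earlier in this batch (the kernel reporting 133 MHz while the memory is sold as 266 MHz): both figures come from the same base clock, with the P4/Xeon front-side bus moving four transfers per clock and DDR memory two. The sketch below just does the arithmetic; the results are peak datasheet figures for a 533 MHz FSB and a single DDR266 (PC2100) channel, not measurements.

/*
 * Back-of-the-envelope bandwidth arithmetic for a quad-pumped FSB and
 * double-pumped DDR memory, both 8 bytes wide and both driven by the
 * same 133 MHz base clock the kernel reports.
 */
#include <stdio.h>

int main(void)
{
    double clock_hz = 133.3e6;             /* base clock reported at boot      */
    double fsb = clock_hz * 4 * 8;         /* "533 MHz" FSB: 4 transfers/clock */
    double ddr = clock_hz * 2 * 8;         /* DDR266 channel: 2 transfers/clock */

    printf("FSB peak:    %.0f MB/s\n", fsb / 1e6);   /* about 4266 MB/s */
    printf("DDR266 peak: %.0f MB/s\n", ddr / 1e6);   /* about 2133 MB/s */
    return 0;
}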
Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Tue Oct 14 13:40:46 2003 From: djholm at fnal.gov (Don Holmgren) Date: Tue, 14 Oct 2003 12:40:46 -0500 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: On Tue, 14 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Hi: > > We are going to buy a second machine! :-) It will be a diskless dual > processor node. We are thinking about buying the same configuration: > Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting > them are so expensive, we have been thinking about dual normal Pentium4 > instead. We don't have now any P4 comparable processor to run some > tests, and after looking at the Intel docs, the only difference we see > between Xeon and P4 is Xeon having more cache. Does anyone has any > idea about the relative performance of these processors, what about the > price/performance ratio? Is it worth paying for more Xeon? > > The other point I wanna ask about is the "host bus speed" reported by > the kernel at boot time, it reports 133Mhz, and our memories are > supposed to run at 266Mhz, is it normal, is it just the double rate > thing? > > Thanks in advance, > > Jose M. Perez. The major difference between P4 and Xeon is that P4's are available with up to 800 MHz FSB, and Xeon's with up to 533 MHz FSB. If your code is sensitive to memory bandwidth, a P4 can be a big win. Otherwise they are essentially equivalent. P4 and standard Xeon both have 512K L2 caches. Xeon's with larger L2 caches are available, but if I'm not mistaken there's a big price difference. Pricewise (YMMV), cheap desktop P4's can be had very roughly for half the price of a comparable dual Xeon. You may very well prefer to admin half the number of boxes and so would prefer the Xeon. If you are using an expensive interconnect, you may also come out ahead with the dual processor boxes, buying only half of the PCI adapters and half the switch ports. Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit PCI. That can be a big bottleneck if your cluster application is sensitive to I/O bandwidth. Early in 2004, if the rumours are true, there will be a P4 chipset supporting 66MHz/64bit PCI-X. And in late 2004, PCI Express should be available on both P4 and Xeon motherboards, providing a big increase in I/O bandwidth if one has a network which can take advantage. Xeon's and P4's do four transfers per clock - so, a 533MHz FSB is really a 133MHz clock doing 4 transfers per cycle. The kernel on my 800 MHz FSB P4 reports a 200 MHz host bus speed. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Tue Oct 14 17:14:26 2003 From: rodmur at maybe.org (Dale Harris) Date: Tue, 14 Oct 2003 14:14:26 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <200310011613.46297.lepalom@upc.es> Message-ID: <20031014211426.GI8116@maybe.org> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > 54.2 You know... 
one problem I see with this, assuming this information is going to pass across the net (or did I miss something). Is that instead of passing something like four bytes (ie "54.2"), you are going to be passing 56 bytes (just counting the cpu_temp line). So the XML blows up a little bit of data 14 times. I can't see this being particularly efficient way of using a network. Sure, it looks pretty, but seems like a waste of bandwidth. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Oct 14 18:13:53 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 14 Oct 2003 18:13:53 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: > > > > > > 54.2 > > You know... one problem I see with this, assuming this information is > going to pass across the net (or did I miss something). Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. I'm sure some would claim that 56 bytes is not measurable overhead, especially considering the size of tcp/eth/etc headers. but it's damn ugly, to be sure. this sort of thing has been discussed several times on the linux-kernel list as well - formatting of /proc entries. it's clear that some form of human-readability is a good thing. what's not clear is that it has to be so exceptionally verbose. think of it this way: lmsensors output for a machine is a record whose type will not change (very fast, if you insist!). so why should all the metadata about the record format, units, etc be sent each time? suppose you could fetch the fully verbose record once, and then on subsequent queries, just get '54.2 56.7 40.1 3650 4150 5.0 3.3 12.0 -12.0'. the only think you've lost is same-packet-self-description (and, incidentally, insensitivity to reordering of elements...) there *is* actually a very mind-bending binarification procedure for xml. it seems totally cracked to me, though, since afaikt, it completely tosses the self-description aspect, which is almost the main point of xml... of course, the whole xml thing is a massive fraud, since it does nothing at all towards actual interoperability - there must already be thousands of different xml schemas for "SKU", each better than the last, and therefore mutually incompatible... does ASN.1 improve on this situation at all? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 14 20:45:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 14 Oct 2003 20:45:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: On Tue, 14 Oct 2003, Dale Harris wrote: > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > 54.2 > > > > You know... one problem I see with this, assuming this information is > going to pass across the net (or did I miss something). 
Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. Ah, an open invitation to waste a little more:-) Permit me to rant (the following can be freely skipped by the rant-averse:-). Note that this is not a flame, merely an impassioned assertion of an admittedly personal religious viewpoint. Like similar rants concerning the virtues of C vs C++ vs Fortran vs Java or Python vs Perl, it is intended to amuse or possible educate, but doubtless won't change many human minds. This is an interesting question and one I kicked around a long time when designing xmlsysd. Of course it is also a very longstanding issue -- as old as computers or just about. Binary formats (with need for endian etc translation) are obviously the most efficient but are impossible to read casually and difficult to maintain or modify. Compressed binary (or binary that only uses e.g. one bit where one bit will do) the most impossible and most difficult. Back in the old days, memory and bandwidth on all computers was a precious and rare thing. ALL programs tended to use one bit where one bit was enough. Entire formats with headers and metadata and all were created where every bit was parsimoniously allocated out of a limited pool. Naturally, those allocations proved to be inadequate in the long run so that only a few years ago lilo would complain if the boot partition had more than 1023 divisions because once upon a time somebody decided that 10 bits was all this particular field was ever going to get. In order to parse such a binary stream, it is almost essential to use a single library to both format and write the stream and to read and parse it, and to maintain both ends at the same time. Accessing the data ONLY occurs through the library calls. This is a PITA. Cosmically. Seriously. Yes, there are many computer subsystems that do just this, but they are nightmarish to use even via the library (which from a practical point of view becomes an API, a language definition of its own, with its own objects and tools for creating them and extracting them, and the need to be FULLY DOCUMENTED at each step as one goes along) and require someone with a high level of devotion and skill to keep them roughly bugfree. For example, if you write your code for single CPU systems, it becomes a major problem to add support for duals, and then becomes a major problem again to add support for N-CPU SMPs. Debugging becomes a multistep problem -- is the problem in the unit that assembles and provides the data, the encoding library, the decoding library (both of which are one-offs, written/maintained just for the base application) or is it in the client application seeking access to the data? Fortunately, in the old days, nearly all programming was done by professional programmers working for a wage for giant (or not so giant) companies. Binary interfaces were ideal -- they became Intellectual Property >>because<< they were opaque and required a special library whose source was hidden to access the actual binary, which might be entirely undocumented (except via its API library calls). BECAUSE they were so bloomin' hidden an difficult/expensive to modify, software evolved very, very slowly, breaking like all hell every time e.g. MS Word went from revision 1 to 2 to 3 to... 
because of broken binary incompatibility. ASCII, OTOH, has the advantage of being (in principle) easy to read. However, it is easy to make it as obscure and difficult to read as binary. Examples abound, but let's pull one from /proc, since the entire /proc interface is designed around the premise that ascii is good relative to binary (although that seems to be the sole thing that the many designers of different subsystems agree on). When parsing the basic status data of an application, one can work through: rgb at lilith|T:105>cat /proc/1214/stat 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 (which, as you can see, contains the information on the pine application within which I am currently working on my laptop). What? You find that hard to read? Surely it is obvious that the first field is the PID, the second the application name (inside parens, introducing a second, fairly arbitrary delimiter to parse), the runtime status (which is actually NOT a single character, it can vary) and then... ooo, my. Time to check out man proc, kernel source (/usr/src/linux/fs/proc/array.c) and maybe the procps sources. One does better with: rgb at lilith|T:106>cat /proc/1214/status Name: pine State: S (sleeping) Tgid: 1214 Pid: 1214 PPid: 1205 TracerPid: 0 Uid: 1337 1337 1337 1337 Gid: 1337 1337 1337 1337 FDSize: 32 Groups: 1337 0 VmSize: 11752 kB VmLck: 0 kB VmRSS: 5652 kB VmData: 2496 kB VmStk: 52 kB VmExe: 2804 kB VmLib: 3708 kB SigPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 8000000008001003 SigCgt: 0000000040016c5c CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 This is an almost human readable view of MUCH of the same data that is in /proc/stat. Of course there is the little ASCII encoded hexadecimal garbage at the bottom that could make strong coders weep (again, without a fairly explicit guide into what every byte or even BIT in this array does, as one sort of expects that there are binary masked values stuck in here). In this case man proc doesn't help -- because this is supposedly "human readable" they don't provide a reference there. Still, some of the stuff that is output by ps aux is clearly there in a fairly easily parseable form. Mind you, there are still mysteries. What are the four UID entries? What is the resolution on the memory, and are kB x1000 or x1024? What about the rest of the data in /proc/stat (as there are a lot more fields there). What about the contents of /proc/PID/statm? (Or heavens preserve us, /proc/PID/maps)? Finally, what about other things in /proc, e.g.: rgb at lilith|T:119>cat /proc/stat cpu 3498 0 2122 239197 cpu0 3498 0 2122 239197 page 128909 55007 swap 1 0 intr 279199 244817 13604 0 3427 6 0 4 4 1 3 2 2 1436 0 15893 0 disk_io: (3,0):(15946,11130,257194,4816,109992) ctxt 335774 btime 1066170139 processes 1261 Again, ASCII yes, but now (count them) there are whitespace, :, (, and ',' separators, and one piece of data (the CPU's index) is a part of a field value (cpu0) so that the entire string "cpu" becomes a sort of separator (but only in one of the lines). An impressive ratio of separators used to field labels. I won't even begin to address the LIVE VILE EVIL of overloading nested data structures nested in sequential, arbitrary separators inside the "values" for a single field, disk_io (or is that disk_io:?) 
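To make the parsing pain above concrete, here is a minimal sketch (not taken from procps or xmlsysd) of the defensive code /proc/<pid>/stat forces on anyone who reads it: since the comm field in parentheses can itself contain spaces and parentheses, the only safe move is to locate the last ')' and parse the fixed-format fields after it.

/*
 * Defensive /proc/<pid>/stat reader, fields 1-4 only.  Run with a pid as
 * the argument, or with none to read /proc/self/stat.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], buf[1024], comm[256], state;
    int pid, ppid;
    char *open_paren, *close_paren;
    size_t len;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%s/stat", argc > 1 ? argv[1] : "self");
    fp = fopen(path, "r");
    if (!fp || !fgets(buf, sizeof(buf), fp)) {
        perror(path);
        return 1;
    }
    fclose(fp);

    sscanf(buf, "%d", &pid);                 /* field 1: pid */
    open_paren  = strchr(buf, '(');
    close_paren = strrchr(buf, ')');         /* the LAST ')' is the safe one */
    if (!open_paren || !close_paren || close_paren < open_paren)
        return 1;

    len = close_paren - open_paren - 1;      /* field 2: comm, parens stripped */
    if (len >= sizeof(comm))
        len = sizeof(comm) - 1;
    memcpy(comm, open_paren + 1, len);
    comm[len] = '\0';

    sscanf(close_paren + 1, " %c %d", &state, &ppid);  /* fields 3-4 */
    printf("pid=%d comm=\"%s\" state=%c ppid=%d\n", pid, comm, state, ppid);
    return 0;
}

Every consumer of this file ends up reimplementing exactly this little dance, which is rather the point being made here.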
If this isn't enough for you, consider /proc/net/dev, which has two separators (: and ws) but is in COLUMNS, /proc/bus/pci/devices (which I still haven't figured out) and yes, the aforementioned sensors interface in /proc. I offer all of the above as evidence of a fairly evil (did you ever notice how evil, live, vile, veil and elvi are all anagrams of one another he asks in a mindless parenthetical insertion to see if you're still awake:-) middle ground between a true binary interface accessible only through library calls (which can actually be fairly clean, if one creates objects/structs with enough mojo to hold the requisite data types so that one can then create a relatively simple set of methods for accessing them) and xml. XML is the opposite end of the binary spectrum. It asserts as its primary design principle that the objects/structs with the right kind of mojo share certain features -- precisely those that constitute the rigorous design requirements of XML (nesting, attributes, values, etc). There is a fairly obvious mapping between a C struct, a C++ object, and an XMLified table. It also asserts implicitly that whether or not the object tags are chosen to be human readable (nobody insists that the tags encapsulating CPU temperature readings be named -- they could have been just ) there MUST be some sort of dictionary created at the same time as the XML implementation. If (very) human readable tags are chosen they are nearly self-documenting, but whole layers of DTD and CSS and so forth treatment of XML compliant markup are predicated upon a clear definition of the tag rules and hierarchy. Oh, and by its very design XML is highly scalable and extensible. Just as one can easily enough add fields into a struct without breaking code that uses existing fields, one can often add tags into an XML document description without breaking existing tags or tag processing code (compare with adding a field anywhere into /proc/stat -- ooo, disaster). This isn't always the case in either case -- sometimes one converts a field in a struct into a struct in its own right, for example, which can do violence to both the struct and an XML realization of it. Still, often one can and when one can't it is usually because you've had a serious insight into the "right" way to structure your data and before the encoding was just plain wrong in some deep way. This happens, but generally only fairly early in the design and implementation process. Note that XML need not be inefficient in transit. BECAUSE it is so highly structured, it compresses very efficiently. Library calls exist to squeeze out insignificant whitespace, for example (ignored by the parser anyway). I haven't checked recently to see whether compression is making its way into the library, but either way one can certainly compress/decompress and/or encrypt/decrypt the assembled XML messages before/after transmission, if CPU is cheaper to you than network or security is an issue. I think that it then comes down to the following. XML may or may not be perfect, but it does form the basis for a highly consistent representation of data structures that is NOT OPAQUE and is EASILY CREATED AND EASILY PARSED with STANDARD TOOLS AND LIBRARIES. When designing an XMLish "language" for your data, you can make the same kind of choices that you face in any program. Do you document your code or not? Do you use lots of variable names like egrp1 or do you write out something roughly human readable like extra_group_1? 
Do you write your loops so that they correspond to the actual formulae or basic algorithm (and let the compiler do as well as it can with them) or do you block them out to be cache-friendly, insert inline assembler, and so forth to make them much faster but impossible to read or remember even yourself six months after you write them? Some choices make the code run fast and short but hard to maintain. Other choices make it run slower but be more readable and easier to maintain. In the long run, I think most programmers eventually come to a sort of state of natural economy in most of these decisions; one that expresses their personal style, the requirements of their job, the requirements of the task, and a reflection of their experience(s) coding. It is a cost/benefit problem, after all (as is so much in computing). You have to ask how much it costs you to do something X way instead of Y way, and what the payoff/benefits are, in the long run. For myself only, years of experience have convinced me that as far as things like /proc or task/hardware monitoring are concerned, the bandwidth vs ease of development and maintenance question comes down solidly in favor of ease of development and maintenance. Huge amounts of human time are wasted writing parsers and extracting de facto data dictionaries from raw source (the only place where they apparently reside). Tools that are built to collect data from a more or less arbitrary interface have to be almost completely rewritten when that interface changes signficantly (or break horribly in the meantime). So the cost is this human time (programmers'), more human time (the time and productivity lost by people who lack the many tools a better interface would doubtless spawn), and the human time and productivity lost due to the bugs the more complex and opaque and multilayered interface generates. The benefit is that you save (as you note) anywhere from a factor of 3-4 to 10 or more in the total volume of data delivered by the interface. Data organization and human readability come at a price. But what is the REAL cost of this extra data? Data on computers is typically manipulated in pages of memory, and a page is what, 4096 bytes? Data movement (especially of contiguous data) is also very rapid on modern computers -- you are talking about saving a very tiny fraction of a second indeed when you reduce the message from 54 bytes to 4 bytes. Even on the network, on a 100BT connection one is empirically limited by LATENCY on messages less than about 1000 bytes in length. So if you ask how long it takes to send a 4 byte packet or a 54 byte packet (either one of which is TCP encapsulated inside a header that is longer than the data) the answer is that they take exactly the same amount of time (within a few tens of nanoseconds). If the data in question is truly a data stream -- a more or less continuous flow of data going through a channel that represents a true bottleneck, then one should probably use a true binary representation to send the data (as e.g. PVM or MPI generally do), handling endian translation and data integrity and all that. 
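To make the struct-versus-XML comparison above concrete, a minimal sketch using an invented sensor record: the field names, units and tag names are made up for illustration, and the binary writer deliberately skips the endian handling a real stream protocol would need, as just noted.

/*
 * One made-up record, written two ways: packed binary (the compact
 * "data stream" form) and verbose but self-describing XML-ish text.
 */
#include <stdio.h>

struct node_sensors {
    float  cpu_temp;     /* degrees C */
    float  fan_rpm;
    double load_one;     /* 1-minute load average */
};

static void write_binary(FILE *out, const struct node_sensors *s)
{
    fwrite(s, sizeof(*s), 1, out);           /* roughly 16 bytes on the wire */
}

static void write_xml(FILE *out, const struct node_sensors *s)
{
    fprintf(out,
        "<node_sensors>\n"
        "  <cpu_temp units=\"C\">%.1f</cpu_temp>\n"
        "  <fan_rpm>%.0f</fan_rpm>\n"
        "  <load_one>%.2f</load_one>\n"
        "</node_sensors>\n",
        s->cpu_temp, s->fan_rpm, s->load_one);  /* roughly ten times larger */
}

int main(void)
{
    struct node_sensors s = { 54.2f, 3650.0f, 1.03 };

    write_binary(stdout, &s);   /* in real use each form would go to a   */
    write_xml(stdout, &s);      /* socket, not interleaved on stdout     */
    return 0;
}

The binary form is a dozen or so bytes and unreadable without the matching struct definition; the text form is an order of magnitude larger but needs neither separate documentation nor a special library to decode, which is the whole trade being argued in this thread.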
If the data in question is a relatively short (no matter how it is wrapped and encoded) and intermittant source -- as most things like a sensors interface, the proc interface(s) in general, the configuration file of your choice, and most net/web services are, arguably -- then working hard to compress or minimally encapsulate the data in an opaque form is hard to justify in terms of the time (if any) that it saves, especially on networks, CPUs, memory that are ever FASTER. If it doesn't introduce any human-noticeable delay, and the overall load on the system(s) in question remain unmeasurably low (as was generally the case with e.g. the top command ten Moore's Law years or more ago) then why bother? I think (again noting that this is my own humble opinion:-) that there is no point. /proc should be completely rewritten, probably by being ghosted in e.g. /xmlproc as it is ported a little at a time, to a single, consistent, well documented xmlish format. procps should similarly be rewritten in parallel with this process, as should the other tools that extract data from /proc and process it for human or software consumption. Perhaps experimentation will determine that there are a FEW places in /proc where the extra overhead of parsing xml isn't acceptable for SOME applications -- /proc/pid/stat for example. In those few cases it may be worthwhile to make the ghosting permanent -- to provide an xmlish view AND a binary or minimal ASCII view, as is done now, badly, with /proc/pid/stat and /proc/pid/status. This is especially true, BTW, in open source software, where a major component of the labor that creates and maintains both low level/back end service software and high level/front end client software is unpaid, volunteer, part time, and of a wide range of skill and experience. Here the benefits of having a documented, rigorously organized, straightforwardly parsed API layer between tools are the greatest. Finally, to give the rotting horse one last kick, xmlified documents (deviating slightly from API's per se) are ideal for archival storage purposes. Microsoft is being scrutinized now by many agencies concerned about the risks associated from having 90% of our vital services provided by an operating system that has proven in practice to be appallingly vulnerable. Their problem has barely begun. The REAL expense associated with using Microsoft-based documents is going to prove in the long run to be the expense of de-archiving old proprietary-binary-format documents long after the tools that created them have gone away. This is a problem worthy of a rant all by itself (and I've written one or two in other venues) but it hasn't quite reached maturity as it requires enough years of document accumulation and toplevel drift in the binary "standard" before it jumps out and slaps you in the face with six and seven figure expenses. XMLish documents (especially when accompanied by a suitable DTD and/or data dictionary) simply cannot cost that much to convert because their formats are intrinsically open. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Tue Oct 14 22:02:28 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Wed, 15 Oct 2003 03:02:28 +0100 Subject: Condor Problem Message-ID: Does anyone have any condor experience? im trying to submit a job which is a Borland C++ console application.. the application writes a final output to the screen... but this is not being saved to the output file i specified in the jobs configuration. When i use a simple batch file and echo some text to the screen and submit that as a job it works fine and the echoed text is in the output file. Is there a problem with condor? or is there a problem with c++ or stdout? any help would be greatly appreciated. Thanks in advance... Chris Miles, NeuralGrid, Paisley University, Scotland _________________________________________________________________ Express yourself with cool emoticons - download MSN Messenger today! http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kohlja at ornl.gov Tue Oct 14 15:08:54 2003 From: kohlja at ornl.gov (James Kohl) Date: Tue, 14 Oct 2003 15:08:54 -0400 Subject: PVM errors at startup In-Reply-To: <3F8C2A22.83F751A3@hmg.inpg.fr> References: <3F86F29F.8A37AC5B@hmg.inpg.fr> <3F8C2A22.83F751A3@hmg.inpg.fr> Message-ID: <20031014190854.GA31004@neo.csm.ornl.gov> Hey Patrick, Glad you found the problem. This is usually manifested when the networking config is off slightly, or when internal/external networks are confused, but it sounds like you had a much more interesting problem...! :-) Yes, PVM uses rsh/ssh/TCP to start a remote PVM daemon (pvmd) but then the daemons themselves use UDP to talk and route PVM messages. FYI, any PVM tasks that use the "PvmRouteDirect" will use direct TCP sockets. Again, glad you figured it out! (And you're most welcome! :) All the Best, Jim On Tue, Oct 14, 2003 at 06:53:54PM +0200, Patrick Begou wrote: > This email just to close the thread with the solution. > The problem was not related to any PVM misconfiguration but to the > ethernet driver. Looking at the ethernet communications between 2 nodes > with tcpdump has shown that pvmd was started using tcp communications > BUT that pvmd were trying to talk each other with UDP protocol (it is > also detailed in the PVM doc) and this was the problem. The UDP > communications was unsuccessfull between the nodes. > Details: > The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 > (gigabit) controler. I was using the 3c2000 driver (from the cdrom). > Kernel is 2.4.20-20.7bigmem from RedHat 7.3. > rsh, rexec and rcp are working fine but this driver seems not to work > with UDP protocol??? > The solution was to download the sk68lin driver (v6.18) and run the > shell script to patch the kernel sources for the current kernel. Then > correct the module.conf file and set up the gigabit interface. Now PVM > is working fine between the two first nodes and the measured throughput > is the same as with 3c2000 asustek driver. I should now setup the other > nodes! > I would like to thanks Pr. Kenneth R. 
Koehler and Dr James Arthur Kohl > for their great help in checking the full PVM configuration and leading > me towards a network driver problem. > Patrick (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(: James Arthur "Jeeembo" Kohl, Ph.D. "Da Blooos Brathas?! They Oak Ridge National Laboratory still owe you money, Fool!" kohlja at ornl.gov http://www.csm.ornl.gov/~kohl/ Long Live Curtis Blues!!! :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Oct 15 04:49:26 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 15 Oct 2003 10:49:26 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Robert G. Brown wrote: > On Tue, 14 Oct 2003, Dale Harris wrote: > > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > > > > > 54.2 > > > > > > > > You know... one problem I see with this, assuming this information is > > going to pass across the net (or did I miss something). Is that instead > > of passing something like four bytes (ie "54.2"), you are going to be > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > a little bit of data 14 times. I can't see this being particularly > > efficient way of using a network. Sure, it looks pretty, but seems like > > a waste of bandwidth. > > Ah, an open invitation to waste a little more:-) Isn't it a bit cynical to write a 20 KByte e-mail on the topic of saving 56 Bytes? ;-) SCNR, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Wed Oct 15 11:43:09 2003 From: djholm at fnal.gov (Don Holmgren) Date: Wed, 15 Oct 2003 10:43:09 -0500 Subject: Some application performance results on a dual G5 Message-ID: For those who might be interested, I've posted some lattice QCD application performance results on a 2.0 GHz dual G5 PowerMac. See http://lqcd.fnal.gov/benchmarks/G5/ As expected from the specifications, strong memory bandwidth, reasonable scaling, and good floating point performance. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 15 09:46:45 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 15 Oct 2003 09:46:45 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Wed, 15 Oct 2003, Felix Rauch wrote: > > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > > a little bit of data 14 times. I can't see this being particularly > > > efficient way of using a network. Sure, it looks pretty, but seems like > > > a waste of bandwidth. > > > > Ah, an open invitation to waste a little more:-) > > Isn't it a bit cynical to write a 20 KByte e-mail on the topic of > saving 56 Bytes? 
;-) Cynical? No, not really. Stupid? Probably. If only I could get SOMEBODY to pay me ten measely cents a word for my rants... Alas this is not to be. So the alternative is to see if I can extort ten cents from everybody on the list NOT to write 20K rants like this. Sort of like National Lampoon's famous "Buy this magazine or we'll shoot this dog" issue...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 15 14:16:06 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 15 Oct 2003 11:16:06 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Wed Oct 15 21:22:01 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Thu, 16 Oct 2003 02:22:01 +0100 Subject: Condor Problem Message-ID: Hi, thanks for the reply Using all this instead of condor/globus? The only thing was I need to do this on windows. What i want to do is setup a Grid but also need a cluster to run jobs on Chris >From: Andrew Wang >To: Chris Miles >CC: beowulf at beowulf.org >Subject: Re: Condor Problem >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) >MIME-Version: 1.0 >Received: from mc11-f10.hotmail.com ([65.54.167.17]) by >mc11-s20.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:13:50 -0700 >Received: from web16812.mail.tpe.yahoo.com ([202.1.236.152]) by >mc11-f10.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:11:09 -0700 >Received: from [65.49.83.96] by web16812.mail.tpe.yahoo.com via HTTP; Thu, >16 Oct 2003 09:11:03 CST >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq >Message-ID: <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> >Return-Path: andrewxwang at yahoo.com.tw >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 (UTC) >FILETIME=[62388A50:01C39382] > >If all you need is a batch system, I would suggest SGE >and Scalable PBS, which have more users and better >support. > >Both of them are free and opensource, so you can try >both and see which one you like better! > >SGE: http://gridengine.sunsource.net >SPBS: http://www.supercluster.org/projects/pbs/ > >Andrew. > >----------------------------------------------------------------- >?C???? Yahoo!?_?? >?????C???B?????????B?R?A???????A???b?H?????? 
>http://tw.promo.yahoo.com/mail_premium/stationery.html _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 21:11:03 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) Subject: Condor Problem Message-ID: <20031016011103.41833.qmail@web16812.mail.tpe.yahoo.com> If all you need is a batch system, I would suggest SGE and Scalable PBS, which have more users and better support. Both of them are free and opensource, so you can try both and see which one you like better! SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/pbs/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Oct 15 21:37:36 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 15 Oct 2003 18:37:36 -0700 Subject: Pentium4 vs Xeon In-Reply-To: References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <5.2.0.9.2.20031015183031.03c57ce8@216.82.101.6> There are single-Xeon boards using the Serverworks GC series of chipsets with 64-bit PCI, but they're just as expensive as a budget dual Xeon board (Tyan S2723 or Supermicro X5DPA-GG)... In the $280 to $310 per board price range. Seems rather silly, as the "Prestonia" Socket-604 Xeon CPUs are nothing but a P4 repackaged. There's also this board: http://www.tyan.com/products/html/trinitygcsl.html Which uses a single P4 @ 533MHz FSB, with the same Serverworks chipset. Supermicro X5-SS* series (scroll down): http://www.supermicro.com/Product_page/product-mS.htm >Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit >PCI. That can be a big bottleneck if your cluster application is >sensitive to I/O bandwidth. Early in 2004, if the rumours are true, >there will be a P4 chipset supporting 66MHz/64bit PCI-X. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 22:15:56 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 10:15:56 +0800 (CST) Subject: Condor Problem In-Reply-To: Message-ID: <20031016021556.52225.qmail@web16812.mail.tpe.yahoo.com> Unluckily, SGE has very limited Windows support. PBSPro, which supports MS-Windows (the free versions do not), does offer free licenses to .edu sites. BTW, may be there are more people with condor knowledge from the condor mailing list can answer your questions. http://www.cs.wisc.edu/~lists/archive/condor-users/ Andrew. --- Chris Miles ????> Hi, thanks for the reply > > Using all this instead of condor/globus? > > The only thing was I need to do this on windows. 
> > What i want to do is setup a Grid but also need a > cluster to run > jobs on > > Chris > > >From: Andrew Wang > >To: Chris Miles > >CC: beowulf at beowulf.org > >Subject: Re: Condor Problem > >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) > >MIME-Version: 1.0 > >Received: from mc11-f10.hotmail.com > ([65.54.167.17]) by > >mc11-s20.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:13:50 -0700 > >Received: from web16812.mail.tpe.yahoo.com > ([202.1.236.152]) by > >mc11-f10.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:11:09 -0700 > >Received: from [65.49.83.96] by > web16812.mail.tpe.yahoo.com via HTTP; Thu, > >16 Oct 2003 09:11:03 CST > >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq > >Message-ID: > <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> > >Return-Path: andrewxwang at yahoo.com.tw > >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 > (UTC) > >FILETIME=[62388A50:01C39382] > > > >If all you need is a batch system, I would suggest > SGE > >and Scalable PBS, which have more users and better > >support. > > > >Both of them are free and opensource, so you can > try > >both and see which one you like better! > > > >SGE: http://gridengine.sunsource.net > >SPBS: http://www.supercluster.org/projects/pbs/ > > > >Andrew. > > > >----------------------------------------------------------------- > >??? Yahoo!?? > >?????????????????????? > >http://tw.promo.yahoo.com/mail_premium/stationery.html > > _________________________________________________________________ > Stay in touch with absent friends - get MSN > Messenger > http://www.msn.co.uk/messenger > ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Thu Oct 16 04:47:12 2003 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Thu, 16 Oct 2003 09:47:12 +0100 Subject: XML for formatting (Re: Environment monitoring) Message-ID: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> [Hmm, and will the rants be longer or shorter after he's bought the mental lubricant?] I'm in support of the original rant, however, having had to reverse-engineer several data formats in the past. Most recently a set of molecular-orbital output data. Very frustrating trying to count through data fields and convince myself that we have mapped it correctly. Anecdote from a different field (weather models) that's related - for a while, a weather model used calibration data a bit wrong - sea temperature and sea surface wind speed were swapped. All because someone had to look at a data dump and guess which column was which. So, sure, XML is very wordy, but the time saving (when trying to decipher the data) and potential for avoiding big mistakes more than makes up for it (IMO). Graham Graham Mullier Chemoinformatics Team Leader, Chemistry Design Group, Syngenta, Bracknell, RG42 6EY, UK. direct line: +44 (0) 1344 414163 mailto:Graham.Mullier at syngenta.com -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: 15 October 2003 19:16 Cc: beowulf at beowulf.org Subject: Re: XML for formatting (Re: Environment monitoring) On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. 
Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Thu Oct 16 08:12:36 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Thu, 16 Oct 2003 14:12:36 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <20031014211426.GI8116@maybe.org> Message-ID: <20031016121236.GE8711@unthought.net> On Tue, Oct 14, 2003 at 08:45:12PM -0400, Robert G. Brown wrote: ... > rgb at lilith|T:105>cat /proc/1214/stat > 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 > 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 > 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 While this has nothing to do with your (fine as always ;) rant, I just need to add a comment (which has everything to do with /proc stupidities): > (which, as you can see, contains the information on the pine application > within which I am currently working on my laptop). > > What? You find that hard to read? Imagine I had a process with the (admittedly unlikely but entirely possible) name 'pine) S 1205 (' Your stat output would read: 1214 (pine) S 1205 () S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 Parsing the ASCII-art in /proc/mdstat is at least as fun ;) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 16 08:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 16 Oct 2003 08:08:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> Message-ID: On Wed, 15 Oct 2003, Greg Lindahl wrote: > On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > > > So the alternative is to see if I can extort > > ten cents from everybody on the list NOT to write 20K rants like this. > > Do you accept pay-pal? Do you promise to spend all the money buying > yourself beer? I do accept pay-pal, by strange chance and will cheerfully delete one word out of a 20Kword base for every dime received (and to make it clear to the list that I've done so, naturally I'll post the diff with the original as well as the modified rant:-). 
I can't promise to spend ALL of the money buying beer, because my liver is old and has already tolerated much abuse over many years and I want it to last a few more decades, but I'll certainly lift a glass t'alla yer health from time to time...:-) On the other hand, given my experiences with people sending me free money via pay-pal up to this point, it would probably be safe to promise to spend it "all" on beer. Even my aged liver can tolerate beer by the thimbleful...if I didn't end up a de facto teetotaller.;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 12:02:18 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 12:02:18 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: On Thu, 16 Oct 2003 graham.mullier at syngenta.com wrote: > [Hmm, and will the rants be longer or shorter after he's bought the mental > lubricant?] Buy the right amount and they will be eloquent enough that you won't mind, or too much and they will be short and slurred ;-o > I'm in support of the original rant, however, having had to reverse-engineer > several data formats in the past. Most recently a set of molecular-orbital > output data. Very frustrating trying to count through data fields and > convince myself that we have mapped it correctly. What you want is not XML, but a data format description language. When I first read about XML, that is what I believed it was. I was expecting that the file optionally described the data format as a prologue, and then had a sequence of efficiently packed data structures. But the XML designers created the evil twin of that idea. The header is a schema of parser rules, and each data element has verbose syntax that conveys little semantic information. An XML file - is difficult for humans to read, yet is even larger than human-oriented output - requires both syntax and rule checking after human editing, yet is complex for machines to parse. - is intended for large data sets, where the negative impacts are multiplied - encourages "cdata" shortcuts that bypass the few supposed advantages. > Anecdote from a different field (weather models) that's related - for a > while, a weather model used calibration data a bit wrong - sea temperature > and sea surface wind speed were swapped. All because someone had to look at > a data dump and guess which column was which. Versus looking at an XML output and guessing what "load_one" means? I see very little difference: repeating a low-content label once for each data element doesn't convey more information. The only thing XML adds here is avoiding miscounting fields for undocumented data structures. What we really want in both the weather code case and when reporting cluster statistics is a data format description language. That description includes the format of the packed fields, and should include what the fields mean and their units, which is what we are missing in both cases. With such an approach we can efficiently assemble, transmit and deconstruct packed data while having automatic tools to check its validity.
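To make the idea concrete, one toy form such a data format description language could take is an ASCII prologue naming each packed field, its type and its units, followed by raw fixed-size records. The field names and layout below are invented for illustration, and byte order and struct padding are ignored for brevity; this is a sketch of the idea, not an existing tool.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy self-describing record stream: a one-time ASCII prologue
     * describes the packed fields, then fixed-size binary records follow.
     * Field names are invented; padding and endianness are ignored here. */
    struct node_stats {
        double   load_one;      /* 1-minute load average */
        double   temp_cpu0;     /* degrees C             */
        uint32_t mem_free_kb;   /* kilobytes             */
    };

    static const char descriptor[] =
        "format node_stats version 1\n"
        "field load_one    float64 dimensionless '1-minute load average'\n"
        "field temp_cpu0   float64 celsius       'CPU0 temperature'\n"
        "field mem_free_kb uint32  kilobytes     'free memory'\n"
        "end\n";

    int main(void)
    {
        struct node_stats s = { 0.42, 38.5, 512000 };
        FILE *fp = fopen("stats.dat", "wb");
        if (!fp)
            return 1;
        fwrite(descriptor, 1, sizeof(descriptor) - 1, fp); /* prologue, once */
        fwrite(&s, sizeof(s), 1, fp);                      /* packed records */
        fclose(fp);
        return 0;
    }

A reader that has never seen the format can parse the prologue once and then consume the packed records at full speed, which is the property the tag-per-element approach gives up.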
And general-purpose tools can even combine a description and a compact data set to produce XML. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 16 16:12:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 16 Oct 2003 13:12:13 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: <20031016201213.GV8116@maybe.org> On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > What you want is not XML, but a data format description language. > I think the S-expression guys would say that they have one. And it is what supermon uses, FWIW. http://sexpr.sourceforge.net/ http://supermon.sourceforge.net/ (supermon pages are currently unavailable.) Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Oct 16 16:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Thu, 16 Oct 2003 16:52:03 -0400 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> <20031016201213.GV8116@maybe.org> Message-ID: <1066337523.11093.20.camel@roughneck.liniac.upenn.edu> On Thu, 2003-10-16 at 16:12, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > > > > I think the S-expression guys would say that they have one. And it is what > supermon uses, FWIW. > > > http://sexpr.sourceforge.net/ > > http://supermon.sourceforge.net/ We use supermon as the data gathering mechanism for Clubmask, and I really like it. You can mask to get just certain values, and it is _really_ fast. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 14 23:31:55 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 14 Oct 2003 22:31:55 -0500 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <1066188715.3200.120.camel@terra> On Tue, 2003-10-14 at 19:45, Robert G. Brown wrote: ... > > > ... > > > > rgb As someone who has done programming environment tools most of his reasonably long professional life, I must say you have hit the nail on the head. I have rooted through more than my share of shitty binary formats in my day, and I can honestly say that I go home happier as a result of dealing with an XML trace file in my current project. I was happily working away dealing with only XML, but then it happened. The demons of my past rose their ugly heads when I decided that it would be a good thing to get some ELF information outta some files. Being the industrious guy I am, I went and got ELF docs from Dave Anderson's stash. Did that help?
Nope, not really, as it was mangled 64-bit focused ELF. Was it documented? Nope, not really. You could look at the elfdump code to see what that does, so in a backwards way, it was documented. The alternative was to ferret out the format by bugging enough compiler geeks until they gave up the secret handshake. The alternative that I eventually took was to go lay down until the desire to have the ELF information went away. ;-) -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Thu Oct 16 17:36:25 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Thu, 16 Oct 2003 16:36:25 -0500 Subject: OT: same commands to multiple servers? Message-ID: <20031016163625.C11181@mikee.ath.cx> I now have control over many AIX servers and I know there are some programs that allow you (once configured) to send the same command to multiple nodes/servers, but do these commands exist within the AIX environment? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bryce at jfet.net Thu Oct 16 16:15:08 2003 From: bryce at jfet.net (Bryce Bockman) Date: Thu, 16 Oct 2003 16:15:08 -0400 (EDT) Subject: A Petaflop machine in 20 racks? Message-ID: Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 17:54:31 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 17:54:31 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> Message-ID: On Thu, 16 Oct 2003, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > I think the S-expression guys would say that they have one. And it is > supermon uses, FWIW. No, S-expressions are an ancient concept, developed back in the early days of computing. They were needed in Lisp to linearize tree structures so that they could be saved to, uhmm, paper tape or clay tablets. Sexprs are oriented toward "structured" data. In this context "structured" means "Lisp-like linked lists" rather than "a series of 'C' structs". More directly related concepts are XDR, part of SunRPC MPI packed data Object brokers all of which are trying to solve similar problem. But, except for a few of the "object broker" systems, they don't have the metadata language to translate between domains. 
For instance, you can't take MPI packed data and automatically convert it to (useful) XML, pass it to an object broker system, or call a non-MPI remote procedure -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Oct 16 19:21:58 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 16 Oct 2003 16:21:58 -0700 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F2816.9030606@cert.ucr.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > No idea if it would work on AIX, but you could try out pconsole: http://www.heiho.net/pconsole/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Mark at MarkAndrewSmith.co.uk Thu Oct 16 19:36:23 2003 From: Mark at MarkAndrewSmith.co.uk (Mark Andrew Smith) Date: Fri, 17 Oct 2003 00:36:23 +0100 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Comment: As each generation of this chip gets more powerful, in an exponential way, then clusters of these chips could be used to break encryption algorithms via brute force approaches. If this became anywhere near an outside chance of a possibility of succeeding, or even threat of, then I would expect Governments to carefully consider export requirements and restrictions, or even in the extreme, classify it as a military armament similar to early RSA 128bit software encryption ciphers. However it could be the dawn of a new architecture for us all..... Kindest regards, Mark Andrew Smith Tel: (01942)722518 Mob: (07866)070122 http://www.MarkAndrewSmith.co.uk/ -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Bryce Bockman Sent: 16 October 2003 21:15 To: beowulf at beowulf.org Subject: A Petaflop machine in 20 racks? Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). 
Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Oct 16 19:46:19 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu, 16 Oct 2003 16:46:19 -0700 Subject: A Petaflop machine in 20 racks? References: Message-ID: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Browsing through ClearSpeed's fairly "content thin" website, one turns up the following: http://www.clearspeed.com/downloads/overview_cs301.pdf The CS302 has an array of 64 processors and 256Kbytes of memory in the array + 128 Kbytes SRAM on chip. That's 4 Kbytes/processor (much like a cache).. It doesn't say how many bits wide each processor is, though.. 51.2 Gbyte/sec bandwidth is quoted.. that's 800 Mbyte/sec per processor, which is a reasonable sort of rate. 10 microsecond 1K complex FFTs are reasonably fast, but without knowing how many bits, it's hard to say whether it's outstanding. It also doesn't say whether the architecture is, for instance, SIMD. It could well be a systolic array, which would be very well suited to cranking out FFTs or other similar things, but probably not so hot for general purpose crunching. For all their vaunted patent and IP portfolio, they have only one patent listed in the USPTO database under their own name, and that's some sort of DRAM. ----- Original Message ----- From: "Bryce Bockman" To: Sent: Thursday, October 16, 2003 1:15 PM Subject: A Petaflop machine in 20 racks? > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 16 21:23:57 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 16 Oct 2003 21:23:57 -0400 (EDT) Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Looking at the standard "we have the solution to everyones computing needs press release" a few things are clear: "... multi-threaded array processor ..." which is further verified later in the press release: "... where the CS301 is acting as a co-processor, dynamic libraries offload an application's inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU's limited mathematical capability..." It seems to be a low power array processor which may be of some real value to some people. The real issue is can they keep pace in terms of cost and performance with the commodity CPU market. And what about code portability. 
Quite a few people have spent quite a lot of time porting and tweaking codes for architectures that seemed to have a rather short lived history. Of course, there is no hardware yet. Doug On Thu, 16 Oct 2003, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 03:48:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 09:48:17 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> References: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Message-ID: <200310170948.17224.joachim@ccrl-nece.de> Jim Lux: > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. Exactly. Such coprocessor-boards (typically DSP-based, which also achieve some GFlop/s) already exist for a long time, but obviously are not suited to change "the way we see computing" (place your marketing slogan here). One reason is the lack of portability for code making use of such hardware, but I think if the performance for a wider range of applications would effectively come anywhere close to the peak performance, this problem would be overcome by the premise of getting teraflop-performance for some 10k of $. Thus, the problem probably is that typical applications do not achieve the promised performance. All memory-bound applications will get stuck on the PCI-bus, by both, memory access latency and bandwidth. High sustained performance for real problems can, in the general case, only be achieved in a balanced system. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 04:23:46 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 10:23:46 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <200310171023.46865.joachim@ccrl-nece.de> Donald Becker: > More directly related concepts are > XDR, part of SunRPC > MPI packed data Hmm, as you note below, they both do not describe the data they handle, just transform in into a uniform representation. > Object brokers > all of which are trying to solve similar problem. But, except for a few > of the "object broker" systems, they don't have the metadata language to > translate between domains. 
For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure You might want to check HDF5, or for a simpler yet widely used approach, NetCDF. They are self-describing file formats. But as you can send everything via the net the same way you access it in a file, this should be useful. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cap at nsc.liu.se Fri Oct 17 04:40:56 2003 From: cap at nsc.liu.se (Peter Kjellstroem) Date: Fri, 17 Oct 2003 10:40:56 +0200 (CEST) Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> Message-ID: There is something called dsh (distributed shell) part of some IBM package. The guys at llnl has done further work in this direction with pdsh which I belive runs fine on AIX. pdsh can be found at: http://www.llnl.gov/linux/pdsh/ /Peter On Thu, 16 Oct 2003, Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? > > Mike -- ------------------------------------------------------------ Peter Kjellstroem | National Supercomputer Centre | Sweden | http://www.nsc.liu.se _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 17 04:35:48 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 17 Oct 2003 10:35:48 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310170835.h9H8ZmY02530@dali.crs4.it> I have not read carefully descriptions of the Opteron architecture until a few minutes ago. I was not able to find a picture of the layout in silicon at the AMD site, I found a picture at Tom's Hardware. http://www.tomshardware.com/cpu/20030422/opteron-04.html The page before shows that 50 percent of the silicon is cache. Of what is not cache, it seems that the floating point unit occupies about 1/6 or 1/7th of the area, moreover, the authors Frank Voelkel, Thomas Pabst, Bert Toepelt, and Mirko Doelle describe the Opteron as having three floating point units, FADD, FMUL and FMISC. Just counting FADD and FMUL and considering the entire area of the Opteron, using 2 GHz for the frequency, that would be about 12 FP units times 2 GHz, 24 GFLOPS. So it is doable. I do not know the depth of the pipeline, but it is likely it is deep. How do you keep the pipeline full? PCI is around 0.032 Giga floating point words per second? The entire memory subsystem needs to be changed drastically. Moreover, whereas integer units might be used to solve problems that are logically complex, floating point problems are typically ones that use a large amount of data, more than what can fit into cache. 
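To put numbers on that balance argument, here is the back-of-envelope in runnable form. Both inputs are assumptions rather than measurements: the 133 MB/s figure is the classic 32-bit/33 MHz PCI peak (which lines up with the ~0.032 Gword/s quoted above), and the 24 GFLOPS is the speculative per-chip estimate above.

    #include <stdio.h>

    /* Back-of-envelope balance check for the estimates above.
     * Both inputs are assumptions: classic 32-bit/33 MHz PCI peak
     * bandwidth and the ~24 GFLOPS per-chip guess. */
    int main(void)
    {
        double peak_flops     = 24e9;    /* ~12 FP results/cycle * 2 GHz */
        double pci_bytes_s    = 133e6;   /* 32-bit, 33 MHz PCI peak      */
        double word_bytes     = 4.0;     /* single-precision word        */

        double words_s        = pci_bytes_s / word_bytes;
        double flops_per_word = peak_flops / words_s;

        printf("PCI delivers about %.3f Gword/s\n", words_s / 1e9);
        printf("each word must be reused in about %.0f flops to keep "
               "the FP units busy\n", flops_per_word);
        return 0;
    }

That ratio of several hundred flops per delivered word is the quantitative version of "the entire memory subsystem needs to be changed drastically."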
But you-all knew that already, Alan Scheinin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:43:07 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:43:07 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Thu, 16 Oct 2003, Donald Becker wrote: > translate between domains. For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure Yes indeedy. And since XML is at heart linked lists (trees) of structs as well, you still can't get around the difficulty of mapping a previously unseen data file containing XMLish into a set of efficiently accessible structs. Which is doable, but is a royal PITA and requires that you maintain DISTINCT (and probably non-portable) images/descriptions of the data structures and then write all this glue code to import and export. So yeah, I have fantasies of ways of encapsulating C header files and a data dictionary in an XMLified datafile and a toolset that at the very least made it "easy" to relink a piece of C code to read in the datafile and just put the data into the associated structs where I could subsequently use them EFFICIENTLY by local or global name. I haven't managed to make this really portable even in my own code, though -- it isn't an easy problem (so difficult that ad hoc workarounds seem the simpler route to take). This really needs a committee or something and a few zillion NSF dollars to resolve, because it is a fairly serious and widespread problem. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:29:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:29:47 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <1066188715.3200.120.camel@terra> Message-ID: On 14 Oct 2003, Dean Johnson wrote: > As someone who has done programming environment tools most of his > reasonably long professional life, I must say you have hit the nail on > the head. I have rooted through more than my share of shitty binary > formats in my day, and I can honestly say that I go home happier as a > result of dealing with an XML trace file in my current project. I was > happily working away dealing with only XML, but then it happened. The > demons of my past rose their ugly heads when I decided that it would be > a good thing to get some ELF information outta some files. Being the > industrious guy I am, I went and got ELF docs from Dave Anderson's > stash. Did that help? Nope, not really, as it was mangled 64-bit focused > ELF. Was it documented? Nope, not really. You could look at the elfdump > code to see what that does, so in a backwards way, it was documented. > The alternative was to ferret out the format by bugging enough compiler > geeks until they gave up the secret handshake. 
The alternative that I > eventually took was to go lay down until the desire to have the ELF > information went away. ;-) And yet Don's points are also very good ones, although I think that is at least partly a matter of designer style. XML isn't, after all, a markup language -- it is a markup language specification. As an interface designer, you can implement tags that are reasonably human readable and well-separated in function or not. His observation that what one would REALLY like is a self-documenting interface, or an interface with its data dictionary included as a header, is very apropos. I also >>think<< that he is correct (if I understood his final point correctly) in saying that someone could sit down and write an XML-compliant "DML" (data markup language) with straightforward and consistent rules for encapsulating data streams. Since those rules would presumably be laid down in the design phase, and since a wise implementation of them would release a link-level library with prebuilt functions for creating a new data file and its embedded data dictionary, writing data out to the file, opening the file, and reading/parsing data in from the file, it would actually reduce the amount of wheel reinventing (and very tedious coding!) that has to be done now while creating/enforcing a fairly rigorous structural organization on the data itself. One has to be very careful not to assume that XML will necessarily make a data file tremendously longer than it likely is now. For short files nobody (for the most part) cares, where by short I mean short enough that file latencies dominate the read time -- using long very descriptive tags is easy in configuration files. For longer data files (which humans cannot in general "read" anyway unless they have a year or so to spare) there is nothing to prevent XMLish of the following sort of very general structure: This is part of the production data of Joe's Orchards. Eat Fruit from Joe's! apples%-10.6fbushels | oranges%-12.5ecrates | price%-10.2fdollars 13.400000 |77.00000e+2 |450.00 589.200000 |102.00000e+8|6667.00 ... The stuff between the tags could even be binary. Note that the data itself isn't individually wrapped and tagged, so this might be a form of XML heresy, but who cares? For a configuration file or a small/short data file containing numbers that humans might want to browse/read without an intermediary software layer, I would say this is a bad thing, but for a 100 MB data file (a few million lines of data) the overhead introduced by adding the XML encapsulation and dictionary is utterly ignorable and the mindless repetition of tags in the datastream itself pointless. Note well that this encapsulation is STILL nearly perfectly human readable, STILL easily machine parseable, and will still be both in twenty years after Joe's Orchard has been cut down and turned into firewood (or would be, if Joe had bothered to tell us a bit more about the database in question in the description). The data can even be "validated", if the associated library has appropriate functions for doing so (which are more or less the data reading functions anyway, with error management). I should note that the philosophy above might be closer to that of e.g. TeX/LaTeX than XML/SGML/MML (as discussed below). 
I've already done stuff somewhat LIKE this (without the formal data dictionary, because I haven't taken the time to write a general purpose tool for my own specific applications, which is likely a mistake in the long run but in the long run, after all, I'll be dead:-) in wulfstat. The .wulfhosts xml permits a cluster to be entered "all at once" using a format like: g%02d 1 15 which is used to generate the hostname strings required to open connections to hosts e.g. g01, g02, ... g15. Obviously the same trick could be used to feed scanf, or to feed a regex parser. The biggest problem I have with XML as a data description/configuration file base isn't really details like these, as I think they are all design decisions and can be done poorly or done well. It is that on the parsing end, libxml2 DOES all of the above, more or less. It generates on the fly a linked list that mirrors the XML source, and then provides tools and a consistent framework of rules for walking the list to find your data. How else could it do it? The one parser has to read arbitrary markup, and it cannot know what the markup is until opens the file, and it opens/reads the file in one pass, so all it can do is mosey along and generate recursive structs and link them. However, that is NOT how one wants to access the data in code that wants to be efficient. Walking a llist to find a float data entry that has a tag name that matches "a" and an index attribute that matches "32912" is VERY costly compared to accessing a[32912]. At this point, the only solution I've found is to know what the data encapsulation is (easy, since I created it:-), create my own variables and structs to hold it for actual reference in code, open and read in the xml data, and then walk the list with e.g. xpath and extract the data from the list and repack it into my variables and structs. This latter step really sucks. It is very, very tedious (although perfectly straightforward to write the parsing/repacking code (so much so that the libxml guy "apologizes" for the tedium of the parsing code in the xml.org documentation:-). It is this latter step that could be really streamlined by the use of an xmlified data dictionary or even (in the extreme C case) encapsulating the actual header file with the associated variable struct definitions. It is interesting and amusing to compare two different approaches to the same problem in applications where the issue really is "markup" in a sense. I write lots of things using latex, because with latex one can write equations in a straightforward ascii encoding like $1 = \sin^2(\theta) + \cos^2(\theta)$. This input is taken out of an ascii stream by the tex parser, tokenized and translated into characters, and converted into an actual equation layout according to the prescriptions in a (the latex) style file plus any layered modifications I might impose on top of it. [Purists could argue about whether or not latex is a true markup language -- tex/latex are TYPESETTING languages and not really intended to support other functions (such as translating this equation into an internal algebraic form in a computer algebra program such as macsyma or maple). However, even though it probably isn't, because every ENTITY represented in the equation string isn't individually tagged wrt function, it certainly functions like markup at a high level with entries entered inside functional delimiters and presented in a way/style that is associated with the delimiters "independent" of the delimiters themselves.] 
If one compares this to the same equation wrapped in MML (math markup language, which I don't know well enough to be able to reproduce here) it would likely occupy twenty or thirty lines of markup and be utterly unreadable by humans. At least "normal" humans. Machines, however, just love it, as one can write a parser that can BOTH display the equation AND can create the internal objects that permit its manipulation algebraically and/or numerically. This would be difficult to do with the latex, because who knows what all these components are? Is \theta a constant, a label, a variable? Are \sin and \cos variables, functions, or is \s the variable and in a string (do I mean s*i*n*(theta) where all the objects are variables)? The equation that is HUMAN readable and TYPESETTABLE without ambiguity with a style file and low level definition that recognizes these elements as non-functional symbols of certain size and shape to be assembled according to the following rules is far from adequately described for doing math with it. For all that, one could easily write an XML compliant LML -- "latex markup language" -- a perfectly straightforward translation of the fundamental latex structures into XML form. Some of these could be utterly simple (aside for dealing with special character issues: {\em emphasized text} -> emphasized text \begin{equation}a = b+c\end{equation} -> a = b+c linuxdoc is very nearly this translation, actually, except that it doesn't know how to handle equation content AFAIK. This sort of encapsulation is highly efficient for document creation/typesetting within a specific domain, but less general purpose. The point is .... [the following text that isn't there was omitted in the fond hope that my paypal account will swell, following which I will make a trip to a purveyor of fine beverages.] rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jac67 at georgetown.edu Fri Oct 17 09:41:19 2003 From: jac67 at georgetown.edu (Jess Cannata) Date: Fri, 17 Oct 2003 09:41:19 -0400 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8FF17F.3030008@georgetown.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge National Laboratory on all of our Linux Beowulf clusters, and I really like it. 
You might want to take a look at it: http://www.csm.ornl.gov/torc/C3/index.html -- Jess Cannata Advanced Research Computing Georgetown University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Thu Oct 16 18:49:27 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Fri, 17 Oct 2003 00:49:27 +0200 (CEST) Subject: Pentium4 vs Xeon In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Don Holmgren wrote: > Pricewise (YMMV), cheap desktop P4's can be had very roughly for half > the price of a comparable dual Xeon. This is true if you look at pricewatch, but the quotes I received shown that good P4's is less than half of the price (in my case around 36%) of a comparable dual Xeon. I am talking about comparison of the price of Asus PC-DL Dual Xeon 2.8 GHz 512K 533 FSB with 3 GB DDR333 and two 36GB SATA 10K RPM hardrives against Asus P4P800-VM P4 2.8 GHz 800 FSB with 1.5 GB DDR 400 and one 36GB SATA 10K RPM hardrive. Xeons machines are not very popular and it is hard to get a good price for them at your local shop (in my case Ithaca US, in Poland difference would be even bigger). I am benchmarking this P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+). If you are interested in some numbers I can send benchmarks of Gaussian 03, Gamess, and our own F77 code. czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Fri Oct 17 03:20:54 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Fri, 17 Oct 2003 20:20:54 +1300 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F9856.60602@lsu.edu> I have administered over 100 AIX boxes for a living for over 5 years now. The tool of choice for me is dsh, which ships as part of the PSSP LPP, a canned implementation of Kerberos 4. We simply install the ssp.clients fileset on each node and use our control workstation as the Kerberos realm master. We add the external nodes by hand. I know that dsh is open sourced now and available at: http://dsh.sourceforge.net/ There are several other cheap (as in Libris) solutions as well: 1) Use rsh (with TCPwrappers) 2) Use ssh with a password-less key 3) Write your own code around either of the above 4) Implement Kerberos, either as an LPP from IBM, or get the source and compile yourself I think you'll find dsh a good starting point though. Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? 
> > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 17 10:07:07 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 17 Oct 2003 15:07:07 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE1FD@stegosaurus.bristol.quadrics.com> Consider also pdsh: http://www.llnl.gov/linux/pdsh/ It is an open source varient of IBM's dsh builds on Linux (IA32/IA64, etc.), AIX et al. Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- > -----Original Message----- > From: Jess Cannata [mailto:jac67 at georgetown.edu] > Sent: 17 October 2003 14:41 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > Mike Eggleston wrote: > > >I now have control over many AIX servers and I know there > >are some programs that allow you (once configured) to send > >the same command to multiple nodes/servers, but do these > >commands exist within the AIX environment? > > > > > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge > National > Laboratory on all of our Linux Beowulf clusters, and I really > like it. > You might want to take a look at it: > > http://www.csm.ornl.gov/torc/C3/index.html > > -- > Jess Cannata > Advanced Research Computing > Georgetown University > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Fri Oct 17 12:42:15 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Fri, 17 Oct 2003 10:42:15 -0600 (CST) Subject: RLX? In-Reply-To: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: Have you ever try or test RLX server for HPC? What is their performance? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 14:05:49 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 11:05:49 -0700 Subject: POVray, beowulf, etc. 
Message-ID: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> I'm aware of some MPI-aware POVray stuff, but is there anything out there that can facilitate something where you want to render a sequence of frames (using, e.g., POVray), one frame to a processor, then gather the images back to a head node for display, in quasi-real time. For instance, say you had a image that takes 1 second to render, and you had 30 processors free to do the rendering. Assuming you set everything up ahead of time, it should be possible to set all the processors spinning, and feeding the rendered images back to a central point where they can be displayed as an animation at 30 fps (with a latency of 1 second) Obviously, the other approach is to have each processor render a part of the image, and assemble them all, but it seems that this might actually be slower overall, because you've got the image assembling time added. I'm looking for a way to do some real-time visualization of modeling results as opposed to a batch oriented "render farm", so it's the pipeline to gather the rendered images from the nodes to the display node that I'm interested in. I suppose one could write a little MPI program that gathers the images up as bitmaps and feeds them to a window, but, if someone has already solved this in a reasonably facile and elegant way, why not use it. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Fri Oct 17 12:01:12 2003 From: johnb at quadrics.com (John Brookes) Date: Fri, 17 Oct 2003 17:01:12 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E328@stegosaurus.bristol.quadrics.com> How are the startup times of IBM's dsh these days? I seem to remember that it was somewhat on the slow side on big machines. Many moons have passed since I was last on an AIX machine, though, so I assume the situation's improved drastically. Cheers, John Brookes Quadrics > -----Original Message----- > From: Brian D. Ropers-Huilman [mailto:bropers at lsu.edu] > Sent: 17 October 2003 08:21 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > I have administered over 100 AIX boxes for a living for over > 5 years now. The > tool of choice for me is dsh, which ships as part of the PSSP > LPP, a canned > implementation of Kerberos 4. We simply install the > ssp.clients fileset on > each node and use our control workstation as the Kerberos > realm master. We add > the external nodes by hand. > > I know that dsh is open sourced now and available at: > > http://dsh.sourceforge.net/ > > There are several other cheap (as in Libris) solutions as well: > > 1) Use rsh (with TCPwrappers) > 2) Use ssh with a password-less key > 3) Write your own code around either of the above > 4) Implement Kerberos, either as an LPP from IBM, or get the > source and > compile yourself > > I think you'll find dsh a good starting point though. 
> > Mike Eggleston wrote: > > > I now have control over many AIX servers and I know there > > are some programs that allow you (once configured) to send > > the same command to multiple nodes/servers, but do these > > commands exist within the AIX environment? > > > > Mike > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Brian D. Ropers-Huilman (225) 578-0461 (V) > Systems Administrator AIX (225) 578-6400 (F) > Office of Computing Services GNU Linux > brian at ropers-huilman.net > High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Oct 17 14:38:41 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 17 Oct 2003 14:38:41 -0400 (EDT) Subject: RLX? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? Yes, we had access to their earliest machines and I was there at the NYC announcement. > What is their performance? It depends on the generation. The first generation was great at what it was designed to do: pump out data, such as static web pages, from memory to two 100Mbps Ethernet ports per blade. It used Transmeta chips, 2.5" laptop drives and fans only on the chassis to fit 24 blades in 3U. The blades didn't do well at computational tasks or disk I/O. A third Ethernet port on each blade was connected to an internal repeater. They could only PXE boot using that port, making a flow-controlled boot server important. The second generation switched to Intel ULV (Ultra Low Voltage) processors in the 1GHz range. This approximately doubled the speed over Transmeta chips, especially with floating point. But ULV CPUs are designed for laptops, and the interconnect was no faster. Thus this still was not a computational cluster box. The current generation blades are much faster, with full speed (and heat) CPUs and chipset, fast interconnect and good I/O potential. But lets look at the big picture for HPC cluster packaging: --> Beowulf clusters have crossed the density threshold <-- This happened about two years ago. At the start of the Beowulf project a legitimate problem with clusters was the low physical density. This didn't matter in some installations, as much larger machines were retired leaving plenty of empty space, but it was a large (pun intended) issue for general use. As we evolved to 1U rack-mount servers, the situation changed. Starting with the API CS-20, Beowulf cluster hardware met and even exceeded the compute/physical density of contemporary air-cooled Crays. 
Since standard 1U dual processor machines can now exceed the air cooled thermal density supported by an average room, selecting non-standard packaging (blades, back-to-back mounting, or vertical motherboard chassis) must be motivated by some other consideration that justifies the lock-in and higher cost. At least with blade servers there are a few opportunities: Low-latency backplane communication Easier connections to shared storage Hot-swap capability to add nodes or replace failed hardware -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 15:57:24 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 21:57:24 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: I just saw YAML announced on www.ntk.net http://www.yaml.org YAML (rhymes with camel) is a sraightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimised for serialization , configuration settings, log files, Internet messaging ad filtering. There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. Sounds like it might be good for the purposes we are discussing! BTW, has anyon experimented with Beep for messaging system status, environment variables, logging etc? http://www.beepcore.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From srihari at mpi-softtech.com Fri Oct 17 15:34:42 2003 From: srihari at mpi-softtech.com (Srihari Angaluri) Date: Fri, 17 Oct 2003 15:34:42 -0400 Subject: POVray, beowulf, etc. References: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> Message-ID: <3F904452.4090406@mpi-softtech.com> Jim, Not sure if you came across the parallel ray tracer application written using MPI. 
This does real-time rendering. http://jedi.ks.uiuc.edu/~johns/raytracer/ Jim Lux wrote: > I'm aware of some MPI-aware POVray stuff, but is there anything out > there that can facilitate something where you want to render a sequence > of frames (using, e.g., POVray), one frame to a processor, then gather > the images back to a head node for display, in quasi-real time. > > For instance, say you had a image that takes 1 second to render, and you > had 30 processors free to do the rendering. Assuming you set everything > up ahead of time, it should be possible to set all the processors > spinning, and feeding the rendered images back to a central point where > they can be displayed as an animation at 30 fps (with a latency of 1 > second) > > Obviously, the other approach is to have each processor render a part of > the image, and assemble them all, but it seems that this might actually > be slower overall, because you've got the image assembling time added. > > I'm looking for a way to do some real-time visualization of modeling > results as opposed to a batch oriented "render farm", so it's the > pipeline to gather the rendered images from the nodes to the display > node that I'm interested in. I suppose one could write a little MPI > program that gathers the images up as bitmaps and feeds them to a > window, but, if someone has already solved this in a reasonably facile > and elegant way, why not use it. > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 16:19:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 22:19:15 +0200 (CEST) Subject: Also on NTK Message-ID: Sorry if this is off topic too far. Also on NTK, an implementation of zeroconf for Linux, Windows, BSD http://www.swampwolf.com/products/howl/GettingStarted.html Anyone care to speculate on uses for zeroconf in big clusters? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Fri Oct 17 16:47:08 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Fri, 17 Oct 2003 13:47:08 -0700 Subject: When is cooling air cool enough? Message-ID: Most computer rooms shuttle the air back and forth between the computers and the A/C. I'm wondering if one could not construct a less expensive facility (less power running the A/C which is rarely on, smaller A/C units) if the computer room was a lot more like a wind tunnel: ambient air in (after filtering out any dust or rain), pass it through the computers, and then blow it out the other side of the building. Note the room wouldn't be wide open like a normal computer room. Instead essentially each rack and other largish computer unit would sit in its own separate air flow, so that hot air from one wouldn't heat the next. 
The question is, how hot can the cooling air be and still keep the computers happy? The answer will determine how big an A/C unit is needed to handle cooling the intake air for those times when it exceeds this upper limit. I'm guessing that so long as a lot of air is moving through the computers most would be ok in a sustained 30C (86F) flow. Remember, this isn't 30C in dead air, it's 30C with high pressure on the intake side of the computer and low pressure on the outlet side, so that the generated heat is rapidly moved out of the computer and away. (But not so much flow as to blow cards out of their sockets!) Somewhere between 30C and 40C one might expect poorly ventilated CPUs and disks to begin to have problems. Above 40C seems a tad too warm. At that temperature it's going to be pretty uncomfortable for the operators too. Anybody have a good estimate for what this upper limit is. For instance, from a computer room with an A/C that failed slowly? There's clearly a lower temperature limit too. However on cold days opening a feedback duct from the outlet back into the intake should do the trick. In really cold climates the intake duct might be closed entirely - when it's 20 below outside. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 19:45:24 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 16:45:24 -0700 Subject: When is cooling air cool enough? In-Reply-To: Message-ID: <5.2.0.9.2.20031017162321.031343c0@mailhost4.jpl.nasa.gov> For component life, colder is better (10 degrees is factor of 2 life/reliability), and the temperature rise inside the box is probably more than you think. You also have some more subtle tradeoffs to address. You don't need as much colder air as warmer air to remove some quantity of heat, and a significant energy cost is pushing the air around (especially since the work involved in running the fan winds up heating the air being moved). This is a fairly standard HVAC design problem. The additional cost to cool the room to, say, 15C instead of 20C is fairly low, if the room is insulated, and there's a lot of recirculation (which is typical for this kind of thing). It's not like you're cooling the room repeatedly after warming up. Once you've reached equilibrium, cooling the mass of equipment down, you're moving the same number of joules of heat either way and the refrigeration COP doesn't change much over that small a temperature range. The heat leakage through the walls is fairly small, compared to the heat dissipated in the equipment. If you were cooling something that doesn't generate heat itself (i.e. a wine cellar or freezer), then the temperature does affect the power consumed. This all said, I worked for a while on a fairly complex electronic system installed at a test facility on a ridge on the island of Kauai, and they had no airconditioning. They had big fans and thermostatically controlled louvers, and could show that statistically, the air temperature never went high enough to cause a problem. I seem to recall something like the calculations showed we'd have to shut down for environmental reasons no more than once every 5 years. Humidity is an issue also, though. 
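To put rough numbers on the air-side tradeoff, here is a back-of-the-envelope sketch in C. It only applies the textbook relation Q = mdot * cp * dT with dry sea-level air properties; the 20 kW load and 10 C allowable intake-to-exhaust rise are made-up example figures, not measurements from any particular machine room:

    /* How much air must move to carry off a given heat load with a given
     * allowable temperature rise?  Q = mdot * cp * dT. */
    #include <stdio.h>

    int main(void)
    {
        const double rho = 1.2;      /* air density, kg/m^3 (sea level, ~20 C) */
        const double cp  = 1005.0;   /* specific heat of air, J/(kg*K) */
        double load_w = 20000.0;     /* example heat load: 20 kW of equipment */
        double rise_k = 10.0;        /* allowed intake-to-exhaust rise in K (= C) */

        double mdot = load_w / (cp * rise_k);   /* required air mass flow, kg/s */
        double vdot = mdot / rho;               /* volumetric flow, m^3/s */
        printf("%.2f kg/s = %.2f m^3/s (about %.0f CFM)\n",
               mdot, vdot, vdot * 2118.88);     /* 1 m^3/s ~ 2118.88 CFM */
        return 0;
    }

Doubling the allowable rise halves the required airflow, and the fan power that goes with it, which is the flip side of the reliability argument for keeping the intake cold.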
At 01:47 PM 10/17/2003 -0700, David Mathog wrote: >Most computer rooms shuttle the air back and forth >between the computers and the A/C. I'm >wondering if one could not construct a less expensive >facility (less power running the A/C which is rarely >on, smaller A/C units) if the computer room was a >lot more like a wind tunnel: ambient air in (after >filtering out any dust or rain), >pass it through the computers, and then blow it out >the other side of the building. Note the room >wouldn't be wide open like a normal computer room. >Instead essentially each rack and other largish >computer unit would sit in its own separate air flow, >so that hot air from one wouldn't heat the next. > >The question is, how hot can the cooling air be and >still keep the computers happy? > >The answer will determine how big an A/C unit is >needed to handle cooling the intake air for those >times when it exceeds this upper limit. > >I'm guessing that so long as a lot of air is moving through >the computers most would be ok in a sustained 30C (86F) flow. >Remember, this isn't 30C in dead air, it's 30C with high >pressure on the intake side of the computer and low >pressure on the outlet side, so that the generated heat >is rapidly moved out of the computer and away. (But not >so much flow as to blow cards out of their sockets!) >Somewhere between 30C and 40C one might expect poorly >ventilated CPUs and disks to begin to have problems. Above >40C seems a tad too warm. At that temperature it's going >to be pretty uncomfortable for the operators too. > >Anybody have a good estimate for what this upper limit is. >For instance, from a computer room with an A/C that failed >slowly? > >There's clearly a lower temperature limit too. However on cold >days opening a feedback duct from the outlet back into the intake >should do the trick. In really cold climates the intake >duct might be closed entirely - when it's 20 below outside. > >Thanks, > >David Mathog >mathog at caltech.edu >Manager, Sequence Analysis Facility, Biology Division, Caltech >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:41:49 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:41:49 -0700 Subject: RLX? In-Reply-To: References: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: <20031018014149.GB3774@greglaptop.PEATEC.COM> On Fri, Oct 17, 2003 at 10:42:15AM -0600, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? > What is their performance? .. what's their price/performance? That decides against them for most of us el-cheapo HPC customers. RLX has some nice features for enterprise computing that may justify a higher cost for enterprises, but... 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:11:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:11:39 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Fri, 17 Oct 2003, John Hearns wrote: > I just saw YAML announced on www.ntk.net > > http://www.yaml.org yaml.org doesn't resolve for me in nameservice (yet), but whoa, dude, rippin' ntk site. That's one very seriously geeked news site. rgb > YAML (rhymes with camel) is a sraightforward machine parsable > data serialization format designed for human readability and > interaction with scripting languages such as Perl and Python. > YAML is optimised for serialization , configuration settings, > log files, Internet messaging ad filtering. > > There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. > > > Sounds like it might be good for the purposes we are discussing! > > > > BTW, has anyon experimented with Beep for messaging system status, > environment variables, logging etc? > http://www.beepcore.org > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:39:57 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:39:57 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: References: Message-ID: <20031018013957.GA3774@greglaptop.PEATEC.COM> On Thu, Oct 16, 2003 at 04:15:08PM -0400, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? I think it's the Return of the Array Processor. There's very little new in computing these days -- and it has the usual flaws of APs: low bandwidth communication to the host. So if you have a problem that actually fits in the limited memory, and doesn't need to communicate with anyone else very often, it may be a win for you. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:21:42 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:21:42 -0400 (EDT) Subject: When is cooling air cool enough? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, David Mathog wrote: > Most computer rooms shuttle the air back and forth > between the computers and the A/C. 
I'm > wondering if one could not construct a less expensive > facility (less power running the A/C which is rarely > on, smaller A/C units) if the computer room was a > lot more like a wind tunnel: ambient air in (after > filtering out any dust or rain), > pass it through the computers, and then blow it out > the other side of the building. Note the room > wouldn't be wide open like a normal computer room. > Instead essentially each rack and other largish > computer unit would sit in its own separate air flow, > so that hot air from one wouldn't heat the next. > > The question is, how hot can the cooling air be and > still keep the computers happy? I personally have strong feelings about this, although there probably are sites out there with hard data and statistics and engineering recommendations. 70F or cooler would be my recommendation. In fact, cooler would be my recommendation -- 60F would be better still. I think the number is every 10F costs roughly a year of component life in the 60-80F ranges and even brief periods where the temperature at the intake gets significantly above 80F makes it uncomfortably likely that some component is damaged enough to fail within a year. > The answer will determine how big an A/C unit is > needed to handle cooling the intake air for those > times when it exceeds this upper limit. It costs roughly $1/watt/year to feed AND cool a computer, order of $100-150/cpu/year, with about 1/4 of that for cooling per se. The computer itself costs anywhere from $500 lowball to a couple of thousand per CPU (more if you have an expensive network). The HUMAN cost of screwing around with broken hardware can be crushing, and high temperatures are an open invitation for hardware to break a lot more often (and it breaks all too often at LOW temperatures). It just isn't worth it. > > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. > Remember, this isn't 30C in dead air, it's 30C with high > pressure on the intake side of the computer and low > pressure on the outlet side, so that the generated heat > is rapidly moved out of the computer and away. (But not > so much flow as to blow cards out of their sockets!) > Somewhere between 30C and 40C one might expect poorly > ventilated CPUs and disks to begin to have problems. Above > 40C seems a tad too warm. At that temperature it's going > to be pretty uncomfortable for the operators too. So an 86F wind keeps YOU cool in the summer time? Only because you're damp on the outside and evaporating sweat cools you. Think 86F humid, and you're only at 98F at core. The CPU is considerably hotter, and is cooled by the temperature DIFFERENCE. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? 
In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years and I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondents were right when they said that's probably too hot...
-- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Sat Oct 18 00:52:59 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Sat, 18 Oct 2003 06:52:59 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: Hi, quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some older machines. For comparison I am including benchmarks of dual P3 512 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On Opteron I have also tried PC GAMESS program which I received from Alex Granovsky. 1. Single point HF energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] itek g03 Itanium2 1400MHz efc 7.1 26.5 prototype g03 p4 512 2800MHz pgi4 41.1 dahlia g03 Opteron 1400MHz pgi4 49.5 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 83.3 Wayne g03 p3 512 1200MHz pgi4 85 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 92.5 prototype gamess p4 512 2800MHz ifc7.1 106.5 dahlia PCgamess Opteron 1400MHz 112.9 dahlia gamess Opteron 1400MHz ifc7.1 128.5 itek gamess Itanium2 1400MHz efc 7.1 150.8 2. Single point MP2 energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST rmp2/6-31G* nosym scf=(tight,incore) MaxDisk=750MW gamess: MEMORY=50000000 DIRSCF=.TRUE. itek g03 Itanium2 1400MHz efc 7.1 51.7 prototype g03 p4 512 2800MHz pgi4 111.0 dahlia g03 Opteron 1400MHz pgi4 150.7 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 154.2 prototype gamess p4 512 2800MHz ifc7.1 157.0 dahlia PCgamess Opteron 1400MHz 163.8 dahlia gamess Opteron 1400MHz ifc7.1 191.0 itek gamess Itanium2 1400MHz efc 7.1 194.8 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 251.6 Wayne g03 p3 512 1200MHz pgi4 303 3. Manfreds Gaussian Benchmark http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html 243 basis functions 399 primitive gaussians RHF/3-21G* Freq [sec] itek g03 Itanium2 1400MHz efc 7.1 2843 prototype g03 p4 512 2800MHz pgi 4 8084 dahlia g03 Opteron 1400MHz pgi 4 9332 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 10289 Wayne g03 p3 512 1200MHz pgi 4 12920 galera g03 p3xenon 700MHz pgi 3 19317 m001 g03 p3 650MHz pgi 4 22824 4. test397.com from gaussian03 882 basis functions, 1440 primitive gaussians rb3lyp/3-21g force test scf=novaracc [sec] itek g03 Itanium2 1400MHz efc 7.1 6733 prototype g03 p4 512 2800MHz pgi 4 12980 dahlia g03 Opteron 1400MHz pgi 4 17879 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 20521 Wayne g03 p3 512 1200MHz pgi 4 24521 galera g03 p3xenon 700MHz pgi 3 39353 5. Gaussian calculations of NMR chemical shifts for GlyGlyAlaAla 207 basis functions, 339 primitive gaussians %MEM=800MB B3LYP/GEN NMR [sec] itek g03 Itanium2 1400MHz efc 7.1 275 prototype g03 p4 512 2800MHz pgi 4 614 dahlia g03 Opteron 1400MHz pgi 4 849 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 948 Wayne g03 p3 512 1200MHz pgi 4 1134 some details: g03 is GAUSSIAN 03 rev. B04 with gaussian blas compiled with 32-bit pgi4.0 gamess is VERSION 6 SEP 2001 (R4) compiled with 32-bit ifc 7.1, for P4 I have used additional options -tpp7 -axKW Opteron (dahlia) had 64bit GinGin64 Linux and I had to use static 32-bit binaries. 
It should have SuSE Linux Enterprise soon and I will repeat tests using PGI 5.0 64-bit compiler when it will be ready. Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 (R3) compiled with 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 compiled with 64-bit efc 7.1 P3xenon (galera) uses gamess VERSION = 6 SEP 2001 (R4) compiled with ifc 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas compiled with pgi 3.3 czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 10:39:36 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 22:39:36 +0800 (CST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: <20031019143936.29602.qmail@web16807.mail.tpe.yahoo.com> I have 2 pts: 1. The compilers used across different platforms were not the same, why not use the Intel compiler for the P4 as well? 2. What is the working set of the benchmark? If the benchmark fit in the 6MB on-chip L3 of the Itanium2, it is very likely to perform very well. Another benchmark that shows the G5 wins the large memory case, loses small/medium cases, while the Itanium2 loses most of its advantages when the working set does not fit the L3: http://www.xlr8yourmac.com/G5/G5_fluid_dynamics_bench/G5_fluid_dynamics_bench.html Andrew. --- Cezary Czaplewski ????> > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz > against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp > 2133MHz(MP 2600+) and some > older machines. For comparison I am including > benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of > Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I > received from Alex > Granovsky. > > > 1. Single point HF energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) > gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 26.5 > prototype g03 p4 512 2800MHz pgi4 > 41.1 > dahlia g03 Opteron 1400MHz pgi4 > 49.5 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 83.3 > Wayne g03 p3 512 1200MHz pgi4 > 85 > m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 > 92.5 > prototype gamess p4 512 2800MHz ifc7.1 > 106.5 > dahlia PCgamess Opteron 1400MHz > 112.9 > dahlia gamess Opteron 1400MHz ifc7.1 > 128.5 > itek gamess Itanium2 1400MHz efc 7.1 > 150.8 > > 2. Single point MP2 energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST > rmp2/6-31G* nosym > scf=(tight,incore) > MaxDisk=750MW > gamess: MEMORY=50000000 DIRSCF=.TRUE. 
> > itek g03 Itanium2 1400MHz efc > 7.1 51.7 > prototype g03 p4 512 2800MHz pgi4 > 111.0 > dahlia g03 Opteron 1400MHz pgi4 > 150.7 > m211 gamess k7mp 2133MHz(MP 2600+) > ifc7.1 154.2 > prototype gamess p4 512 2800MHz > ifc7.1 157.0 > dahlia PCgamess Opteron 1400MHz 163.8 > dahlia gamess Opteron 1400MHz > ifc7.1 191.0 > itek gamess Itanium2 1400MHz efc > 7.1 194.8 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 251.6 > Wayne g03 p3 512 1200MHz pgi4 > 303 > > 3. Manfreds Gaussian Benchmark > http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html > > 243 basis functions 399 primitive gaussians > RHF/3-21G* Freq > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 2843 > prototype g03 p4 512 2800MHz pgi 4 > 8084 > dahlia g03 Opteron 1400MHz pgi 4 > 9332 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 10289 > Wayne g03 p3 512 1200MHz pgi 4 > 12920 > galera g03 p3xenon 700MHz pgi 3 > 19317 > m001 g03 p3 650MHz pgi 4 > 22824 > > 4. test397.com from gaussian03 > > 882 basis functions, 1440 primitive gaussians > rb3lyp/3-21g force test scf=novaracc > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 6733 > prototype g03 p4 512 2800MHz pgi 4 > 12980 > dahlia g03 Opteron 1400MHz pgi 4 > 17879 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 20521 > Wayne g03 p3 512 1200MHz pgi 4 > 24521 > galera g03 p3xenon 700MHz pgi 3 > 39353 > > 5. Gaussian calculations of NMR chemical shifts for > GlyGlyAlaAla > > 207 basis functions, 339 primitive gaussians > %MEM=800MB > B3LYP/GEN NMR > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 275 > prototype g03 p4 512 2800MHz pgi 4 > 614 > dahlia g03 Opteron 1400MHz pgi 4 > 849 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 948 > Wayne g03 p3 512 1200MHz pgi 4 > 1134 > > some details: > > g03 is GAUSSIAN 03 rev. B04 with gaussian blas > compiled with 32-bit pgi4.0 > > gamess is VERSION 6 SEP 2001 (R4) compiled with > 32-bit ifc 7.1, for P4 I > have used additional options -tpp7 -axKW > > Opteron (dahlia) had 64bit GinGin64 Linux and I had > to use static 32-bit > binaries. It should have SuSE Linux Enterprise soon > and I will repeat > tests using PGI 5.0 64-bit compiler when it will be > ready. > > Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 > (R3) compiled with > 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 > compiled with 64-bit efc > 7.1 > > P3xenon (galera) uses gamess VERSION = 6 SEP 2001 > (R4) compiled with ifc > 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas > compiled with pgi 3.3 > > > czarek > > ---------------------------------------------------------------------- > Dr. Cezary Czaplewski > Department of Chemistry Box 431 > Baker Lab of Chemistry > University of Gdansk Cornell > University > Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY > 14853 > phone: +48 58 3450-430 phone: > (607) 255-0556 > fax: +48 58 341-0357 fax: (607) > 255-4137 > e-mail: czarek at chem.univ.gda.pl e-mail: > cc178 at cornell.edu > ---------------------------------------------------------------------- > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 11:37:14 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 23:37:14 +0800 (CST) Subject: Long lived OpenPBS bug fixed! Message-ID: <20031019153714.3905.qmail@web16808.mail.tpe.yahoo.com> All versions of OpenPBS have this problem: the scheduler uses blocking sockets to contact the nodes, and if a node is dead, the scheduler hangs for several minutes, and all user commands will hang (not so good!). Scalable PBS finally fixed this problem: "... In local testing, we are able to issue a 'kill -STOP' on one node or even all nodes and the pbs_server daemon continues to be highly responsive to user commands, scheduler queries, and job submissions." http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000162.html *Also*, don't miss the Supercluster Newsletter, which talked about the next generation Maui scheduler called "Moab": http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000132.html Andrew. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Sun Oct 19 15:32:42 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Sun, 19 Oct 2003 20:32:42 +0100 (BST) Subject: RLX In-Reply-To: <200310181602.h9IG2HA27890@NewBlue.scyld.com> References: <200310181602.h9IG2HA27890@NewBlue.scyld.com> Message-ID: > > Have you ever try or test RLX server for HPC? > > What is their performance? > > .. what's their price/performance? Well, it all depends. The performance of the current generation of blade systems is on a par with 1U systems, and you can now get chassis with Myrinet or SAN connectivity if you need it. The part of price/performance that tends to get overlooked is manageability. Do you factor in the time and salaries of your admin staff who have to look after the thing? We run clusters with blade servers from various manufacturers (including RLX) and traditional 1U machines. The management overhead on blade systems is significantly lower than for 1U machines, and streets ahead of "beige boxes on shelves". On blade systems the network and SAN switching infrastructure is nicely integrated with the server chassis, and their management interfaces are tied in with OS deployment, remote power management etc. The difference in management overhead gets more pronounced as your cluster size increases. The time it takes to look after a 24 node cluster of 1U boxes isn't going to be that different to the time it takes to look after 24 blades, but running 1000 blades is much less effort than running 1000 1U servers. Whether this actually matters or not depends on your circumstances. If you have a limitless supply of PhD student slave labour (eg Virginia Tech and their G5s), then time and cost of management isn't so much of an issue. If you have to pay money for your sys-admins and want to run big clusters, then blades may end up being cost effective.
Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Mon Oct 20 04:35:03 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Mon, 20 Oct 2003 01:35:03 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: <5.2.0.9.2.20031020013259.03c0a4e0@216.82.101.6> Quoting from the article: An ordinary desktop PC outfitted with six PCI cards, each containing four of the chips, would perform at about 600 gigaflops (or more than half a teraflop). Assuming you were to build cluster systems with six PCI cards each, it would require 4U rack cases... Unless these floating point cards come as low-profile PCI (MD2 form factor)? 20 racks * 42U per rack = 840U / 4 = 210 nodes, not counting switching equipment. Petaflop with 210 compute nodes? At 04:15 PM 10/16/2003 -0400, you wrote: >Hi all, > > Check out this article over at wired: > >http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? >Obviously, there's memory bandwidth limitations due to PCI. Does anyone >know anything else about these guys? > >Cheers, >Bryce > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Mon Oct 20 08:16:19 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Mon, 20 Oct 2003 14:16:19 +0200 Subject: some ab initio benchmarks In-Reply-To: References: Message-ID: <20031020121619.GM8711@unthought.net> On Sat, Oct 18, 2003 at 06:52:59AM +0200, Cezary Czaplewski wrote: > > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some > older machines. For comparison I am including benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I received from Alex > Granovsky. Could you please specify which version of which operating system was used for this? If the kernel does not have NUMA scheduling, the Opterons are severely disadvantaged - it would be useful to know. Thank you, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From richardlbj at yahoo.com Sat Oct 18 23:56:37 2003 From: richardlbj at yahoo.com (Richard Brown) Date: Sat, 18 Oct 2003 20:56:37 -0700 (PDT) Subject: cluster node freezes while running namd 2.5/2.5b1 Message-ID: <20031019035637.5382.qmail@web41211.mail.yahoo.com> I have been trying to figure this out for the past two months with no luck. I have an 8-node PC cluster that consists of 16 athlon mp2200+, msi k7d master-l mb, intel i82557/i82558 10/100 on-board lan, 500mb kingston ddr266 pc2100 unbuffered, 3com superstack III baseline 24 port 10/100 switch. The cluster was built using oscar2.1/redhat7.3 w/ the kernel update 2.4.20-20. The namd versions used include 2.5b1 and the latest 2.5, both linux binary distributions and source code builds. The simulation tested is the apoa1 benchmark example. namd/apoa1 only runs w/o problems on a single cluster node, either with one or two cpus. Every time it runs on two or more nodes, either using one or two cpus from each node, namd/apoa1 stops somewhere in the middle of a run. One of the nodes freezes and does not respond to ping, ssh or the directly attached keyboard. Most of the time there were no error messages. A few times I received an apic error or socket receive failure. I tried plugging a ps/2 mouse into the nodes as some people suggested for a bug of the motherboard but it did not help. I don't know how to proceed from here. Any suggestions would be appreciated. Thanks, Richard __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Sun Oct 19 21:00:53 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Sun, 19 Oct 2003 21:00:53 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <20031018013957.GA3774@greglaptop.PEATEC.COM> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <16275.13253.833239.996985@random.tigertiger.de> > > http://www.wired.com/news/technology/0,1282,60791,00.html Greg Lindahl writes: > I think it's the Return of the Array Processor. > > There's very little new in computing these days -- and it has the > usual flaws of APs: low bandwidth communication to the host. > > So if you have a problem that actually fits in the limited memory, and > doesn't need to communicate with anyone else very often, it may be a > win for you. They actually say in this document http://www.clearspeed.com/downloads/overview_cs301.pdf that the chip can be used as a stand-alone processor and resembles a standard RISC processor. I do not see whether it would be SIMD or MIMD - the block diagram at least does not show a central control unit separate from the PEs. Given the small on-chip memory, they will have to connect external memory. The thing that would worry me is that the external machine balance is 32 Flops/Word (on 32-bit words), so it will only be useful for applications that do a lot of operations inside a few 100Kb of memory.
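As a back-of-the-envelope illustration of what that balance implies (using only figures quoted in this thread, so treat it as a sketch rather than a spec): the 600 GFLOPS claimed for six 4-chip PCI cards works out to about 25 GFLOPS per chip, and 25 GFLOPS at 32 flops per 32-bit word corresponds to roughly 0.8 GWords/s, on the order of 3 GB/s, of external bandwidth. A streaming kernel like DAXPY, which does about 2 flops per 3 words of memory traffic, would then be held to something like 0.5 GFLOPS per chip - a couple of percent of peak - unless its working set stays in the on-chip memory.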
IBM is following a slightly different approach with the QCDOC and BlueGene/L supercomputers which are based on systems-on-a-chip where they put a two PowerPC cores and all support logic on a single chip, wire it up with one or two GB of memory and connect a lot (64K) of these chips together. They expect 5.5 GFlops/s per node peak and to have 360 TFlops operational in 2004/5 (in 64 racks). You would need about 200 racks to get to a PetaFlops machine... http://sc-2002.org/paperpdfs/pap.pap207.pdf http://www.arxiv.org/abs/hep-lat/0306023 [QCDOC is a Columbia University project in collaboration with IBM - IBM is transitioning the technology from high-energy physics to biology which makes a lot of sense... :-)] To put 64 processors on a chip, I am sure ClearSpeed have to sacrifice a lot in memory and functionality/programmability, and who wins in this tradeoff remains to be seen. Depends on the application, too, of course. BTW, who or what is behind ClearSpeed? Their Bristol address is identical to Infineon's Design Centre there, and Hewlett Packard seems to have a lab there, too. If they have that kind of support, I am sure they thought hard before making these design choices, and it may just be tarketed at certain problems (vector/matrix/FFT-like stuff). -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mof at labf.org Mon Oct 20 10:13:56 2003 From: mof at labf.org (Mof) Date: Mon, 20 Oct 2003 23:43:56 +0930 Subject: Solaris Fire Engine. Message-ID: <200310202343.56524.mof@labf.org> http://www.theregister.co.uk/content/61/33440.html ... "We worked hard on efficiency, and we now measure, at a given network workload on identical x86 hardware, we use 30 percent less CPU than Linux." Mof. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Oct 20 11:17:24 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 20 Oct 2003 11:17:24 -0400 Subject: cluseter node freezes while running namd 2.5/2.5b1 In-Reply-To: <20031019035637.5382.qmail@web41211.mail.yahoo.com> References: <20031019035637.5382.qmail@web41211.mail.yahoo.com> Message-ID: <3F93FC84.7020808@scalableinformatics.com> Hi Richard: Are your Intel network drivers up to date? Check on the Intel site. If only one node repeatedly freezes (the same node), you might look at taking it out of the cluster, and seeing if that improves the situation. If it does, swap the one you took out, with one that is still in there, and see if the problem returns. This will help you determine if the problem is node based or system based. Joe Richard Brown wrote: >I have been try to figure this out for the past two >months with no luck. > >I have a 8-node PC cluster that consists of 16 athlon >mp2200+, msi k7d master-l mb, intel i82557/i82558 >10/100 on-board lan, 500mb kingston ddr266 pc2100 >unbuffered, 3com superstack III baseline 24 port >10/100 switch. > >The cluster was built using oscar2.1/redhat7.3 w/ the >kernel update 2.4.20-20. namd used includes 2.5b1 and >the latest 2.5, both linux binary distributions and >source code builds. the simulation tested is apoa1 >benchmark example. 
> >namd/apoa1 only runs w/o problems on a single cluster >node, either with one or two cpus. Every time it runs >on two or more nodes, either using one or two cpus >from each node, namd/apoa1 stops somewhere in the >middle of run. One of the nodes freezes and does not >respond to ping, ssh or the directly attached >keyboard. Most of the time there were no error >messages. A few times I received apic error or sorcket >receive failure. I tried plugging a ps/2 mouse into >the nodes as some people suggested for a bug of the >motherboad but it did not help. > >I don't know how to proceed from here. Any suggestions >would be appreciated. > >Thanks, >Richard > > >__________________________________ >Do you Yahoo!? >The New Yahoo! Shopping - with improved product search >http://shopping.yahoo.com >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netbsd.org Mon Oct 20 11:03:50 2003 From: jschauma at netbsd.org (Jan Schaumann) Date: Mon, 20 Oct 2003 11:03:50 -0400 Subject: New tech-cluster mailing list for NetBSD Message-ID: <20031020150350.GA26140@netmeister.org> Hello, A new tech-cluster at netbsd.org mailing list has been created. As the name suggests, this list is intended for technical discussions on building and using clusters of NetBSD hosts. Initially, this list is expected to be of low volume, but we hope to advocate and advance the use of NetBSD in such environments significantly. Subscription is via majordomo -- please see http://www.NetBSD.org/MailingLists/ for details. -Jan -- http://www.netbsd.org - Multiarchitecture OS, no hype required. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 14:03:23 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 14:03:23 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <200310202343.56524.mof@labf.org> Message-ID: On Mon, 20 Oct 2003, Mof wrote: > http://www.theregister.co.uk/content/61/33440.html > > ... "We worked hard on efficiency, and we now measure, at a given network > workload on identical x86 hardware, we use 30 percent less CPU than Linux." Linux uses much more CPU per packet than it used to. The structural change for IPtables/IPchains capability is very expensive, even when it is not used. And there have been substantial, CPU-costly changes to protect against denial-of-service attacks at many levels. The only protocol stack changes that might benefit cluster use are sendfile/zero-copy, and that doesn't apply to most current hardware or typical cluster message passing. It would be technically easy to revert to the interface of old Linux kernels and see much better than a 30% CPU reduction, but it's very unlikely that would happen politically: Linux development is feature-driven, not performance-driven.
And that's easy to understand when your pet feature is at stake, or there is a news story of "Linux Kernel Vulnerable to ". -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kinghorn at pqs-chem.com Mon Oct 20 13:36:28 2003 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Mon, 20 Oct 2003 12:36:28 -0500 Subject: parllel eigen solvers Message-ID: <200310201236.28901.kinghorn@pqs-chem.com> Does anyone know of any recent progress on parallel eigensolvers suitable for beowulf clusters running over gigabit ethernet? It would be nice to have something that scaled moderately well and at least gave reasonable approximations to some subset of eigenvalues and vectors for large (10,000x10,000) symmetric systems. My interests are primarily for quantum chemistry. It's pretty obvious that you can compute eigenvectors in parallel after you get the eigenvalues but it would be nice to get eigenvalues mostly in parallel requiring maybe just a couple of serial iterates ... Best regards to all -Don _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Mon Oct 20 15:08:21 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Mon, 20 Oct 2003 21:08:21 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: <20031020121619.GM8711@unthought.net> Message-ID: On Mon, 20 Oct 2003, Jakob Oestergaard wrote: > Could you please specify which version of which operating system was > used for this? Opteron machine (dahlia) was a prototype which dr Paulette Clancy got for evaluation from local computer shop. It had RedHat GinGin 64 operating system preistalled when I did testing. > If the kernel does not have NUMA scheduling, the Opterons are severely > disadvantaged - it would be useful to know. I don't remember which kernel was installed when I did benchmarks, I suppose standard kernel which is coming with GinGin64. Machine should have SuSE installed now so I cannot check it. I will repeat benchmarks with PGI 5 64bit compiler and SuSE when I will have some time. czarek _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Oct 20 17:50:56 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 20 Oct 2003 14:50:56 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: <16275.13253.833239.996985@random.tigertiger.de> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: >BTW, who or what is behind ClearSpeed? Their Bristol address is >identical to Infineon's Design Centre there, and Hewlett Packard seems >to have a lab there, too. If they have that kind of support, I am sure >they thought hard before making these design choices, and it may just >be tarketed at certain problems (vector/matrix/FFT-like stuff). 
Off their web site...http://www.clearspeed.com/about.php?team The CEO and president are marketing oriented (CEO: "he focused on taking new technologies to market", President: "..successfully grown global sales management and field application organizations and instrumental in creating key partnership agreements". The CTO (Ray McConnell) does parallel processing with 300K processors, etc. VP Engr (Russell David) designed mixed signal baseband ICs for the wireless market. I didn't turn up any papers in the IEEE on-line library, but that's not particularly significant, in and of itself. McConnell has a paper http://www.hotchips.org/archive/hc11/hc11pres_pdf/hc99.s3.2.McConnell.pdf that shows architectures from PixelFusion, Ltd... SIMD core with 32 bit embedded processor running a 256 PE "Fuzion block". Each PE has an 8 bit ALU and 2kByte PE memory... (sound familiar?) From "Hot Chips 99" James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Mon Oct 20 18:46:31 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 00:46:31 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > The only protocol stack changes that might benefit cluster use are > sendfile/zero-copy, and that doesn't apply to most current hardware or > typical cluster message passing. Has anybody actually tried to use sendfile in MPICH or LAM-MPI? I planned to do it, but this is somewhere in the middle of my always growing TODO queue... Recipes for how to use it were posted a few times at least on the netdev list, so those interested can find them easily. > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: But there are many projects that live outside the official kernel, the Scyld network drivers being one good example. What's wrong with replacing the IP stack with one maintained separately with performance in mind? I agree though that this would mean somebody has to take care of it and make sure that it works with newer kernels... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Oct 20 19:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 20 Oct 2003 19:08:12 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Oh yes, and it is a SERIOUS problem.
I was just mulling on the right procmail recipe to consign this domain to the dark depths of hell, but if it were done at the list level instead it would only be a good thing. My .procmailrc is already getting quite long indeed. BTW, you (and of course the rest of the list) are just the man to ask; what is the status of Opterons and fortran compilers. I myself don't use fortran any more, but a number of folks at Duke do, and they are starting to ask what the choices are for Opterons. A websearch reveals that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an Opteron fortran, but rumor also suggests that a number of these are really "beta" quality with bugs that may or may not prove fatal to any given project. Then there is Gnu. Any comments on any of these from you (or anybody, really)? Is there a functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? Do the compilers permit access to large (> 3GB) memory, do they optimize the use of that memory, do they support the various SSE instructions? I'm indirectly interested in this as it looks like I'm getting Opterons for my next round of cluster purchases personally, although I'll be using C on them (hopefully 64 bit Gnu C). rgb > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Oct 20 18:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Mon, 20 Oct 2003 18:52:03 -0400 Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: References: Message-ID: <1066690323.7027.17.camel@roughneck.liniac.upenn.edu> On Mon, 2003-10-20 at 18:41, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Yes -- quite annoying :/ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Mon Oct 20 18:41:51 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Mon, 20 Oct 2003 15:41:51 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net Message-ID: I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every messages I've sent to this list has started bouncing back to me from dan at systemsfirm.com. I'm getting about ten copies of each one every other day. Is anyone else having this problem? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 20:08:31 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 20:08:31 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > On Mon, 20 Oct 2003, Donald Becker wrote: > > > The only protocol stack changes that might benefit cluster use are > > sendfile/zero-copy, and that doesn't apply to most current hardware or > > typical cluster message passing. > > Has anybody actually tried to use sendfile in MPICH or LAM-MPI? I > planned to do it, but this is somewhere in the middle of my always growing > TODO queue... Recipes for how to use it were posted a few times at least > on the netdev list, so those interested can find them easily. The trick is to
   memory map a file
   use that memory region as message buffers
   send the message buffers using sendfile()
My belief is that the page locking involved with sendfile() would be too costly for anything smaller than about 32KB. While I'm certain that there are a few MPI applications that use messages that large, they don't seem to be typical. > But there are many projects that live outside the official kernel, the > Scyld network drivers being one good example. What's wrong with replacing > the IP stack with one maintained separately with performance in mind? > I agree though that this would mean somebody taking care of it and making > sure that it works with newer kernels... From my experience trying to keep the network driver interface stable, I very much doubt that it would be possible to separately maintain a network protocol stack. Especially since it would be perceived as competition with the in-kernel version, which brings out the worst behavior... As a specific example, a few years ago we had cluster performance patches for the 2.2 kernel. Even while the 2.3.99 development was going on, the 2.2 kernel changed too quickly to keep those patches up to date and tested. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Mon Oct 20 19:31:33 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Tue, 21 Oct 2003 01:31:33 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> Message-ID: <16276.28757.858683.189030@random.tigertiger.de> Jim Lux writes: > At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: > >BTW, who or what is behind ClearSpeed? Their Bristol address is > >identical to Infineon's Design Centre there, and Hewlett Packard seems > >to have a lab there, too. If they have that kind of support, I am sure > >they thought hard before making these design choices, and it may just > >be targeted at certain problems (vector/matrix/FFT-like stuff). > > The CTO (Ray McConnell) does parallel processing with 300K processors, etc. > VP Engr (Russell David) designed mixed signal baseband ICs for wireless > market.
> I didn't turn up any papers in the IEEE on-line library, but > that's not particularly significant, in and of itself. I actually found some more info about them: ClearSpeed used to be PixelFusion, a spin-off from Inmos, who made the original Transputer. http://www.eetimes.com/sys/news/OEG20010524S0044 ClearSpeed tried to design a SIMD processor called Fuzion for graphics applications, then around 2001 turned to the networking sector, and now it seems to have turned to high-performance computing. So it's a processor in search of an application. http://www.eetimes.com/semi/news/OEG20000208S0039 http://www.eetimes.com/semi/news/OEG19990512S0012 Poor guys went through at least three CEOs during the last four years... -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Oct 20 20:33:23 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 20 Oct 2003 17:33:23 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: hi ya trent just add the ip# of systemsfirm.net to your /etc/mail/access file:
   # a polite msg i added for them/somebody to see ..
   systemsfirm.net    REJECT - geez .. do you need help to fix your PC
then: cd /etc/mail ; make ; restart-sendmail or your exim or ... c ya alvin and about 75% or more of the Swen virus is coming from mis-managed/mis-configured clusters http://www.Linux-Sec.net/MSJunk On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > message I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Oct 20 23:34:44 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 20 Oct 2003 20:34:44 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: yes... I've tried contacting the admin contact for that domain and got no response... joelja On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > message I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem?
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Tue Oct 21 08:06:56 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Tue, 21 Oct 2003 05:06:56 -0700 Subject: Solaris Fire Engine. References: Message-ID: <3F95215F.9DE43BD9@attglobal.net> In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development at some point split in two distinct lines: one for single computer applications and one for clusters? Paul Schenker Donald Becker wrote: > On Mon, 20 Oct 2003, Mof wrote: > > > http://www.theregister.co.uk/content/61/33440.html > > > > ... "We worked hard on efficiency, and we now measure, at a given network > > workload on identical x86 hardware, we use 30 percent less CPU than Linux." > > Linux uses much more CPU per packet than it used to. The structural > change for IPtable/IPchains capability is very expensive, even when it > is not used. And there have been substantial, CPU-costly changes to protect > against denial-of-service attacks at many levels. The only protocol > stack changes that might benefit cluster use are sendfile/zero-copy, and > that doesn't apply to most current hardware or typical cluster message > passing. > > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: Linux development is > feature-driven, not performance-driven. And that's easy to understand > when your pet feature is at stake, or there is a news story of "Linux > Kernel Vulnerable to ". > > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zby at tsinghua.edu.cn Mon Oct 20 22:34:54 2003 From: zby at tsinghua.edu.cn (Baoyin Zhang) Date: Tue, 21 Oct 2003 10:34:54 +0800 Subject: Jcluster toolkit v 1.0 releases! Message-ID: <266703288.27688@mail.tsinghua.edu.cn> Apologies if you receive multiple copies of this message. Dear all, I am pleased to annouce the Jcluster Toolkit (Ver 1.0) releases, you can freely download it from the website below. http://vip.6to23.com/jcluster/ The toolkit is a high performance Java parallel environment, implemented in pure java. 
It provides you the popular PVM-like and MPI-like message-passing interface, automatic task load balance across large-scale heterogeneous cluster and high performance, reliable multithreaded communications using UDP protocol. In the version 1.0, Object passing interface is added into PVM-like and MPI-like message passing interface, and provide very convenient deployment -- the classes of user application only need to be deployed at one node in a large-scale cluster. I welcome your comments, suggestions, cooperation, and involvement in improving the toolkit. Best regards Baoyin Zhang _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 21 08:20:10 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 21 Oct 2003 08:20:10 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <3F95215F.9DE43BD9@attglobal.net> Message-ID: On Tue, 21 Oct 2003 pesch at attglobal.net wrote: > In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If > so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development > at some point split in two distinct lines: one for single computer applications and one for clusters? It's the usual problem (and a continuation of my XML rant in a way, as it is at least partly motivated by this). Sure, one can do this. However, it is very, very expensive to do so, a classic case of 90% of the work producing 10% of the benefit, if that. As Don pointed out, even Scyld, with highly talented people who are (in principle:-) even making money doing so found maintaining a separate kernel line crushingly expensive very quickly. Whenever expense is mentioned, especially in engineering, one has to consider benefit, and do a CBA. The CBA is the crux of all optimization theory; find the point of diminishing returns and stay there. I would argue that splitting the kernel is WAY far beyond that point. Folks who agree can skip the editorial below. For that matter, so can folks who disagree...;-) The expense can be expressed/paid one of several ways -- get a distinct kernel optimized and stable, get an entire associated distribution optimized and stable, and then freeze everything except for bugfixes. You then get a local optimum (after a lot of work) that doesn't take a lot of work to maintain, BUT you pay the penalty of drifting apart from the rest of linux and can never resynchronize without redoing all that work (and accepting all that new expense). New, more efficient gcc? Forget it -- the work of testing it with your old kernel costs too much. New device drivers? Hours to days of testing for each one. Eventually a key application or improvement appears in the main kernel line (e.g. 64 bit, Opteron support) that is REALLY different, REALLY worth more to nearly everybody than the benefit they might or might not gain from the custom HPC optimized kernel, and your optimized but stagnant kernel is abandoned. Alternatively, you can effectively twin the entire kernel development cycle, INCLUDING the testing and debugging. Back in my ill-spent youth I spent a considerable amount of time on the linux-smp list (I couldn't take being on the main linux kernel list even then, as its traffic dwarfs both the beowulf list and the linux-smp list combined). I also played a tiny bit with drivers on a couple of occassions. 
The amount of work, and number of human volunteers, required to drive these processes is astounding, and I would guess that it would have to be done on twinned lists as the kernelvolken would likely not welcome a near doubling of traffic on their lists or doubling of the work burden trying to figure out just who owns a given emergent bug (and inevitably they WOULD have to help figure out who owns emergent bugs, as some of them WOULD belong to them, others to the group supporting the split off sources, if they were to proceed independently but "keep up" with the development kernel so that true divergence did not occur). A better alternative exists (and is even used to some extent). The linux kernel is already highly modular. It is already possible to e.g. bypass the IP stack altogether (as is done by myrinet and other high speed networks) with custom device drivers that work below the IP and TCP layers -- just doing this saves you a lot of the associated latency hit in high speed networks, as TCP/IP is designed for WAN routing and security and tends to be overkill for a secure private LAN IPC channel in a beowulf. This route requires far less maintenance and customization -- specialized drivers for MPI and/or PVM and/or a network socket layer, plus a kernel module or three. Even this is "expensive" and tends to be done only by companies that make hefty marginal profits for their specific devices, but it is FAR cheaper than maintaining a separate kernel altogether. I would also lump into this group applying and testing on an ad hoc basis things like Josip's network optimization patches which make relatively small, relatively specific changes that might technically "break" a kernel for WAN application but can produce measureable benefits for certain classes of communication pattern. This sort of thing is NOT for everybody. It is like a small scale version of the first alternative -- the patches tend to be put together for some particular kernel revision and then frozen (or applied "blindly" to succeeding kernel revisions until they manifestly break). Again this motivates one to freeze kernel and distribution once one gets everything working and live with it until advances elsewhere make it impossible to continue doing so. This is the kind of thing where MAYBE one could get the patches introduced into the mainstream kernel sources in a form that was e.g. sysctl controllable -- "modular", as it were, but inside the non-modular part of the kernel as a "proceed at your own risk" feature. Expense alternatives in hand, one has to measure benefit. We could break up HPC applications very crudely into groups. One group is code that is CPU bound -- where the primary/only bottleneck is the number of double precision floating point (and associated integer) computations that the computer can retire per second. Another might be memory bound -- limited primarily by the speed with which the system can move values into and out of memory doing some simple operations on them in the meantime. Still another might be disk or other non-network I/O bound (people who crunch large data sets to and from large storage devices). Finally yes, one group might be bound by the network and network based IPC's in a parallel division of a program. 
This latter group is the ONLY group that would really benefit from the kernel split; the rest of the kernel is reasonably well optimized for raw computations, memory access, and even hardware device access (or can be configured and tuned to be without the need of a separate kernel line). I would argue that even the network group splits again, into latency limited and bandwidth limited. Bandwidth limited applications would again see little benefit from a hacked kernel split as TCP can deliver data throughput that is roughly 90% of wire speed (or better) for ethernet, depending on the quality of hardware as much as the kernel. Of course, the degree of the CPU's involvement in sending and receiving these messages could be improved; one would like to be able to use DMA as much as possible to send the messages without blocking the CPU, but this matters only if the CPU can do something useful while awaiting the network IPC transfers; often it cannot. The one remaining group that would significantly benefit is the latency limited group -- true network parallel applications that need to send lots of little messages that cannot be sensibly aggregated in software. The benefit there could be profound, as the TCP stack adds quite a lot of latency (and CPU load) on top of the irreducible hardware latency, IIRC, even on a switched network where the CPU doesn't have to deal with a lot of spurious network traffic. Are there enough members of this group to justify splitting the kernel? I very much doubt it. I don't even think that the existence of this group has motivated the widespread adoption of a non-IP ethernet transport layer -- nearly everybody just lives with the IP stack latency OR... ...uses one of the dedicated HPC networks. This is the real kicker. TCP latency is almost two orders of magnitude greater than either myrinet or dolphin/sci latency (which are both order of microseconds instead of order of hundreds of microseconds). They >>also<< deliver very high bandwidth. Sure, they are expensive, but you know that you are paying for precisely what YOU need for YOUR HPC computations. I don't have to pay for them (even indirectly, by helping out with a whole secondary kernel development track) when MY code is CPU bound; the big DB guys don't have to pay for it when THEIR code depends on how long it takes to read in those ginormous databases of e.g. genetic data; the linear algebra folks who need large, fast memory don't pay for it (unless they try splitting up their linear algebra across the network, of course:-) -- it is paid for only the people who need it, who send lots of little messages or who need its bleeding edge bandwidth or both. One COULD ask, very reasonably, for just about any of the kernel optimizations that can be implemented at the modular level -- that is a matter of writing the module, accepting responsibility for its integration into the kernel and sequential debugging in perpetuity (that is, becoming a slave of the lamp, in perpetuity bound to the kernel lists:-). Alas, TCP/IP is so bound up inside the main part of the kernel that I don't think it can be separated out into modules any more than it already is. ^^^^^ ^^^^^, (closing omitted in the fond hope of remuneration) rgb (C'mon now -- here I am omitting all sorts of words from my rants and my paypal account is still dry as a bone, dry as a desert, bereft of all money, parched as my throat in the noonday sun. Seriously, either I make some money or I'm gonna compose a 50 kiloword opus for my next one...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 08:46:57 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 14:46:57 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > My belief is that the page locking involved with sendfile() would be too > costly for anything smaller than about 32KB. IIRC, both MPICH and LAM-MPI make the distinction between small and large messages with the default cutoff being 64KB. So large messages could be sent this way... I don't know what you meant by "too costly", but small messages are not too costly to copy in the stack (normal behaviour), especially with the increasing cache sizes of today's CPUs, while the large ones (where copying time would be significant) could be sent without the extra copy in the stack (see the sketch at the end of this message). > While I'm certain that there are a few MPI applications that use > messages that large, they don't seem to be typical. ... or might not care that much about the speedup. > From my experience trying to keep the network driver interface stable, I > very much doubt that it would be possible to separately maintain a > network protocol stack. Well, it was late last night and probably I haven't chosen the most appropriate example... the Scyld network drivers are maintained by one person, while my suggestion was going more toward a community project. > Especially since it would be perceived as competition with the in-kernel > version, which brings out the worst behavior... Yeah, political issues - I think that making the intent clear would solve the problem: there is no competition, it serves a completely different purpose. And given what you wrote in the previous e-mail about "feature-driven", who would use it on normal computers when it misses several "high-profile features" like iptables? Even more, if it's clear that it should only be used on local fast networks, several aspects of the stack can be optimized without fear of breaking very high latency (satellite) or very low bandwidth (phone modem) connections. But I guess that I should stop dreaming :-) > As a specific example, a few years ago we had cluster performance > patches for the 2.2 kernel. Those maintained by Josip Loncaric? Again it was a one-man show. I think that this is exactly the problem: there are small projects maintained by one person which depend on the free time or interest of this person. Given that clustering has moved from research-only into a lucrative business, that the software (Linux kernel, MPI libraries, etc.) has evolved quite a lot, and that the entry barrier into, let's say, kernel programming is quite high, it's normal that not many people want to make the step. I already expressed my opinion about a year ago that such projects can only be carried forward by companies that benefit from them or universities where work from students comes for free. But it seems that there are no companies thinking that they can benefit or universities where students' work is for free...
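To make that concrete, here is the rough kind of thing I have in mind for the large-message path - just an untested sketch of the mmap()+sendfile() trick Don described, with the file name, the socket descriptor and the message size as placeholders and most error handling left out:

#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Send one "large" message (above the 32-64KB cutoff discussed above) from a
 * file-backed, mmap()ed buffer, so the kernel can take the data straight from
 * the page cache instead of copying it through the normal socket write path. */
static ssize_t send_large(int sock, const char *payload, size_t len)
{
    off_t off = 0;
    ssize_t sent = -1;
    int fd = open("msgbuf.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600); /* placeholder name */

    if (fd < 0 || ftruncate(fd, len) < 0)
        return -1;

    /* 1. memory map a file */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf != MAP_FAILED) {
        /* 2. use that memory region as the message buffer; a real MPI layer
         *    would build the message here in place instead of copying it */
        memcpy(buf, payload, len);

        /* 3. send the message buffer using sendfile() */
        sent = sendfile(sock, fd, &off, len);

        munmap(buf, len);
    }
    close(fd);
    return sent;
}

Whether the saved copy actually buys anything would of course depend on whether the page-locking cost you mention eats the gain - something to measure, not assume.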
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Oct 21 09:31:37 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 21 Oct 2003 09:31:37 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: Message-ID: On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? Well, this is as good a place as many to put up the benchmarks I ran using DYNA (a commercial FEM code from LSTC, first developed at LLNL, and definitely Fortran): http://www.duke.edu/~jlb17/bench-results.pdf According to their docs, the 32bit binary was compiled using ifc6.0. The slowdown in the newer point release is due to them dialing back the optimizations due to compiler bugs. The 64bit Opteron binary was compiled using PGI, but that's all I know about it. To sum it up, I bought some Opterons. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 09:41:53 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 15:41:53 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > But I guess that I should stop dreaming :-) Well, either I'm not dreaming, or somebody else is dreaming too :-) Below are some fragments of e-mails from David Miller (one of the Linux network maintainers) to netdev today: > People on clusters use their own special clustering hardware and > protocol stacks _ANYWAYS_ because ipv4 is too general to serve their > performance needs. And I think that is a good thing rather than > a bad thing. People should use specialized solutions if that is the > best way to attack their problem. ... 
> The things cluster people want is totally against what a general > purpose IPV4 implementation should do. Linux needs to provide a > general purpose IPV4 stack that works well for everybody, not just > cluster people. > > I'd rather have millions of servers using my IPV4 stack than a handful > of N-thousand system clusters. > ... > Sure, many people would like to simulate the earth and nuclear weapons > using Linux, but I'm sure as hell not going to put features into the > kernel to help them if such features hurt the majority of Linux users. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Oct 21 11:19:22 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 21 Oct 2003 11:19:22 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Robert G. Brown wrote: > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right There are many more problems that list readers do not see. I delete the address from the list only when the problem is persistent. The major problem happens when messages take a few days to bounce, and the bounce does not follow standards. In that case there are dozens of messages in the remote queue, and they all appears to be replies by a valid list subscriber. > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. A surprising amount of 64 bit software (certainly not limited to the Opteron) is still not mature enough for general purpose use. It still requires more development and testing to get to the stability level required for real deployment. And it's not the "64 bit" nature of the software, since we did have reasonable maturity on the Alpha years ago. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Tue Oct 21 11:06:37 2003 From: edwardsa at plk.af.mil (Arthur H. 
Edwards) Date: Tue, 21 Oct 2003 09:06:37 -0600 Subject: parallel eigen solvers In-Reply-To: <200310211049.OAA18031@nocserv.free.net> References: <200310201236.28901.kinghorn@pqs-chem.com> <200310211049.OAA18031@nocserv.free.net> Message-ID: <20031021150637.GA8076@plk.af.mil> I should point out that density functional theory can be compute-bound on diagonalization. QUEST, a Sandia code, easily handles several hundred atoms, but the eigensolve dominates by ~300-400 atoms. Thus, intermediate-size diagonalization is of strong interest. Art Edwards On Tue, Oct 21, 2003 at 02:49:07PM +0400, Mikhail Kuzminsky wrote: > According to Donald B. Kinghorn > > > > Does anyone know of any recent progress on parallel eigensolvers suitable for > > beowulf clusters running over gigabit ethernet? > > It would be nice to have something that scaled moderately well and at least > > gave reasonable approximations to some subset of eigenvalues and vectors for > > large (10,000x10,000) symmetric systems. > > My interests are primarily for quantum chemistry. > > > In the case you are thinking about semiempirical Fockian diagonalisation, > there is a set of alternative methods for direct construction of the density > matrix that avoid the preliminary finding of eigenvectors. These methods > are realized, in particular, in Gaussian-03 and MOPAC-2002. > > For non-empirical quantum chemistry, diagonalisation usually doesn't limit > overall performance. In the case of methods like CI it's necessary to > find only some eigenvectors, and it is better to use special diagonalization > methods. > > There is a special parallel solver package, but I don't have the exact > reference with me :-( > > Mikhail Kuzminsky > Zelinsky Inst. of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Tue Oct 21 15:32:05 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Tue, 21 Oct 2003 13:32:05 -0600 (CST) Subject: shift bit & performance? In-Reply-To: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: Hi, some time ago somebody sent some info about the performance of doing bit shifts with "<<" & ">>" instead of using "*" or "/". Could anybody help me with it? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Tue Oct 21 15:21:24 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Tue, 21 Oct 2003 12:21:24 -0700 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F958734.2030300@cert.ucr.edu> >>On Mon, 20 Oct 2003, Trent Piepho wrote: >> >> >>BTW, you (and of course the rest of the list) are just the man to ask; >>what is the status of Opterons and fortran compilers. I myself don't >>use fortran any more, but a number of folks at Duke do, and they are >>starting to ask what the choices are for Opterons.
A websearch reveals >>that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >>Opteron fortran, but rumor also suggests that a number of these are >>really "beta" quality with bugs that may or may not prove fatal to any >>given project. Then there is Gnu. >> >>Any comments on any of these from you (or anybody, really)? Is there a >>functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >>Do the compilers permit access to large (> 3GB) memory, do they optimize >>the use of that memory, do they support the various SSE instructions? >> I can tell you about PGI's compilers. They are kinda beta quality, as you say. As of now they only want to install on SuSE enterprise edition, although with a little fiddling around with the install scripts you can get them to install on other distributions. But even though you can get the compilers installed, they only seem to run on the SuSE beta for Opterons. PGI says this should all change in the near future though. As far as the code that the compilers produce goes, we haven't had any problems at all as far as I know. The great thing about the PGI compilers though is that you can download them and try them out for free for 15 days or so and see for yourself. As far as the Gnu Fortran compiler goes, it seems to work great on Opterons too. But then, as you're probably aware, it's only a Fortran 77 compiler. Cheers, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 21 15:33:33 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 21 Oct 2003 14:33:33 -0500 Subject: shift bit & performance? In-Reply-To: References: Message-ID: <1066764813.27603.4.camel@terra> On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? > There is certainly performance to be had from using a logical shift instead of a multiply or divide, but it's of declining value. I am fairly sure that with modern compilers, if you do an integer divide by a constant power of 2, it will generate a logical shift. That aint rocket science. -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Oct 21 16:32:07 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 21 Oct 2003 15:32:07 -0500 Subject: shift bit & performance? In-Reply-To: ; from eccf@super.unam.mx on Tue, Oct 21, 2003 at 01:32:05PM -0600 References: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: <20031021153207.N31870@mikee.ath.cx> On Tue, 21 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > > > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? The operations << and >> are closer to single assembler operations for integer values than * and /. If you use * or /, the corresponding assembler instructions take many more cycles to compute the new values. When multiplying or dividing by powers of 2, << and >> can be much faster.
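A tiny illustration (my own sketch, not benchmarked, constants picked arbitrarily): for unsigned integers and power-of-two constants the two forms below are equivalent, and any reasonable compiler will emit a shift for the / and * versions anyway. For signed operands, C division truncates toward zero while an arithmetic right shift rounds toward negative infinity, so the compiler adds a small fix-up before shifting - it still avoids a real divide.

#include <stdio.h>

int main(void)
{
    unsigned int x = 1000;

    unsigned int a = x / 8;    /* a compiler typically turns this into x >> 3 */
    unsigned int b = x >> 3;   /* explicit shift: same result, same generated code */

    unsigned int c = x * 16;   /* likewise the compiler turns this into x << 4 */
    unsigned int d = x << 4;

    printf("%u %u %u %u\n", a, b, c, d);   /* prints: 125 125 16000 16000 */
    return 0;
}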
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bhalevy at panasas.com Tue Oct 21 17:35:06 2003 From: bhalevy at panasas.com (Halevy, Benny) Date: Tue, 21 Oct 2003 17:35:06 -0400 Subject: shift bit & performance? Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF8BC@PIKES.panasas.com> Could be meaningful on a 32 bit platform doing 64-bit math emulation. Emulating shift is much cheaper than multiply/divide. Benny >-----Original Message----- >From: Dean Johnson [mailto:dtj at uberh4x0r.org] >Sent: Tuesday, October 21, 2003 3:34 PM >To: Eduardo Cesar Cabrera Flores >Cc: beowulf at beowulf.org >Subject: Re: shift bit & performance? > > >On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: >> Hi, >> >> sometime ago, somebody sent an info about performance >working with "<<" & >> ">>" doing shift bits instead of using "*" or "/" >> Could anybody help me about it? >> > >There is certainly performance to be had from using a logical >shift instead of a >multiply or divide, but its of declining value. I am fairly >sure that with modern >compilers you do a integer divide by a constant power of 2, >that it will generate >a logical shift. That aint rocket science. > > -Dean > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Wed Oct 22 03:32:31 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Wed, 22 Oct 2003 00:32:31 -0700 Subject: flood of bounces from postmaster@systemsfirm.net References: Message-ID: <3F96328F.DF327416@attglobal.net> Perhaps it's not related to the topic but any mail I post to this list results automatically in a "incident report" to my mail provider (attglobal.net) which then automatically replies with the mail below. Any inquiry to attglobal.net with the reference number below results always in exactly 0 (zero) replies from attglobal. Paul Schenker "Received: from e4.ny.us.ibm.com ([32.97.182.104]) by prserv.net (in5) with ESMTP id <20031021031824105041p20me>; Tue, 21 Oct 2003 03:18:27 +0000 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e4.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id h9L3IN0N801416 for ; Mon, 20 Oct 2003 23:18:23 -0400 Received: from BLDVMB.POK.IBM.COM (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id h9L3IMqW036946 for <@vm-av.pok.relay.ibm.com:pesch at attglobal.net>; Mon, 20 Oct 2003 23:18:22 -0400 Message-ID: <200310210318.h9L3IMqW036946 at northrelay01.pok.ibm.com> Received: by BLDVMB.POK.IBM.COM (IBM VM SMTP Level 320) via spool with SMTP id 7133 ; Mon, 20 Oct 2003 21:09:30 MDT Date: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) From: notify at attglobal.net To: CC: Subject: Re: Solaris Fire Engine. (REF:#_CSSEMAIL_0870689) X-Mozilla-Status: 8011 X-Mozilla-Status2: 00000000 X-UIDL: 200310210327271050a5ammfe0013d2 An incident reported by you has been created. Sev: 4 The incident # is listed below. No need to respond to this e-mail. 
For Account: CSSEMAIL Incident Number: 0870689 Status: INITIAL Last Updated: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) PROBLEM CREATED ************************************************************************* Summary: Re: Solaris Fire Engine. ************************************************************************* If replying via email, do not alter the reference id in the subject line and send only new information, do not send entire note again. Do not send attachments, graphics or images." Donald Becker wrote: > On Mon, 20 Oct 2003, Robert G. Brown wrote: > > > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > > messages I've sent to this list has started bouncing back to me from > > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > > day. Is anyone else having this problem? > > > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right > > There are many more problems that list readers do not see. I delete the > address from the list only when the problem is persistent. > The major problem happens when messages take a few days to bounce, and > the bounce does not follow standards. In that case there are dozens of > messages in the remote queue, and they all appears to be replies by a > valid list subscriber. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From douglas at shore.net Wed Oct 22 01:33:02 2003 From: douglas at shore.net (Douglas O'Flaherty) Date: Wed, 22 Oct 2003 01:33:02 -0400 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <200310211601.h9LG1QA22261@NewBlue.scyld.com> References: <200310211601.h9LG1QA22261@NewBlue.scyld.com> Message-ID: <3F96168E.7050908@shore.net> Here's the short summary of Opteron compilers. When someone offers an AMD64 compiler, it typically may be used to create 32-bit or 64-bit executables as long as you are specific about which libraries you use. Any IA-32 compiler can create code and run on Opterons. Of course, 32-bit executables don't get the extra memory either, even when running on a 64-bit OS, but sometimes a 32-bit executable might be what you want. With SC2003 coming up, I expect we'll see a flurry of activity relating to compilers and tools. This information will likely be stale soon. Also, most of these have a free trial period, so you can kick the tires. Intel compilers work great in 32-bit and can be run on a 32 or 64-bit OS natively. Performance and compatability is not an issue. For obvious reasons many of the benchmarks have been run using IFC. PGI's first AMD64 production release was around July 5. There is a limitation on objects greater than 2GB in Linux as a result of the GNU assembly linker, but the application can address as much memory as you can give it. Only a small fraction of the world has objects that large. I've only run into it with synthetic benchmarks. The gal coding is done and PGI is working on the next release. As for performance, since this was the first AMD64 fortran compiler to market, it was used in AMD presentations. You can see performance comparisons in Rich Brunner's presentation from ClusterWorld. 
It's on-line at http://www.amd.com/us-en/assets/content_type/DownloadableAssets/RichBrunnerClusterWorldpresFINAL.pdf (about slide 39 IIRC) There was a minor patch release near the begining of August. I suspect there is always someone finding flaws, but generally it's doing well. NB: Saw Glenn's post re: PGI on SuSE v. RedHat. We've got it running on both. There were definately some fiddley bits to make it happy on RedHat, but I think they are documented on PGI's site. Absoft had a long beta of their AMD64 compiler and went GA in September. I have no personal experience on it, nor do I know of any public benchmarks. NAG worked closely with AMD on the AMD Core Math Libraries. They should know the processor well. No experience with the Gnu Fortran or Lahey. I believe GFC to be AMD64 functional. Lahey would only generate 32-bit code. Your other question was about SSE2. Yes Opteron has complete SSE2 support. I *know* PGI & IFC support it, I expect the others do as well. doug douglas_at_shore.net Disclaimer: Among my several hats I am also in AMD Marketing. This is an unofficial response. No AMD bits were utlized in the creation of this email, etc.. If you want to talk about Opterons 'officially' you need to email me at doug.oflaherty(at)amd.com On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote >> On Mon, 20 Oct 2003, Trent Piepho wrote: >> > > >>> > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every >>> > messages I've sent to this list has started bouncing back to me from >>> > dan at systemsfirm.com. I'm getting about ten copies of each one every other >>> > day. Is anyone else having this problem? >> >> >> >> BTW, you (and of course the rest of the list) are just the man to ask; >> what is the status of Opterons and fortran compilers. I myself don't >> use fortran any more, but a number of folks at Duke do, and they are >> starting to ask what the choices are for Opterons. A websearch reveals >> that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >> Opteron fortran, but rumor also suggests that a number of these are >> really "beta" quality with bugs that may or may not prove fatal to any >> given project. Then there is Gnu. >> >> Any comments on any of these from you (or anybody, really)? Is there a >> functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >> Do the compilers permit access to large (> 3GB) memory, do they optimize >> the use of that memory, do they support the various SSE instructions? > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Wed Oct 22 04:45:08 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Wed, 22 Oct 2003 10:45:08 +0200 Subject: shift bit & performance? In-Reply-To: <1066764813.27603.4.camel@terra> References: <1066764813.27603.4.camel@terra> Message-ID: <20031022084508.GA7048@unthought.net> On Tue, Oct 21, 2003 at 02:33:33PM -0500, Dean Johnson wrote: > On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > > Hi, > > > > sometime ago, somebody sent an info about performance working with "<<" & > > ">>" doing shift bits instead of using "*" or "/" > > Could anybody help me about it? > > > > There is certainly performance to be had from using a logical shift instead of a > multiply or divide, but its of declining value. 
I am fairly sure that with modern > compilers you do a integer divide by a constant power of 2, that it will generate > a logical shift. > It used to be true that shifts were 'better' on Intel x86 processors, but it is not that simple anymore. On the P4, for example, a sequence of 'add's is cheaper than a left shift for three adds or less (because the latency of the shift opcode has increased compared to earlier generations). -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From serguei.patchkovskii at sympatico.ca Wed Oct 22 10:05:38 2003 From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii at sympatico.ca) Date: Wed, 22 Oct 2003 10:05:38 -0400 Subject: (no subject) Message-ID: <20031022140538.QSP8001.tomts7-srv.bellnexxia.net@[209.226.175.20]> > Any IA-32 compiler can create code and run on Opterons. Of course, > 32-bit executables don't get the extra memory either, even when running > on a 64-bit OS Not true. A 32-bit binary running on x86-64 Linux has access to the full 32-bit address space. When I run a very simple 32-bit Fortran program, I see the program itself mapped at very low addresses; the shared libraries get mapped at the 1Gbyte mark, while the stack grows down from the 4Gbyte mark. On an x86 Linux, the upper 1Gbyte (but this depends on the kernel options) is taken by the kernel address space. What this means in practice is that on an x86 Linux, I can allocate at most 2.5Gbytes of memory for my data without resorting to ugly tricks; in 32-bit mode of x86-64 Linux, this goes up to about 3.5Gbytes - enough to make a difference in some cases. Serguei _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian.dobbins at yale.edu Wed Oct 22 09:38:00 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Wed, 22 Oct 2003 09:38:00 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <3F96168E.7050908@shore.net> Message-ID: > PGI's first AMD64 production release was around July 5. There is a > limitation on objects greater than 2GB in Linux as a result of the GNU > assembly linker, but the application can address as much memory as you One simple way to get around the 2GB limit (*) is to simply use FORTRAN 90 dynamic allocation calls - we've done this, and have run codes up to (so far) about 7.7GB in size. If you're used to static allocations in F77, it's only about two lines to alter things to use dynamic memory. (*) - I don't think this limitation is in the GNU assembly linker, since g77 has no problems here. I think if you compile to assembly, you'll see that PGI has issues with 32-bit wraparound, whereas g77 does not. Their tech people are aware of this, and it's something I expect will be fixed fairly soon. Also, if you do happen to run jobs > 4GB, make sure you update the 'top' version you're using (procps.sourceforge.net).
Previous versions had wraparound at the 4GB mark, and it's cool seeing a listing say something to the effect of "7.7G" next to the size. :) Cheers, - Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shewa at inel.gov Wed Oct 22 12:22:20 2003 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed, 22 Oct 2003 10:22:20 -0600 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F96AEBC.2020107@inel.gov> Robert G. Brown wrote: > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. I have used the PGI compiler 5.0-2 on SuSE SLES 8 with Radion Technologies' (www.radiative.com) Attila Fortran 90 code. One of our scientists has run models in which a single Attila process allocates up to about 7GB of RAM. The performance of the Opteron was quite impressive too. I'm still testing the g77 3.3 prerelease that SuSE includes. By default it creates 64 bit binaries. The gfortran (G95) snapshot doesn't work, but I'm planning on building it myself later on and trying to compile the above Attila code with it. Radiative looked at this earlier (months ago) and it wasn't ready at that time. Andrew > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? > > I'm indirectly interested in this as it looks like I'm getting Opterons > for my next round of cluster purchases personally, although I'll be > using C on them (hopefully 64 bit Gnu C). > > rgb > > >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > -- Andrew Shewmaker, Associate Engineer Phone: 1-208-526-1415 Idaho National Eng. and Environmental Lab. P.O. Box 1625, M.S. 3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Wed Oct 22 13:21:39 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Wed, 22 Oct 2003 11:21:39 -0600 Subject: Cooling Message-ID: <20031022172139.GA12958@plk.af.mil> I'm moving a cluster into a 9.25x11.75 foot room (7.75' ceiling). The cluster now has 48 nodes (single processor AMD XP 2100+ boxes). They will be on metal racks. Does anyone have a simple way to calculate cooling requirements? We will have fair flexibility with air flow.
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JAI_RANGI at SDSTATE.EDU Wed Oct 22 15:43:36 2003 From: JAI_RANGI at SDSTATE.EDU (RANGI, JAI) Date: Wed, 22 Oct 2003 14:43:36 -0500 Subject: How to calculate operations on the cluster Message-ID: Hi, Can someone tell me how to find out how many operations can be performed on your cluster? If someone says 3 million operations can be performed on this cluster, how do you verify that and find out the actual performance? -Thanks -Jai Rangi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Oct 22 16:16:09 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed, 22 Oct 2003 13:16:09 -0700 Subject: Cooling References: <20031022172139.GA12958@plk.af.mil> Message-ID: <001d01c398d9$5b09cd00$32a8a8c0@laptop152422> To a first order, figure you've got to reject 150-200W per node... that's roughly 10kW of heat you need to get rid of. That's 10kJ/second. That will tell you right away how many "tons" of A/C you'll need (1 ton = 12000 BTU/hr or, more usefully here, 3.517 kW)... Looks like you'll need 3-4 tons (3 tons and 5 tons are standard sizes...) Next, figure out how much temperature rise in the air you can tolerate (say, 10 degrees C). Use the specific heat of air to calculate how many kilos (or, more practically, cubic feet) of air you need to move (use 1000 J/kg/deg as an approximation... you need to move 1 kg of air every second, or about a cubic meter... roughly approximating, a cubic meter is about 35 cubic feet, so you need around 2100 cubic feet per minute). As a practical matter, you'll want a lot more flow (using idealized numbers when it's cheap to put margin in is foolish). Also, a 10 degree rise is pretty substantial... If you kept the room at 15C, the air coming out of the racks would be 25C, and I'll bet the processors would be a good 20C above that. Calculating for a 5 degree rise might be a better plan. Just double the flow. Unless you're investing in specialized ducting that pushes the AC only through the racks and not the room, a lot of the flow will be going around the racks, whether you like it or not. In general, one likes to keep the duct flow speed below 1000 linear feet per minute (for noise reasons!), so your ducting will be around 3-4 square feet. This is not a window air conditioner!... This is the curse of rackmounted equipment in general. Getting the heat out of the room is easy. The tricky part is getting the heat out of the rack. Think about it: you've got to pump all those thousands of CFM *through the rack*, which is aerodynamically not well suited to this, especially in 1U boxes. How much cross sectional area is there in that rack chassis aperture for the air? How fast does that imply that the air is moving? What sort of pressure drop is there going through the rack? Take a look at RGB's Brahma web site. There's some photos there of their chiller unit, so you can get an idea of what's involved.
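If you want to play with the numbers, here's the same back-of-the-envelope arithmetic wrapped up in a few lines of C (my quick sketch, using the same round figures: ~200W per node, 1 ton = 3.517 kW, specific heat of air ~1000 J/kg per degree, roughly 1 kg of air per cubic meter, roughly 35 cubic feet per cubic meter):

#include <stdio.h>

int main(void)
{
    double nodes      = 48.0;     /* Art's cluster */
    double watts_node = 200.0;    /* high-ball per-node draw, watts */
    double delta_t    = 10.0;     /* allowed air temperature rise, deg C */

    double heat_w   = nodes * watts_node;           /* ~9.6 kW of heat to reject */
    double tons     = heat_w / 3517.0;              /* A/C "tons" (1 ton = 3.517 kW) */
    double kg_per_s = heat_w / (1000.0 * delta_t);  /* air mass flow, kg/s */
    double cfm      = kg_per_s * 35.0 * 60.0;       /* ~1 m^3 per kg, ~35 ft^3 per m^3 */

    printf("heat load: %.1f kW, cooling: %.1f tons, air flow: %.0f CFM\n",
           heat_w / 1000.0, tons, cfm);             /* ~9.6 kW, ~2.7 tons, ~2000 CFM */
    return 0;
}

Halve delta_t (or just double the CFM figure) for the 5 degree rise case.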
Your HVAC engineer will do a much fancier and useful version of this, allowing for things such as pressure drop, the amount of recirculation, the amount of heat leaking in from other sources (lighting, bodies in the room, etc.), heating from the fans, and so forth; But, at least you've got a ball park figure for what you're going to need. Jim Lux ----- Original Message ----- From: "Arthur H. Edwards" To: Sent: Wednesday, October 22, 2003 10:21 AM Subject: Cooling > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. > > Art Edwards > > -- > Art Edwards > Senior Research Physicist > Air Force Research Laboratory > Electronics Foundations Branch > KAFB, New Mexico > > (505) 853-6042 (v) > (505) 846-2290 (f) > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Wed Oct 22 17:43:16 2003 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed, 22 Oct 2003 23:43:16 +0200 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) References: <3F96AEBC.2020107@inel.gov> Message-ID: <3F96F9F4.8050505@moene.indiv.nluug.nl> Andrew Shewmaker wrote: > I'm still testing the g77 3.3 prerelease that SuSE includes. By default > it creates 64 bit binaries. Is there any interest in having g77 deal correctly with > 2Gb *direct access* records ? I have a patch in progress (due to http://gcc.gnu.org/PR10885) that I can't test myself ... > The gfortran (G95) snapshot doesn't work, but I'm planning on building > it myself later on and trying to compile the above Attila code with it. > Radiative looked at this earlier (months ago) and it wasn't ready at > that time. Please do not forget to enter bug reports in our Bugzilla database (see http://gcc.gnu.org/bugs.html). Thanks ! -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Wed Oct 22 09:53:51 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed, 22 Oct 2003 14:53:51 +0100 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@syst emsfirm.net) Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE210@stegosaurus.bristol.quadrics.com> >From: Brian Dobbins [mailto:brian.dobbins at yale.edu] (cut) > Also, if you do happen to run jobs > 4GB, make sure you > update the 'top' > version you're using (procps.sourceforge.net). Previous versions had > wraparound at the 4GB mark, and it's cool seeing a listing > say something to the effect of "7.7G" next to the size. 
:) On the subject of top another caveat is that top is hard-coded at compile time about what it thinks the pagesize is. If you compile kernels with bigger pagesizes (generally a 'good thing' for large memory nodes) then 'top' gets the memory used by your programs wrong by a factor of x2,x4 etc. ! Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Thu Oct 23 05:33:35 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 11:33:35 +0200 Subject: OpenMosix, opinions? Message-ID: <200310231133.35912.leopold.palomo@upc.es> Hi, I'm a newbie in all of this questions of paralelism and clusters. I'm reading all of I can. I have found some point that I need some opinions. Hipotesis, having a typical beowulf, with some nodes, a switch, etc. All of the nodes running GNU/Linux, and the applications that are running are using MPI or PVM. All works, etc .... Imaging that we have an aplication. A pararell aplication that doesn't use a lot I/O operation, but intensive cpu, and some messages. Something like a pure parallel app. We implement it using PVM or MPI ... MPI. And we make a test, and we have some result. Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ benchmark.htm. We have our program, and we change it that use threads for the paralel behaviour and not MPI. And we run the same test. So, what will be better? Any one have tested it? Thank's in advance. Best regards, Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 23 07:52:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 23 Oct 2003 07:52:14 -0400 (EDT) Subject: Cooling In-Reply-To: <20031022172139.GA12958@plk.af.mil> Message-ID: On Wed, 22 Oct 2003, Arthur H. Edwards wrote: > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. My kill-a-watt shows 1900+ AMD Athlon duals drawing roughly ~230W/node (or 115 per processor) (under steady, full load). I don't have a single CPU system in this class to test, but because of hardware replication I would guess that one draws MORE than half of this, probably ballpark of 150-160W where YMMV depending on memory and disk and etc configuration. Your clock is also a bit higher than what I measure and there is a clockspeed dependence on the CPU side, so you should likely guesstimate highball, say 175W OR buy a <$50 kill-a-watt (numerous sources online) and measure your prototyping node yourself and get a precise number. Then it is a matter of arithmetic. 
To be really safe and make the arithmetic easy enough to do on my fingers, I'll assume 200 W/node. Times 48 is 9600 watts. Plus 400 watts for electric lights, a head node with disk, a monitor, a switch (this is likely lowball, but we highballed the nodes). Call it 10 KW in a roughly 1000 cubic foot space. One ton of AC removes approximately 3500 watts continuously. You therefore need at LEAST 3 tons of AC. However, you'd really like to be able to keep the room COLD, not just on a part with its external environment, and so need to be able to remove heat infiltrating through the walls, so providing overcapacity is desireable -- 4-5 tons wouldn't be out of the question. This also gives you at least limited capacity for future growth and upgrade without another remodelling job (maybe you'll replace those singles with duals that draw 250-300W apiece in the same rack density one day). You also have to engineer airflow so that cold air enters on the air intake side of the nodes (the front) and is picked up by a warm air return after being exhausted, heated after cooling the nodes, from their rear. I don't mean that you need air delivery and returns per rack necessarily, but the steady state airflow needs to retard mixing and above all prevent air exhausted by one rack being picked up as intake to the next. There are lots of ways to achieve this. You can set up the racks so that the node fronts face in one aisle and node exhausts face in the rear and arrange for cold air delivery into the lower part of the node front aisle (and warm air return on the ceiling). You can put all the racks in a single row and deliver cold air as low as possible on the front side and remove it on the ceiling of the rear side. If you have a raised floor and four post racks with sidepanels you can deliver it from underneath each rack and remove it from the top. This is all FYI, but it is a good idea to hire an actual architect or engineer with experience in server room design to design your power/cooling system, as there are lots of things (thermal power kill switch, for example) that you might miss but they should not. However, I think that the list wisdom is that you should deal with them armored with a pretty good idea of what they should be doing, as the unfortunate experience of many who have done so is that even the pros make costly mistakes when it comes to server rooms (maybe they just don't do enough of them, or aren't used to working with 1000 cubic foot spaces). If you google over the list archives, there are longranging, extended discussions on server room design that embrace power delivery, cooling, node issues, costs, and more. rgb > > Art Edwards > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fmahr at gmx.de Thu Oct 23 09:06:09 2003 From: fmahr at gmx.de (Ferdinand Mahr) Date: Thu, 23 Oct 2003 15:06:09 +0200 Subject: OpenMosix, opinions? References: <200310231133.35912.leopold.palomo@upc.es> Message-ID: <3F97D241.2180EEF8@gmx.de> Hi Leo, > Imaging that we have an aplication. A pararell aplication that doesn't use a > lot I/O operation, but intensive cpu, and some messages. Something like a > pure parallel app. We implement it using PVM or MPI ... MPI. 
And we make a > test, and we have some result. > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that > can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that > com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > benchmark.htm. > > We have our program, and we change it that use threads for the paralel > behaviour and not MPI. And we run the same test. So, what will be better? Any > one have tested it? I haven't tested your special situation, but here are my thoughts about it: - Why changing an application that you already have? It costs you an unnecessary amount of time and money. - Migshm seems to enable OpenMosix to migrate System V shared memory processes, not threads. But, "Threads created using the clone() system call can also be migrated using Migshm", that's what you want, right? I don't know how well that works, but it limits you to clone(), and I don't know if thats sufficient for reasonable thread programming. Still (as you mentioned before), you really can only write code that uses minimum I/O and interprocess/thread communication because of network limitations. - Programs using PThreads don't run in parallel with OpenMosix/Migshm, they can only be migrated in whole. - If your MPI/PVM programs are well designed, they are usually really fast and can scale very well when CPU-bound. - Currently (Open)Mosix is better for load-balancing than HPC, especially in clusters with different hardware configurations. In HPC clusters, you usually have identical compute nodes. Hope that helps, Ferdinand _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RobertsGP at ncsc.navy.mil Thu Oct 23 15:07:19 2003 From: RobertsGP at ncsc.navy.mil (Roberts Gregory P DLPC) Date: Thu, 23 Oct 2003 14:07:19 -0500 Subject: UnitedLinux? Message-ID: Has anyone used UnitedLinux 1.0? I am using it on a 2 node dual CPU Opteron system. Greg -----Original Message----- From: Bill Broadley [mailto:bill at math.ucdavis.edu] Sent: Thursday, September 25, 2003 7:46 PM To: Brian Dobbins Cc: Bill Broadley; beowulf at beowulf.org Subject: Re: A question of OS!! > Yikes.. what kernels are used on these systems by default, and how large > is the code? I've been running SuSE 8.2 Pro on my nodes, and have gotten Factory default in both cases AFAIK. I don't have access to the SLES system at the moment, but the redhat box is: Linux foo.math.ucdavis.edu 2.4.21-1.1931.2.349.2.2.entsmp #1 SMP Fri Jul 18 00:06:19 EDT 2003 x86_64 x86_64 x86_64 GNU/Linux What relationship that has to the original 2.4.21 I know not. > varying performance due to motherboard, BIOS level and kernel. (SuSE 8.2 > Pro comes a modified 2.4.19, but I've also run 2.6.0-test5) > Also, are the BIOS settings the same? And how are the RAM slots I don't have access to the SLES bios. > populated? That made a difference, too! I'm well aware of the RAM slot issues, and I've experimentally verified that the full bandwidth is available. Basically each cpu will see 2GB/sec or so to main memory, and both see a total of 3GB/sec if both use memory simultaneously. > (Oh, and I imagine they're both writing to a local disk, or minimal > amounts over NFS? That could play a big part, too.. ) Yeah, both local disk, and not much. 
I didn't notice any difference when I commented out all output. > I should have some numbers at some point for how much things vary, but > at the moment we've been pretty busy on our systems. Any more info on > this would be great, though, since I've been looking at the faster chips, > too! ACK, I never considered that the opterons might be slower in some ways at faster clock speeds. My main suspicious is that MPICH was messaging passing for local nodes in some strange way and triggering some corner case under SLES. I.e. writing an int at a time between CPUs who are fighting over the same page. None of my other MPI benchmarks for latency of bandwidth (at various message sizes) have found any sign of problems. Numerous recompiles of MPICH haven't had any effect either. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at wol.es Thu Oct 23 10:17:17 2003 From: lepalom at wol.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 16:17:17 +0200 Subject: OpenMosix, opinions? In-Reply-To: <3F97D241.2180EEF8@gmx.de> References: <200310231133.35912.leopold.palomo@upc.es> <3F97D241.2180EEF8@gmx.de> Message-ID: <200310231617.17014.lepalom@wol.es> A Dijous 23 Octubre 2003 15:06, Ferdinand Mahr va escriure: > Hi Leo, > > > Imaging that we have an aplication. A pararell aplication that doesn't > > use a > > > lot I/O operation, but intensive cpu, and some messages. Something like a > > pure parallel app. We implement it using PVM or MPI ... MPI. And we make > > a > > > test, and we have some result. > > > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch > > that > > > can migrate threads (light weith process, Mighsm, > > http://mcaserta.com/maask/) > > > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, > > that > > > com from here: > > http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > > > benchmark.htm. > > > > We have our program, and we change it that use threads for the paralel > > behaviour and not MPI. And we run the same test. So, what will be better? > > Any > > > one have tested it? Hi, > I haven't tested your special situation, but here are my thoughts about > it: > > - Why changing an application that you already have? It costs you an > unnecessary amount of time and money. Ok, I just explaining an example. If I have to begin from 0, which approach will be better? > - Migshm seems to enable OpenMosix to migrate System V shared memory > processes, not threads. But, "Threads created using the clone() system > call can also be migrated using Migshm", that's what you want, right? I > don't know how well that works, but it limits you to clone(), and I > don't know if thats sufficient for reasonable thread programming. Still > (as you mentioned before), you really can only write code that uses > minimum I/O and interprocess/thread communication because of network > limitations. Yes, you are right. However, I hope than soon it will run pure threads. I have heart that 2.6 have a lot of improvements in the thread part, but I'm not sure. 
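To make the MPI-versus-threads comparison concrete, here is a minimal MPI sketch (in C; the work function and problem size are made-up placeholders) of the kind of CPU-bound, low-communication job being discussed -- each rank grinds on its own slice and only a single reduction happens at the end. A threaded version would replace the ranks with pthreads, and whether those threads can usefully migrate across nodes is exactly the open question in this thread:

#include <stdio.h>
#include <mpi.h>

/* Made-up stand-in for the real computation: pure CPU work on a slice. */
static double crunch(long start, long count)
{
    long i;
    double acc = 0.0;
    for (i = start; i < start + count; i++)
        acc += 1.0 / (1.0 + (double)i * (double)i);
    return acc;
}

int main(int argc, char **argv)
{
    int rank, size;
    long total = 100000000L;   /* placeholder problem size */
    long chunk, start;
    double local, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    chunk = total / size;
    start = rank * chunk;
    if (rank == size - 1)               /* last rank takes the remainder */
        chunk = total - start;

    local = crunch(start, chunk);

    /* "Some messages": one reduction at the very end. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("result = %.12f from %d ranks\n", global, size);

    MPI_Finalize();
    return 0;
}

Built with mpicc and started with mpirun -np N, this spreads over as many nodes as you give it without any migration magic; the same loop wrapped in pthread_create calls only uses the CPUs of whatever node the process lands on unless the thread-migration patches really do their job.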
> > - Programs using PThreads don't run in parallel with OpenMosix/Migshm, > they can only be migrated in whole.

Well, as I have understood it, Pthreads can migrate with openMosix (not Linux Threads!) without the patch.

> - If your MPI/PVM programs are well designed, they are usually really > fast and can scale very well when CPU-bound.

The point I am raising is whether you can write a parallel program as a threaded program and leave the rest to the kernel on the cluster. If that were available, managing the parallelism would be the job of the OS on a distributed machine.

> - Currently (Open)Mosix is better for load-balancing than HPC, > especially in clusters with different hardware configurations. In HPC > clusters, you usually have identical compute nodes. > > Hope that helps,

Yes, of course. Thanks, regards. Leo

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From gilberto at ula.ve Thu Oct 23 17:43:14 2003 From: gilberto at ula.ve (Gilberto Diaz) Date: 23 Oct 2003 17:43:14 -0400 Subject: Oscar 2.3 Message-ID: <1066945394.1200.132.camel@odie>

Hello everybody, I'm trying to install a small cluster using RH8.0 and OSCAR 2.3. The machines have a sis900 NIC (PXE capable) on the motherboard. When I try to boot the client nodes, they do not boot because the sis900.o module is not present. Does anybody have any idea how to load the module into the init image so that the nodes can boot without changing the kernel via the kernel picker? Thanks in advance. Regards, Gilberto

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From erwan at mandrakesoft.com Fri Oct 24 04:48:03 2003 From: erwan at mandrakesoft.com (Erwan Velu) Date: Fri, 24 Oct 2003 10:48:03 +0200 Subject: CLIC 2, the newest version is out ! Message-ID: <1066985283.32232.57.camel@revolution.mandrakesoft.com>

CLIC is a GPL Linux-based distribution made to meet HPC needs. CLIC 2 now allows people to install a full Linux cluster from scratch in a few hours. This product contains the Linux core system + the clustering autoconfiguration tools + the deployment tools + MPI stacks (mpich, lam/mpi). CLIC 2 is based on the results of MandrakeClustering, and includes several major features:
- New backend engine (fully written in perl)
- A new configure step during the server's graphical installation
- An automated dual Ethernet configuration (one NIC for computing, one NIC for administration)
- A new kernel (2.4.22)
- A new version of urpmi parallel (a parallel rpm installer)
- A graphical tool for managing users (add/remove): userdrake
- A new node management
|- You just need to power on a fresh node to install and integrate it in your cluster !
|- Fully automated add/remove procedure
And of course the latest versions of the clustering software:
- Maui 3.2.5-p5
- ScalablePBS 1.0-p4
- Ganglia 2.5.4
- Mpich 1.2.5-2
- LAM/MPI 6.5.9 (will be updated when 7.1 is available)
- PXELinux 2.06
CLIC 2 is no longer compatible with CLIC 1 due to its fully rewritten backend. This will not happen again in the future, but it was needed since CLIC 1 was a test release. We hope this product will meet the CLIC community's needs. CLIC 2 is now available on your favorite mirrors in the mandrake-iso directory.
For example you can find it at:
Europe:
ftp://ftp.lip6.fr:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://ftp.mirror.ac.uk:/sites/sunsite.uio.no/pub/unix/Linux/Mandrake/Mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://ftp.tu-chemnitz.de:/pub/linux/mandrake-iso/i586/CLIC-2.0-i586.iso
USA:
ftp://ftp.rpmfind.net:/linux/Mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://mirrors.usc.edu:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso

The documentation is included inside the cdrom (/doc/) in PDF and HTML format. This is the MandrakeClustering documentation based on the same core; everything is the same except the configuration GUI, which is only available in MandrakeClustering. All the configuration scripts that DrakCluster (our GUI) uses begin with the "setup_" prefix. So to auto-configure your server, you use the setup_auto_server.pl script; to add new nodes to your cluster, you use setup_auto_add_nodes.pl; to remove a node, you can use setup_auto_remove_nodes.pl. All these scripts have a really easy to learn syntax :) I hope this release will please every CLIC user; this new generation of CLIC is really easier to use than the previous releases. PS: I've heard that the 2.4.22 kernel may seriously damage LG cdrom drives. So be careful with CLIC 2 if you own LG cdrom drives; remove your cdrom drive before installing it. - CLIC Website: http://clic.mandrakesoft.com/index-en.html -- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From scheinin at crs4.it Fri Oct 24 11:06:55 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 24 Oct 2003 17:06:55 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310241506.h9OF6tP02285@dali.crs4.it>

I asked ClearSpeed what is the width of the floating point units and today I received a reply. The floating point units in the CS301 are 32 bits wide. A previous email on the subject noted an earlier design in which each PE has an 8-bit ALU for the 256-PE "Fuzion block". Evidently, this design is different. My opinion: 32 bits is more than adequate for many signal processing applications; not so long ago 24 bits was considered enough for signal processing. But for simulations of physical events the "eigenvalues" have a range that makes 32-bit floating point too small. regards, Alan

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From Daniel.Kidger at quadrics.com Fri Oct 24 12:09:31 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 24 Oct 2003 17:09:31 +0100 Subject: A Petaflop machine in 20 racks? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE233@stegosaurus.bristol.quadrics.com>

> I asked ClearSpeed what is the width of the floating point units > and today I received a reply. > The floating point units in the CS301 are 32 bits wide.

Don't forget that www.clearspeed.com used to be www.pixelfusion.com Their target market at the time was massively parallel SIMD PCI-based graphics engines. So that is most likely why they use only 32-bit floats.
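To put a number on that precision concern, here is a tiny generic C sketch (nothing ClearSpeed-specific) showing what 32-bit floats give you compared to 64-bit doubles -- roughly 6-7 decimal digits and a maximum around 3.4e38, versus 15-16 digits and about 1.8e308 -- which is why codes whose "eigenvalues" span a wide dynamic range usually want 64 bits:

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Machine epsilon, range and decimal digits for IEEE float/double. */
    printf("float : eps = %e  max = %e  digits = %d\n",
           (double)FLT_EPSILON, (double)FLT_MAX, FLT_DIG);
    printf("double: eps = %e  max = %e  digits = %d\n",
           DBL_EPSILON, DBL_MAX, DBL_DIG);

    /* A sum that already goes wrong in single precision: the small
       term is below float resolution at this magnitude and vanishes. */
    {
        float  fs = 1.0e8f + 1.0f;
        double ds = 1.0e8  + 1.0;
        printf("1e8 + 1 as float  = %.1f\n", fs);
        printf("1e8 + 1 as double = %.1f\n", ds);
    }
    return 0;
}

Whether that matters depends entirely on the application, which is the point about signal processing versus physical simulation.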
Yours, Daniel. (and yes Clearspeed are based in Bristol,UK but are nothing to do with us.) -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Oct 28 11:09:54 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 28 Oct 2003 11:09:54 -0500 Subject: SFF boxes for a cluster? Message-ID: <3F9E94D2.3020307@lmco.com> Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Tue Oct 28 13:19:11 2003 From: Peter.Lindgren at experian.com (Lindgren, Peter) Date: Tue, 28 Oct 2003 10:19:11 -0800 Subject: SFF boxes for a cluster? Message-ID: We have had 48 Dell GX260 SFF boxes in production since March without a single hardware failure. Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Tue Oct 28 15:12:38 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Tue, 28 Oct 2003 12:12:38 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv .doe.gov> Message-ID: <5.2.0.9.2.20031028120820.04272e60@216.82.101.6> One serious problem with the Shuttle and most competing "small form factor" PCs is the air intake, which is located on the sides. You can't put them flush with each other side-by-side on shelves... Most minitower or midtower ATX cases (and proper 1U or 2U cases) have air intake entirely on the front panel. air intake on the left side: http://www.sfftech.com/showdocs.cfm?aid=447 At 11:45 AM 10/28/2003 -0800, you wrote: >I've got one of those SS51G's at home and I love it. My only complaint is >that it does get a bit warm with a video card, but for a cluster you wont >need one. > >-----Original Message----- >From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] >Sent: Tuesday, October 28, 2003 10:07 AM >To: beowulf at beowulf.org >Subject: Beowulf digest, Vol 1 #1515 - 1 msg > > >Send Beowulf mailing list submissions to > beowulf at beowulf.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf >or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > >You can reach the person managing the list at > beowulf-admin at beowulf.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Beowulf digest..." > > >Today's Topics: > > 1. SFF boxes for a cluster? 
(Jeff Layton) > >--__--__-- > >Message: 1 >Date: Tue, 28 Oct 2003 11:09:54 -0500 >From: Jeff Layton >Subject: SFF boxes for a cluster? >To: beowulf at beowulf.org >Reply-to: jeffrey.b.layton at lmco.com >Organization: Lockheed-Martin Aeronautics Company > >Good morning, > > I've seen a few cluster made from the Small Form Factor >(SFF) boxes including "Space Simulator". Has anyone else >made a decent size cluster (n > 16) from these boxes? If so, >how has the reliability been? > >Thanks! > >Jeff > >-- >Dr. Jeff Layton >Aerodynamics and CFD >Lockheed-Martin Aeronautical Company - Marietta > > > > >--__--__-- > >_______________________________________________ >Beowulf mailing list >Beowulf at beowulf.org >http://www.beowulf.org/mailman/listinfo/beowulf > > >End of Beowulf Digest > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ZukaitAJ at nv.doe.gov Tue Oct 28 14:45:02 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Tue, 28 Oct 2003 11:45:02 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv.doe.gov> I've got one of those SS51G's at home and I love it. My only complaint is that it does get a bit warm with a video card, but for a cluster you wont need one. -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Tuesday, October 28, 2003 10:07 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1515 - 1 msg Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. SFF boxes for a cluster? (Jeff Layton) --__--__-- Message: 1 Date: Tue, 28 Oct 2003 11:09:54 -0500 From: Jeff Layton Subject: SFF boxes for a cluster? To: beowulf at beowulf.org Reply-to: jeffrey.b.layton at lmco.com Organization: Lockheed-Martin Aeronautics Company Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From periea at bellsouth.net Tue Oct 28 16:08:49 2003 From: periea at bellsouth.net (periea at bellsouth.net) Date: Tue, 28 Oct 2003 16:08:49 -0500 Subject: SAS running on compute nodes Message-ID: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Hello All, Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Phil... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Tue Oct 28 17:30:35 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue, 28 Oct 2003 14:30:35 -0800 Subject: SAS running on compute nodes In-Reply-To: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> (periea@bellsouth.net's message of "Tue, 28 Oct 2003 16:08:49 -0500") References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Message-ID: <858yn5m1v8.fsf@blindglobe.net> writes: > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Sure, as a bunch of singleton processes. I don't think you can do much more than that (but would be interested if I'm wrong). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gabriele.butti at unimib.it Tue Oct 28 04:58:04 2003 From: gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) Date: 28 Oct 2003 10:58:04 +0100 Subject: opteron VS Itanium 2 Message-ID: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Dear all, we are planning to build up a new cluster (16 nodes) before this year's end; we are evaluating different proposals from machine sellers, but the main doubt we have at this moment is whether choosing an Itanium 2 architecture or an AMD Opteron one. I know that ther's had already been on this list a debate on such a topic, but maybe some of you has some new experience to tell about. There is a wild bunch of benchmarks on these machines, but we fear that these are somewhat misleading and are not designed to test CPU's for intense scientific computing. The code we want to run on these machines is basically a home-made code, not fully optimized, which allocates around 500 Mb of RAM per node. Communication between nodes is a quite rare event and does not affect much computation time. 
In the past we had a very nice experience using Alpha CPU's which performed very well. To sum up, the question is: is the Itanium2 worth the price difference or is the Opteron the best choice? Thank you all Gabriele Butti -- \\|// -(o o)- /------------oOOOo--(_)--oOOOo-------------\ | | | Gabriele Butti | | ----------------------- | | Department of Material Science | | University of Milano-Bicocca | | Via Cozzi 53, 20125 Milano, ITALY | | Tel (+39)02 64485214 | | .oooO Oooo. | \--------------( )---( )---------------/ \ ( ) / \_) (_/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Tue Oct 28 21:23:48 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Tue, 28 Oct 2003 20:23:48 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: Hi, The Athlon design has some Alpha blood in it, and in my experience they both excel on branchy, unoptimized, float-intensive code. The Opteron is similar to the Athlon, but I wouldn't bother with 64-bit unless you're actually going to use more than 2 GB of memory per node. Athlon vs Pentium 4 or Xeon is a closer match, and you really need to run some benchmarks to decide between them. If you have access to an Opteron you should benchmark it as well, since I've heard they fly on some problems. Itanium 2 (Madison) is the current NAMD speed champ (although it's tied with a hyperthreaded P4 running multithreaded code), but it took some serious work to get the inner loops to the point that the Intel compiler could software pipeline them to get decent performance. I've heard that some Fortran codes had an easier time of it. Big branches really hurt. -Jim On 28 Oct 2003, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. 
| > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Wed Oct 29 04:30:28 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Wed, 29 Oct 2003 10:30:28 +0100 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <20031029103028.5b7a89a7.smuelas@mecanica.upm.es> Why don't you try a more humble Athlon, (2800 will be enough and you can use DRAM at 400). You will economize a lot of money and for intensive operation it is very, very quick. I have a small cluster with 8 nodes and Athlon 2400 and the results are astonishing. The important point is the motherboard, and nforce is great. On 28 Oct 2003 10:58:04 +0100 gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. | > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at platform.com Wed Oct 29 10:01:58 2003 From: csmith at platform.com (Chris Smith) Date: Wed, 29 Oct 2003 07:01:58 -0800 Subject: SAS running on compute nodes In-Reply-To: <858yn5m1v8.fsf@blindglobe.net> References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> <858yn5m1v8.fsf@blindglobe.net> Message-ID: <1067439718.3742.53.camel@plato.dreadnought.org> On Tue, 2003-10-28 at 14:30, A.J. Rossini wrote: > writes: > > > > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... > > Sure, as a bunch of singleton processes. I don't think you can do > much more than that (but would be interested if I'm wrong). > Actually ... you can after a fashion. SAS has something called MP CONNECT as part of the SAS/CONNECT product which allows you to call out to other SAS processes to have them run code for you, so you can do parallel SAS programs. http://support.sas.com/rnd/scalability/connect/index.html -- Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Wed Oct 29 10:11:19 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed, 29 Oct 2003 09:11:19 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> On Tue Oct 28 19:26:25 2003, Gabriele Butti wrote: >To sum up, the question is: is the Itanium2 worth the price difference >or is the Opteron the best choice? The SpecFP2000 performance difference between the best I2 and best Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). The 1.5 GHz I2 with the 6MB cache is very expensive with a recent estimate here for dual processor nodes with the >>smaller<< cache at over $12,000 per node when Myrinet interconnect costs and other incidentals are included. A dual Opteron 246 at 2.0 GHz with the same interconnect and incidentals included was about $4,250. Top of the line Pentium 4 duals again with same interconnect and incidentals about $750 less at $3,500. For bandwidth/memory intensive codes, I think the Opteron is a clear winner in a dual processor configuration because of its dual channel to memory design. Stream triad bandwidth during SMP operation is ~50% more than a one processor test. Both the dual Pentium 4 and Itanium 2 share their memory bus and split (with some loss) the bandwidth in dual mode. In a single processor configuration the conclusion is less clear. Itanium's spec numbers are very impressive, but still not high enough to win on price performance. The new Pentium 4 3.2 GHz Extremem Edition with its 4x200 FSB has very good SpecFP2000 numbers out performing the Opteron by about 100 spec points and may be the best price performance choice in a single processor configuration. But of course the above logic means nothing with a benchmark of >>your<< application and specific vendor quotes in >>your<< hands. rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. 
# Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 29 11:13:27 2003 From: ctierney at hpti.com (Craig Tierney) Date: 29 Oct 2003 09:13:27 -0700 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <1067444007.6209.16.camel@hpti10.fsl.noaa.gov> On Tue, 2003-10-28 at 02:58, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > Why don't you run your codes on the two platforms and figure it out for yourself? Better yet, get the vendors to do it. I have seen cases where Itanium 2 performs much better than Opteron, justifying the price difference. Other codes did not show the same difference, but both were faster than a Xeon. Craig > Thank you all > > Gabriele Butti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Thomas.Alrutz at dlr.de Wed Oct 29 11:15:48 2003 From: Thomas.Alrutz at dlr.de (Thomas Alrutz) Date: Wed, 29 Oct 2003 17:15:48 +0100 Subject: opteron VS Itanium 2 References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3F9FE7B4.1000607@dlr.de> Hi Gabriele, we have bought a similar Linux Cluster (16 nodes) you are lokking for with the smallest dual Opteron 240 (1.4 GHz) and two Gigabit networks (one for communications (MPI) and one for nfs). > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > The nodes have all 2 GB RAM (4*512 MB DDR333 REG), 2 Gigabit NICs (Broadcom onboard) and a Harddisk. The board we had choosen was the Rioworks HDAMA. 
I know it is not cheap, but it is stable and performs well with the SUSE/United Linux Enterprise Edition.

> There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well.

We have done some benchmarking with our TAU code (an unstructured finite volume CFD code, with multigrid), which depends heavily on memory bandwidth and latency. Therefore we tested 4 different architectures:
1. AMD Athlon MP 1.8 GHz, FSB 133 MHz - with gcc 3.2 in 32 bit
2. Intel Xeon 2.66 GHz, FSB 133 MHz - with icc7 in 32 bit
3. Intel Itanium2 1.0 GHz, FSB 100 MHz - with ecc6 in 64 bit
4. AMD Opteron 240 1.4 GHz, FSB 155 MHz - with gcc 3.2 in 64 bit
For the benchmark we used a "real life" example (an aircraft configuration with wing, body and engine - approx. 2 million grid points) which requires 1.3 GB to 1.7 GB for the job (1 process). We performed 30 iterations (Navier-Stokes calculation - Spalart-Allmaras - central scheme - multigrid cycle) and took the total (wallclock) time.

> > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice?

To answer your question, take a look at the following chart. All times are in seconds.

For 1 CPU on the node in use:
1. AMD Athlon MP 1.8 GHz - 30 iter. = 3642.4 sec.
2. Intel Xeon 2.66 GHz - 30 iter. = 2151.4 sec. <- fastest
3. Intel Itanium2 1.0 GHz - 30 iter. = 3571.8 sec.
4. AMD Opteron 240 1.4 GHz - 30 iter. = 2256.5 sec.

And 2 CPUs on the node in use (2 processes via MPI):
1. AMD Athlon MP 1.8 GHz - 30 iter. = 2076.1 sec.
2. Intel Xeon 2.66 GHz - 30 iter. = 1447.8 sec.
3. Intel Itanium2 1.0 GHz - 30 iter. = 1842.8 sec.
4. AMD Opteron 240 1.4 GHz - 30 iter. = 1159.5 sec. <-- fastest

So here you can see why we had to choose an Opteron-based node to build up the cluster. The price/performance ratio for the Opteron machine is very good compared to the Itanium2 machines. And the Xeons are not so much cheaper...

Thomas -- __/|__ | Dipl.-Math. Thomas Alrutz /_/_/_/ | DLR Institut fuer Aerodynamik und Stroemungstechnik |/ | Numerische Verfahren DLR | Bunsenstr. 10 | D-37073 Goettingen/Germany

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From daniel at labtie.mmt.upc.es Wed Oct 29 14:16:43 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 29 Oct 2003 20:16:43 +0100 Subject: Video-less nodes Message-ID: <1067455003.21980.11.camel@qeldroma.cttc.org>

Hi all, I would like to get some opinions about video-less nodes in a cluster. We know that there is no problem with monitoring nodes remotely and reading logs, but I suppose that in a kernel panic situation there's some valuable on-screen information... any thoughts?
Of course there's the possibility about putting really cheap video cards just that we'll able to see the text screen , nothing more ;) -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 29 15:45:25 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 29 Oct 2003 12:45:25 -0800 (PST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? console on serial... let your terminal server collect oopses... > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Wed Oct 29 16:41:21 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed, 29 Oct 2003 16:41:21 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003 at 8:16pm, Daniel Fernandez wrote > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? > > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) As always, the answer is it depends. A serial console should handle all your needs. But sometimes the BIOS sucks or the console doesn't work right or... IMHO, unless it messes other stuff up (e.g. drags your only PCI bus down to 32/33), there's not much reason *not* to stuff cheap video boards into nodes. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 29 17:00:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 29 Oct 2003 17:00:47 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? 
> > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) To my direct experience, the extra time you waste debugging problems on videoless nodes by hauling them out of the rack, sticking video in them, resolving the problem, removing the video, and reinserting the nodes is far more costly than cheap video, or better yet onboard video (many/most good motherboards have onboard video these days) and being able to resolve many of these problems without deracking the nodes. Just my opinion of course. When things go well, of course, it doesn't matter. Just think about the labor involved in a single BIOS reflash, for example. rgb > > -- > Daniel Fernandez > Laboratori de Termot?cnia i Energia - CTTC > UPC Campus Terrassa > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 29 22:00:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 29 Oct 2003 22:00:09 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> Message-ID: > >To sum up, the question is: is the Itanium2 worth the price difference > >or is the Opteron the best choice? > > The SpecFP2000 performance difference between the best I2 and best > Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). which to me indicates that the working set of SPEC codes is a good match to the cache of high-end It2's. this says nothing about It2's, but rather points out that SPEC components are nearly obsolete (required to run well in just 64MB core, if I recall correctly!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jmdavis at mail2.vcu.edu Wed Oct 29 15:12:20 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Wed, 29 Oct 2003 15:12:20 -0500 Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> References: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: <3FA01F24.2090405@mail2.vcu.edu> The onscreen info should also be logged. And then there's always the crash files. We now have a couple of clusters with videoless nodes (although they are on serial switches, Cyclades). Mike Daniel Fernandez wrote: >Hi all, > >I would like to get some opinions about video-less nodes in a cluster, >we know that there is no problem about monitoring nodes remotely and >reading logs but I suppose that in a kernel panic situation there's some >valuable on-screen information... ? any thoughts ? 
> >Of course there's the possibility about putting really cheap video cards
> >just that we'll able to see the text screen , nothing more ;)
> >
> >
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andreas.boklund at htu.se Thu Oct 30 01:57:35 2003
From: andreas.boklund at htu.se (andreas boklund)
Date: Thu, 30 Oct 2003 07:57:35 +0100
Subject: opteron VS Itanium 2
Message-ID:

Just a note,

> For bandwidth/memory intensive codes, I think the Opteron is a clear
> winner in a dual processor configuration because of its dual channel
> to memory design. Stream triad bandwidth during SMP operation is
> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium
> 2 share their memory bus and split (with some loss) the bandwidth in
> dual mode.

This is true as long as you are using an application where each process has its own memory area. If you have 2 processes and shared memory, the Opteron would behave like a small NUMA machine and a process will get a penalty for accessing another process's (processor's) memory segment.

To quote D. Barron, "If it seems too good to be true, it probably is!" I have never yet seen true linear scalability, and with Amdahl out there I doubt that I ever will.

Best
//Andreas

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at physics.mcmaster.ca Thu Oct 30 10:51:41 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Thu, 30 Oct 2003 10:51:41 -0500 (EST)
Subject: opteron VS Itanium 2
In-Reply-To:
Message-ID:

> > For bandwidth/memory intensive codes, I think the Opteron is a clear
> > winner in a dual processor configuration because of its dual channel
> > to memory design. Stream triad bandwidth during SMP operation is
> > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium
> > 2 share their memory bus and split (with some loss) the bandwidth in
> > dual mode.

this is particularly bad on "high-end" machines. for instance, several machines have 4 it2's on a single FSB. there's a reason that specfprate scales so much better on 1/2/4-way opterons than on 1/2/4-way it2's. don't even get me started about those old profusion-chipset 8-way PIII machines that Intel pushed for a while...

> This is true as long as you are using an application where each process has its own
> memory area. If you have 2 processes and shared memory, the Opteron would
> behave like a small NUMA machine and a process will get a penalty for accessing
> another process's (processor's) memory segment.

huh? sharing data behaves pretty much the same on opteron systems (broadcast-based coherency) as on shared-FSB (snoopy) systems. it's not at all clear yet whether opterons are higher latency in the case where you have *often*written* shared data.

it is perfectly clear that shared/snoopy buses don't scale, and neither does pure broadcast coherency. I figure that both Intel and AMD will be adding some sort of directory support in future machines. if they bother, that is - the market for many-way SMP is definitely not huge, at least not in the mass-market sense.

regards, mark hahn.
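As a rough illustration of the triad measurement being argued about here: the following is not the official STREAM benchmark, just a minimal single-process sketch, and the array size and constant are arbitrary choices. Running one copy per CPU of an SMP box and comparing against the single-copy number shows how much of the bus each processor actually gets.

/* Minimal triad-bandwidth sketch (NOT the official STREAM benchmark).
   Build with something like: gcc -O2 -o triad triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1024 * 1024)   /* 4M doubles per array, well out of cache */
#define NTRIES 10

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double best = 1.0e30;
    int i, k;

    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }
    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (k = 0; k < NTRIES; k++) {
        double t = now();
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* the triad kernel */
        t = now() - t;
        if (t < best) best = t;
    }
    /* triad touches 3 arrays of N doubles per pass: 2 reads + 1 write */
    printf("best triad bandwidth: %.1f MB/s\n",
           3.0 * N * sizeof(double) / best / 1.0e6);
    return 0;
}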
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 10:57:00 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 09:57:00 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> On Wed Oct 29 21:38:48 2003, Mark Hahn wrote: >> >To sum up, the question is: is the Itanium2 worth the price difference >> >or is the Opteron the best choice? >> >> The SpecFP2000 performance difference between the best I2 and best >> Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). > >which to me indicates that the working set of SPEC codes is a good >match to the cache of high-end It2's. this says nothing about It2's, >but rather points out that SPEC components are nearly obsolete >(required to run well in just 64MB core, if I recall correctly!) Of course, there is some truth to what you say, but "this says nothing about It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is the memory table for most of the benchmarks. A few fit in the 6MB cache (although some surely should, as some codes do or can be made too fit into cache). Many are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard to question with the capability of performing 4 64-bit flops per clock (that's a 6.0 GFlops peak at 1.5 GHz; 12.0 at 32-bits). Moreover, even an I2 with 1/2 the Opteron's clock and only 50% more cache (L3 vs L2) performs more or less equal to the Opteron 246 on SpecFP2000. And after all a huge cache does raise the average memory bandwidth felt by the average code ... ;-) (even as average codes sizes grow) ... and a large node count divides the total memory required per node. Large clusters should love large caches ... you know the quest for super-linear speed ups. The I2's weakness is in price-performance and in memory bandwidth in SMP configurations in my view. My last line in the prior note was a reminder to the original poster that SpecFP numbers are not a final answer. I repeated the "benchmark you code" mantra ... partly to relieve Bob Brown of his responsibility to do so ;-). Got any snow up in the Great White North yet? Regards, rbw max max num num rsz vsz obs unchanged stable? 
----- ----- --- --------- ------- gzip 180.0 199.0 181 68 vpr 50.0 53.6 151 6 gcc 154.0 156.0 134 0 mcf 190.0 190.0 232 230 stable crafty 2.0 2.6 107 106 stable parser 37.0 66.8 263 254 stable eon 0.6 1.5 130 0 perlbmk 146.0 158.0 186 0 gap 192.0 194.0 149 148 stable vortex 72.0 79.4 162 0 bzip2 185.0 199.0 153 6 twolf 3.4 4.0 273 0 wupwise 176.0 177.0 185 181 stable swim 191.0 192.0 322 320 stable mgrid 56.0 56.7 281 279 stable applu 181.0 191.0 371 369 stable mesa 9.4 23.1 132 131 stable galgel 63.0 155.0 287 59 art 3.7 4.3 157 37 equake 49.0 49.4 218 216 stable facerec 16.0 18.5 182 173 stable ammp 26.0 28.4 277 269 stable lucas 142.0 143.0 181 179 stable fma3d 103.0 105.0 268 249 stable sixtrack 26.0 59.8 148 141 stable apsi 191.0 192.0 271 270 stable _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 11:07:25 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 10:07:25 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301607.h9UG7PX06372@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 07:57, Andreas Boklund wrote: >Just a note, > >> For bandwidth/memory intensive codes, I think the Opteron is a clear >> winner in a dual processor configuration because of its dual channel >> to memory design. Stream triad bandwidth during SMP operation is >> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium >> 2 share their memory bus and split (with some loss) the bandwidth in >> dual mode. > >This is true as long as you are using an applicaiton where one process has its own >memory area. If you would have 2 processes and shared memory the Opt, would >behave like a small NUMA machine and a process will get a penalty for accessing >another process (processors) memory segment. > >To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never >yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Agreed. Of course, in the case of dual Pentium and Itaniums, even non- overlapping memory locations buy you nothing bandwidth-wise. Small or large scale perfect cross-bars to memory are tough and expensive. The Cray X1, with all its customer design effort and great total bandwidth on the node board, targeted only 1/4 of peak-data-required iin its design and delivers less under the full load of its 16-way SMP vector engines. And it's node board is probably the best bandwidth engine in the world at the moment. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 12:28:45 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 11:28:45 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 12:00:54, Mark Hahn wrote: >> Of course, there is some truth to what you say, but "this says nothing about >> It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is > >all the world's a stage ;) Life without drama is life without the pursuit of happiness ... ;-). >> the memory table for most of the benchmarks. A few fit in the 6MB cache (although >> some surely should, as some codes do or can be made too fit into cache). 
Many > >seriously, the memory access patterns of very few apps are uniform >across their rss. I probably should have said "working set fits in 6M". Good point, most memory accesses are not globally stride-one. But of course this fact leads us back to the idea that cache >>is<< important for a suite of "representative codes". >and you're right; I just reread the spec blurb, and their aim was 100-200MB. > >> are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard > >that's max rss; it's certainly an upper bound on working set size, >but definitely not a good estimator. Yes, an upper bound. We would need more data on the Spec codes to know if the working sets are mostly sitting in the I2 cache. There is an inevitable dynamism here with larger caches swallowing up larger and larger chunks of the "average code's" working set and while the average working set grows over time. >in other words, it tells you something about the peak number of pages that >the app ever touches. it doesn't tell you whether 95% of those pages are >never touched again, or whether the app only touches 1 cacheline per page. > >in yet other words, max rss is relevant to swapping, not cache behavior. You might also say it this way ... cache-exceeding, max-RSS magnitude by itself does guarantee the elimination of unwanted cache effects. > >> And after all a huge cache does raise the average memory bandwidth felt by the >> average code ... ;-) (even as average codes sizes grow) ... and a large node count > >even though Spec uses geo-mean, it can strongly be influenced by outliers, >as we've all seen with Sun's dramatic "performance improvements" ;) > >in particular, 179.art is a good example. I actually picked it out by >comparing the specFP barchart for mckinley vs madison - it shows a fairly >dramatic improvement. this *could* be due to compiler improvements, >but given that 179.art has a peak RSS of 3.7MB, I think there's a real >cache effect here. I agree again, but would say that such a suite as SpecFP should include some codes that yield to cache-effects because some real world codes do. Always learn or am reminded of something from your posts Mark ... keep on keeping us honest and true ;-) like a Canadian Mountie. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:45:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:45:20 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> Message-ID: > this fact leads us back to the idea that cache >>is<< important for a suite > of "representative codes". yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly stodgy attitude towards this has been that caches do not help machine balance. that is, it2 has a peak/theoretical 4flops/cycle, but since that would require, worstcase, 3 doubles per flop, the highest-ranked CPU is actually imbalanced by a factor of 22.5! (*) the best benchmark is your own code let's step back a bit. suppose we were designing a new version of SPEC, and wanted to avoid every problem that the current benchmarks have. here are some partially unworkable ideas: keep geometric mean, but also quote a few other metrics that don't hide as much interesting detail. for instance, show the variance of scores. 
or perhaps show base/peak/trimmed (where the lowest and highest component are simply dropped). cache is a problem unless your code is actually a spec component, or unless all machines have the same basic cache-to-working-set relation for each component. alternative: run each component on a sweep of problem sizes, and derive two scores: in-cache and out-cache. use both scores as part of the overall summary statistic. I'd love to see good data-mining tools for spec results. for instance, I'd like to have an easy way to compare consecutive results for the same machine as the vendor changed the compiler, or as clock increases. there's a characteristic "shape" to spec results - which scores are high and low relative to the other scores for a single machine. not only does this include outliers (drastic cache or compiler effects), but points at strengths/weaknesses of particular architectures. how to do this, perhaps some kind of factor analysis? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:00:54 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:00:54 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: > Of course, there is some truth to what you say, but "this says nothing about > It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is all the world's a stage ;) > the memory table for most of the benchmarks. A few fit in the 6MB cache (although > some surely should, as some codes do or can be made too fit into cache). Many seriously, the memory access patterns of very few apps are uniform across their rss. I probably should have said "working set fits in 6M". and you're right; I just reread the spec blurb, and their aim was 100-200MB. > are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard that's max rss; it's certainly an upper bound on working set size, but definitely not a good estimator. in other words, it tells you something about the peak number of pages that the app ever touches. it doesn't tell you whether 95% of those pages are never touched again, or whether the app only touches 1 cacheline per page. in yet other words, max rss is relevant to swapping, not cache behavior. > And after all a huge cache does raise the average memory bandwidth felt by the > average code ... ;-) (even as average codes sizes grow) ... and a large node count even though Spec uses geo-mean, it can strongly be influenced by outliers, as we've all seen with Sun's dramatic "performance improvements" ;) in particular, 179.art is a good example. I actually picked it out by comparing the specFP barchart for mckinley vs madison - it shows a fairly dramatic improvement. this *could* be due to compiler improvements, but given that 179.art has a peak RSS of 3.7MB, I think there's a real cache effect here. > Got any snow up in the Great White North yet? no, but I notice that the permanent temporary DX units are not working as hard to keep the machineroom from melting down ;) oh, yeah, and there's something wrong with the color of the leaves. 
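To make the "don't hide the detail" suggestion concrete, here is a toy sketch of the extra summary statistics Mark is asking for: arithmetic and geometric means, the variance, and a trimmed geometric mean with the single best and worst components dropped. The ratios in the array are made-up placeholders, not real SPEC results; link with -lm.

/* Toy summary statistics for a set of SPEC-style component ratios.
   The numbers below are placeholders, not real results. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double r[] = { 9.8, 14.2, 11.5, 35.0, 10.9, 12.3, 8.7, 13.1 };
    int n = sizeof(r) / sizeof(r[0]);
    int i, m = 0, imin = 0, imax = 0;
    double amean = 0.0, gmean = 0.0, var = 0.0, trimmed = 0.0;

    for (i = 0; i < n; i++) {
        amean += r[i];
        gmean += log(r[i]);
        if (r[i] < r[imin]) imin = i;   /* remember the worst component */
        if (r[i] > r[imax]) imax = i;   /* and the best one */
    }
    amean /= n;
    gmean = exp(gmean / n);

    for (i = 0; i < n; i++)
        var += (r[i] - amean) * (r[i] - amean);
    var /= n;

    /* geometric mean with the lowest and highest components dropped */
    for (i = 0; i < n; i++)
        if (i != imin && i != imax) { trimmed += log(r[i]); m++; }
    trimmed = exp(trimmed / m);

    printf("arithmetic mean : %6.2f\n", amean);
    printf("geometric mean  : %6.2f\n", gmean);
    printf("variance        : %6.2f\n", var);
    printf("trimmed geomean : %6.2f\n", trimmed);
    return 0;
}

The single outlier (35.0) barely moves the geometric mean but shows up loudly in the variance and in the gap between the plain and trimmed means, which is exactly the kind of detail a single quoted number hides.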
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rbw at ahpcrc.org Thu Oct 30 16:32:38 2003
From: rbw at ahpcrc.org (Richard Walsh)
Date: Thu, 30 Oct 2003 15:32:38 -0600
Subject: opteron VS Itanium 2
Message-ID: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org>

Mark Hahn wrote:

>> this fact leads us back to the idea that cache >>is<< important for a suite
>> of "representative codes".
>
>yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly
>stodgy attitude towards this has been that caches do not help machine
>balance. that is, it2 has a peak/theoretical 4flops/cycle, but since
>that would require, worstcase, 3 doubles per flop, the highest-ranked
>CPU is actually imbalanced by a factor of 22.5!
>
>(*) the best benchmark is your own code

Agreed, but since the scope of the discussion seemed to be microprocessors, which are all relatively bad on balance compared to vector ISA/designs, I did not elaborate on balance. This is a design area that favors the Opteron (and Power 4) because the memory controller is on-chip (unlike the Pentium 4 and I2) and, as such, its performance improves with clock. I think it is interesting to look at other processors' theoretical balance numbers in relationship to the I2's that you compute (I hope I have them all correct):

Pentium 4 EE 3.2 GHz: (3.2 GHz * 2 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 24    (max on-chip cache 2MB)
Itanium 2    1.5 GHz: (1.5 GHz * 4 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 22.5  (max on-chip cache 6MB)
Opteron 246  2.0 GHz: (2.0 GHz * 2 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 15    (max on-chip cache 1MB)
Power 4      1.7 GHz: (1.7 GHz * 4 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 25.5* (max on-chip cache 1.44MB)
Cray X1      0.8 GHz: (0.8 GHz * 4 flops/clock * 24 bytes/flop) / 19.2 GB/s = balance of 4     (512 byte off-chip L2)

* IBM memory performance is with 1 core disabled and may now be higher than this.

When viewed in context, yes, the I2 is poorly balanced, but it is typical of microprocessors, and it is not the worst among them. It also offers the largest compensating cache. Where it loses a lot of ground is in the dual processor configuration. Opteron yields a better number, but this is because it can't do as many flops. The Cray X1 has the most aggressive design specs and yields a large enough percentage of peak to beat the fast-clocked micros on vector code (leaving the ugly question of price aside). This is in part due to the more balanced design, but also due to its vector ISA, which is just better at moving data from memory.

>let's step back a bit. suppose we were designing a new version of SPEC,
>and wanted to avoid every problem that the current benchmarks have.
>here are some partially unworkable ideas:
>
>keep geometric mean, but also quote a few other metrics that don't
>hide as much interesting detail. for instance, show the variance of
>scores. or perhaps show base/peak/trimmed (where the lowest and highest
>component are simply dropped).

Definitely. I am constantly trimming the reported numbers myself and looking at the bar graphs for an eye-ball variance. It takes will power to avoid being seduced by a single summarizing number. The Ultra III's SpecFP number was a good reminder.

>cache is a problem unless your code is actually a spec component,
>or unless all machines have the same basic cache-to-working-set relation
>for each component.
>alternative: run each component on a sweep of problem sizes,
>and derive two scores: in-cache and out-cache. use both scores
>as part of the overall summary statistic.

Very good as well. This is the "cpu-rate-comes-to-spec" approach that I am sure Bob Brown would endorse.

>I'd love to see good data-mining tools for spec results. for instance,
>I'd like to have an easy way to compare consecutive results for the same
>machine as the vendor changed the compiler, or as clock increases.

... or increased cache size. Another winning suggestion.

>there's a characteristic "shape" to spec results - which scores are
>high and low relative to the other scores for a single machine. not only
>does this include outliers (drastic cache or compiler effects), but
>points at strengths/weaknesses of particular architectures. how to do this,
>perhaps some kind of factor analysis?

This is what I refer to as the Spec finger print or Roshacht(sp?) test. We need a neural net derived analysis and classification here.

Another presentation that I like is the "star graph" in which major characteristics (floating point perf., integer perf., cache, memory bandwidth, etc.) are layed out in equal degrees as vectors around a circle. Each processor is measured on each axis to give a star print and the total area is a measure of "total goodness".

I hope someone from Spec is reading this ... and they remember who made these suggestions ... ;-).

Regards,

rbw

#---------------------------------------------------
# Richard Walsh
# Project Manager, Cluster Computing, Computational
# Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX: 612-337-3467
# FAX: 612-337-3400
# EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com
# rbw at ahpcrc.org
#
#---------------------------------------------------
# Nullum magnum ingenium sine mixtura dementiae fuit.
# - Seneca
#---------------------------------------------------

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andrewxwang at yahoo.com.tw Thu Oct 30 23:31:01 2003
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Fri, 31 Oct 2003 12:31:01 +0800 (CST)
Subject: opteron VS Itanium 2
In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org>
Message-ID: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com>

Other problems with the Itanium 2 are its power consumption and heat. Also, as reported on another mailing list:

Earth Simulator           35.8 TFlop/s
ASCI Q Alpha EV-68        13.8 TFlop/s
Apple G5 dual (Big Mac)    9.5 TFlop/s
HP RX2600 Itanium 2        8.6 TFlop/s

This would place the Big Mac in the 3rd place on the top500 list -- assuming they have reported all submitted results in the report:

http://www.netlib.org/benchmark/performance.pdf (p53)

Andrew.

> The I2's weakness is in price-performance and in
> memory bandwidth in SMP configurations
> in my view. My last line in the prior note was a
> reminder to the original poster
> that SpecFP numbers are not a final answer. I
> repeated the "benchmark you code"
> mantra ... partly to relieve Bob Brown of his
> responsibility to do so ;-).
>
> Got any snow up in the Great White North yet?
>
> Regards,
>
> rbw
>
>
> max max num num
> rsz vsz obs unchanged stable?
> ----- ----- --- --------- ------- > gzip 180.0 199.0 181 68 > vpr 50.0 53.6 151 6 > gcc 154.0 156.0 134 0 > mcf 190.0 190.0 232 230 stable > crafty 2.0 2.6 107 106 stable > parser 37.0 66.8 263 254 stable > eon 0.6 1.5 130 0 > perlbmk 146.0 158.0 186 0 > gap 192.0 194.0 149 148 stable > vortex 72.0 79.4 162 0 > bzip2 185.0 199.0 153 6 > twolf 3.4 4.0 273 0 > > wupwise 176.0 177.0 185 181 stable > swim 191.0 192.0 322 320 stable > mgrid 56.0 56.7 281 279 stable > applu 181.0 191.0 371 369 stable > mesa 9.4 23.1 132 131 stable > galgel 63.0 155.0 287 59 > art 3.7 4.3 157 37 > equake 49.0 49.4 218 216 stable > facerec 16.0 18.5 182 173 stable > ammp 26.0 28.4 277 269 stable > lucas 142.0 143.0 181 179 stable > fma3d 103.0 105.0 268 249 stable > sixtrack 26.0 59.8 148 141 stable > apsi 191.0 192.0 271 270 stable > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:02:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:02:29 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Message-ID: On Thu, 30 Oct 2003, Richard Walsh wrote: > >cache is a problem unless your code is actually a spec component, > >or unless all machines have the same basic cache-to-working-set relation > >for each component. alternative: run each component on a sweep of problem > >sizes, and derive two scores: in-cache and out-cache. use both scores > >as part of the overall summary statistic. > > Very good as well. This is the "cpu-rate-comes-to-spec" approach > that I am sure Bob Brown would endorse. Oh, sure. "I endorse this." ;-) As you guys are working out fine on your own, I like it combined with Mark's suggestion of showing the entire constellation for spec (which of course you CAN access and SHOULD access in any case instead of relying on geometric or any other mean measure of performance:-). I really think that many HPC performance benchmarks primary weakness is that they DON'T sweep problem size and present results as a graph, and that they DON'T present a full suite of different results that measure many identifiably different components of overall performance. From way back with early linpack, this has left many benchmarks susceptible to vendor manipulation -- there are cases on record of vendors (DEC, IIRC, but likely others) actually altering CPU/memory architecture to optimize linpack performance because linpack was what sold their systems. This isn't just my feeling, BTW -- Larry McVoy has similar concerns (more stridently expressed) in his lmbench suite -- he actually had (and likely still has) as a condition of their application to a system that they can NEVER be applied singly with just one (favorable:-) number or numbers quoted in a publication or advertisement --- the results of the complete suite have to be presented all together, with your abysmal failures side by side with your successes. 
I personally am less religious about NEVER doing anything and dislike semi-closed sources and "rules" even for benchmarks (it makes far more sense to caveat emptor and pretty much ignore vendor-based performance claims in general:-), but do think that you get a hell of a lot more information from a graph of e.g. stream results as a function of vector size than you get from just "running stream". Since running stream as a function of vector size more or less requires using malloc to allocate the memory and hence adds one additional step of indirection to memory address resolution, it also very slightly worsens the results, but very likely in the proper direction -- towards the real world, where people do NOT generally recompile an application in order to change problem size. I also really like Mark's idea of having a benchmark database site where comparative results from a wide range of benchmarks can be easily searched and collated and crossreferenced. Like the spec site, actually. However, that's something that takes a volunteer or organization with spare resources, much energy, and an attitude to make happen, and since one would like to e.g. display spec results on a non-spec site and since spec is (or was, I don't keep up with its "rules") fairly tightly constrained on who can run it and how/where its results can be posted, it might not be possible to create your own spec db, your own lmbench db, your own linpack db, all on a public site. cpu_rate you can do whatever you want with -- it is full GPL code so a vendor could even rewrite it as long as they clearly note that they have done so and post the rewritten sources. Obviously you should either get results from somebody you trust or run it yourself, but that is true for any benchmark, with the latter being vastly preferrable.:-) If I ever have a vague bit of life in me again and can return to cpu_rate, I'm in the middle of yet another full rewrite that should make it much easier to create and encapsulate a new code fragment to benchmark AND should permit running an "antistream" version of all the tests involving long vectors (one where all the memory addresses are accessed in a random/shuffled order, to deliberately defeat the cache). However, I'm stretched pretty thin at the moment -- a talk to give Tuesday on xmlsysd/wulfstat, a CW column due on Wednesday, and I've agreed to write an article on yum due on Sunday of next week I think (and need to finish the yum HOWTO somewhere in there as well). So it won't be anytime soon...:-) > >I'd love to see good data-mining tools for spec results. for instance, > >I'd like to have an easy way to compare consecutive results for the same > >machine as the vendor changed the compiler, or as clock increases. > > ... or increased cache size. Another winning suggestion. > > >there's a characteristic "shape" to spec results - which scores are > >high and low relative to the other scores for a single machine. not only > >does this include outliers (drastic cache or compiler effects), but > >points at strengths/weaknesses of particular architectures. how to do this, > >perhaps some kind of factor analysis? > > This is what I refer to as the Spec finger print or Roshacht(sp?) > test. We need a neural net derived analysis and classification here. . The only one I'd trust is the one already implemented in wetware. After all, classification according to what? 
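To make the size-sweep and "antistream" ideas earlier in this message concrete, here is a minimal sketch (this is not cpu_rate itself, and the vector size is an arbitrary choice): the same reduction is timed once in streaming order and once through a shuffled index vector, which is roughly the cache- and prefetch-defeating access pattern being described. Sweeping N from well inside cache to well outside it gives the sort of curve worth graphing.

/* Sequential vs. shuffled ("antistream") access over a malloc'ed vector.
   Illustration only; build with: gcc -O2 -o antistream antistream.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (8 * 1024 * 1024)   /* arbitrary size, well past typical caches */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    double *v = malloc(N * sizeof(double));
    int *idx = malloc(N * sizeof(int));
    double sum = 0.0, t;
    int i;

    if (!v || !idx) { fprintf(stderr, "malloc failed\n"); return 1; }
    for (i = 0; i < N; i++) { v[i] = 1.0; idx[i] = i; }

    /* Fisher-Yates shuffle of the index vector */
    srand(12345);
    for (i = N - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }

    t = now();
    for (i = 0; i < N; i++) sum += v[i];          /* streaming access */
    printf("sequential: %.3f s (sum=%g)\n", now() - t, sum);

    sum = 0.0;
    t = now();
    for (i = 0; i < N; i++) sum += v[idx[i]];     /* cache-hostile access */
    printf("shuffled:   %.3f s (sum=%g)\n", now() - t, sum);

    free(v);
    free(idx);
    return 0;
}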
> Another presentation that I like is the "star graph" in which major > characteristics (floating point perf., integer perf., cache, memory > bandwidth, etc.) are layed out in equal degrees as vectors around > a circle. Each processor is measured on each axis to give a star > print and the total area is a measure of "total goodness". > > I hope someone from Spec is reading this ... and they remember who > made these suggestions ... ;-). But things are more complicated than this. The real problem with SPEC is that your application may well resemble one of the components of the suite, in which case that component is a decent predictor of performance for your application almost by definition. However, the mean performance on the suite may or may not be well correlated with that component, or your application may not resemble ANY of the components on the suite. Then there are variations with compiler, operating system, memory configuration, scaling (or lack thereof!) with CPU clock. As Mark says, TBBIYOC is the only safe rule if you seek to compare systems on the basis of "benchmarks". I personally tend to view large application benchmarks like linpack and spec with a jaded eye and prefer lmbench and my own microbenchmarks to learn something about the DETAILED performance of my architecture on very specific tasks that might be components of a large application, supplemented with YOC. Or rather MOC. Zen question: Which one reflects the performance of an architecture, a BLAS-based benchmark or an ATLAS-tuned BLAS-based benchmark? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Oct 31 09:11:49 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 31 Oct 2003 09:11:49 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: For those interested, the latest poll at www.cluster-rant.com was on cluster size. We had a record 102 responses! Take a look at http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 for links to results and to the new poll on interconnects. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:55:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:55:43 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Douglas Eadline, Cluster World Magazine wrote: > > For those interested, the latest poll at www.cluster-rant.com was on > cluster size. We had a record 102 responses! Take a look at > http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 > for links to results and to the new poll on interconnects. You need to let people vote more than once in something like this. I have three distinct clusters and there are two more I'd vote for the owners here at Duke. (They pretty much reflect the numbers you're getting, which show well over half the clusters at 32 nodes or less). 
It is interesting that this indicates that the small cluster is a lot more common than big clusters, although the way numbers work there are a lot more nodes in big clusters than in small clusters. At least in your biased and horribly unscientific (but FUN!) poll:-) So from a human point of view, providing support for small clusters is more important, but from an institutional/hardware point of view, big clusters dominate. It is also very interesting to me that RH (for example) thinks that there is something that they are going to provide that is worth e.g. several hundred thousand dollars in the case of a 1000+ node cluster running their "workstation" product. Fifty dollars certainly. Five hundred dollars maybe. A thousand dollars possibly, but only if they come up with a cluster-specific installation with some actual added value. Sigh. rgb > > Doug > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Fri Oct 31 12:10:31 2003 From: jcownie at etnus.com (James Cownie) Date: Fri, 31 Oct 2003 17:10:31 +0000 Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: Message from "Robert G. Brown" of "Fri, 31 Oct 2003 11:02:29 EST." Message-ID: <1AFcn9-5Y0-00@etnus.com> > From way back with early linpack, this has left many benchmarks > susceptible to vendor manipulation -- there are cases on record of > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > architecture to optimize linpack performance because linpack was > what sold their systems. This certainly applied to some compilers which "optimized" sdot and ddot by recognizing the source (down to the precise comments) and plugged in a hand coded assembler routine. Changing a comment (for instance mis-spelling Jack's name :-) or replacing a loop variable called "i" with one called "k" could halve the linpack result. When $$$ are involved people are prepared to sail close to the wind... -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 14:36:09 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 11:36:09 -0800 (PST) Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: <1AFcn9-5Y0-00@etnus.com> Message-ID: On Fri, 31 Oct 2003, James Cownie wrote: > > From way back with early linpack, this has left many benchmarks > > susceptible to vendor manipulation -- there are cases on record of > > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > > architecture to optimize linpack performance because linpack was > > what sold their systems. > > This certainly applied to some compilers which "optimized" sdot and > ddot by recognizing the source (down to the precise comments) and > plugged in a hand coded assembler routine. 
Nvidia and ATI have recently done similar things, where their drivers would attempt to detect benchmarks being run and then use optimized routines or cheat on following specifications. Renaming quake2.exe to something else would cause a large decrease in framerate for example. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Oct 31 14:45:04 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 31 Oct 2003 11:45:04 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: <20031031194504.30508.qmail@web11404.mail.yahoo.com> But still, at least the results showed that the G5s provided similar performance, and less expensive than IA64... Rayson --- Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 31 12:38:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 31 Oct 2003 18:38:20 +0100 (CET) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Robert G. Brown wrote: > > It is also very interesting to me that RH (for example) thinks that > there is something that they are going to provide that is worth e.g. > several hundred thousand dollars in the case of a 1000+ node cluster > running their "workstation" product. Fifty dollars certainly. Five > hundred dollars maybe. A thousand dollars possibly, but only if they > come up with a cluster-specific installation with some actual added > value. > I'll second that. There has been a debate running on this topic on the Fedora list over the last few days. Sorry to be so boring, but its something we should debate too. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 13:19:12 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 10:19:12 -0800 Subject: opteron VS Itanium 2 In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > This would place the Big Mac in the 3rd place on the > top500 list Except that there are several other new large clusters that will likely place higher -- LANL announced a 2,048 cpu Opteron cluster a while back, and LLNL has something new, too, I think. Comparing yourself to the obsolete list in multiple press releases isn't very clever. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Fri Oct 31 14:44:59 2003 From: walkev at presearch.com (Vann H. Walke) Date: Fri, 31 Oct 2003 14:44:59 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <1067629499.21719.73.camel@localhost.localdomain> On Fri, 2003-10-31 at 12:38, John Hearns wrote: > On Fri, 31 Oct 2003, Robert G. Brown wrote: > > > > > It is also very interesting to me that RH (for example) thinks that > > there is something that they are going to provide that is worth e.g. > > several hundred thousand dollars in the case of a 1000+ node cluster > > running their "workstation" product. Fifty dollars certainly. Five > > hundred dollars maybe. A thousand dollars possibly, but only if they > > come up with a cluster-specific installation with some actual added > > value. > > > I'll second that. > > There has been a debate running on this topic on the Fedora list > over the last few days. > > Sorry to be so boring, but its something we should debate too. > Hmm... Let's take the case of a 1000 node system. If we assume a $3000/node cost (probably low once rack, UPS, hardware support, and interconnect are added in), we arrive at an approximate hardware cost of $3,000,000. If we were to use the RHEL WS list price of $179/node, we get $179,000 or about 6% of the hardware cost. That is assuming RedHat will not provide any discount on large volume purchases (unlikely). Is 6% unreasonable? What are the alternatives? - Keep using an existing RH distro: Only if you're willing to move into do it yourself mode when RH stop support (December?). I expect very few would be happy with this option. However, if you have a working RH7.3 cluster, it works, and you don't have to worry too much about security, why change? For new clusters though.... - Fedora - Planned releases 2-3 times a year. So, if I build a system on the Fedora release scheduled this Monday, who will be providing security patches for it 2 years from now (after 4-6 new releases have been dropped). My guess is no-one. Again, we're in the do it yourself maintenance or frequent OS upgrade mode. - SUSE - Not sure about this one. Their commercial pricing model is pretty close to RedHat's. Are they going to keep developing consumer releases? 
What will the support be for those releases? Can we really expect more than we get from a purely community developed system? Perhaps someone with more SUSE knowledge could comment? - Debian - Could be a good option, but to some extent you end up in the same position as Fedora. How often do the releases come out. Who supports the old releases? What hardware / software will work on the platform? - Gentoo - Not reliable, stable enough to meet my needs for clustering - Mandrake - Mandrake has their clustering distribution, which could be a good possibility, but the cost is as high or higher than RedHat. - Scyld - Superior design, supported, but again very high cost and may have to fight some compatibility issues since the it's market share in the Linux world is less than tiny. - OSCAR / Rocks / etc... - generally installed on top of another distribution. We still have to pick a base distribution. My conclusions - If you're in a research facility / university type setting where limited amounts of down time are acceptable, a free or nearly free system is perfect. A new Fedora/Debian/SuSE release comes out, shut the system down over Christmas break and rebuild it. (As long as you're happy spending a fair amount of time doing rebuilds and fixing upgrade problems). If however you really need the thing to work - Corporate research sites, satellite data processing, etc... the cost of the operating system may be minuscule relative to the cost of having the system down. If you _really_ want a particular application to work having it certified and supported on the OS may be important. The project on which I'm working - building sonar training simulators for the US Navy Submarine force requires stable systems which should operate without major maintenance / operational changes for many years. Knowing the RedHat will support the enterprise line for 5 years is a big selling point. The cluster management portion of the software stack would be great to have integrated in to the product, but if third party vendors (Linux Networx, OSCAR, Rocks, etc...) can provide the cluster management portion on top of the distribution, a solution can be found. In some ways this is even better since your cluster management decision is independent of the OS vendor. I basically just want to make the point that the cluster space is filled with people of many different needs. Will everyone want RHEL? My guess is a resounding NO. (In the days of RH7.3 you could almost say Yes.) But, there are situations in which a stable, supported product is needed. This is the market RedHat is trying to target and states so pretty clearly ("Enterprise"). Small users and research systems get somewhat left out in the cold, but we probably shouldn't complain after having a free ride for the last 5+ years. So, is 6% unreasonable? 
Vann

>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eemoore at fyndo.com Fri Oct 31 14:52:55 2003
From: eemoore at fyndo.com (Dr Eric Edward Moore)
Date: Fri, 31 Oct 2003 19:52:55 +0000
Subject: opteron VS Itanium 2
In-Reply-To: (Mark Hahn's message of "Thu, 30 Oct 2003 12:45:20 -0500 (EST)")
References:
Message-ID: <87he1pfalk.fsf@azathoth.fyndo.com>

Mark Hahn writes:

> there's a characteristic "shape" to spec results - which scores are
> high and low relative to the other scores for a single machine. not only
> does this include outliers (drastic cache or compiler effects), but
> points at strengths/weaknesses of particular architectures. how to do this,
> perhaps some kind of factor analysis?

Well, being bored, I tried factor analysis on the average results for the submitted specfp benchmarks at http://www.specbench.org/

The 5 factors with the largest eigenvalues are:

Eigenvalue:    0.314116   0.353034   0.799331   1.432038  10.614996
                  2.22%      2.25%      5.70%     10.22%     75.82%

168.wupwise  -0.4134913  0.0241240 -0.1437086 -0.2757206  0.2715672
171.swim      0.0245451  0.0965325  0.3495143  0.1209393  0.2783842
172.mgrid     0.1122617  0.1365769  0.3273285  0.1332301  0.2839204
173.applu     0.0299056  0.0439954  0.4163242  0.1913496  0.2725619
177.mesa      0.4791260  0.4190313 -0.0949648 -0.3785996  0.2448368
178.galgel   -0.0489231 -0.5404192 -0.2464610  0.2391370  0.2648068
179.art       0.0646181  0.5095081 -0.4736362  0.6508958  0.1054875
183.equake   -0.5560255  0.0841426  0.0214064  0.1615493  0.2794066
187.facerec  -0.0402649  0.0446221 -0.2628912 -0.0557252  0.2897607
188.ammp      0.3993861 -0.3404615 -0.1456043  0.0359475  0.2832809
189.lucas    -0.2380202  0.0908976  0.0801927 -0.2140971  0.2842518
191.fma3d    -0.0326577  0.1661895 -0.1149762 -0.3148501  0.2774768
200.sixtrack  0.1950678 -0.1574121  0.2852895  0.2008475  0.2741305
301.apsi      0.1128198 -0.2379642 -0.3013536 -0.1224494  0.2782804

Pretty much all the specfp tests correlate with each other pretty well, except for 179.art, which correlates... poorly with the others (its correlation with 177.mesa is just 0.03). So most of the variation in the results is some sort of "raw speed" number, which has near-equal weightings of all the tests besides 179.art. Next most important is whatever makes art so different from all the others (maybe it's a persistent cache-misser, or maybe it's just the easiest for vendors to tweak).

Not entirely sure what to make of the others. There does seem to be some commonality between 171.swim 172.mgrid 173.applu and 200.sixtrack in the third biggest factor (plus a lot of whatever art isn't) that could be important. The next two seem to mostly have something to do with whatever makes 177.mesa special.

This is presumably all useless, but someone might be entertained :)

> regards, mark hahn.

--
Eric E. Moore
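For anyone who wants to poke at this themselves, the first step behind a factor analysis like the one above is just the correlation matrix of per-machine component scores, which a few lines of C will produce. The score matrix below is a tiny made-up placeholder (machines as rows, components as columns), not real SPEC data; the factor extraction itself would sit on top of a matrix built this way.

/* Correlation matrix of per-machine scores for a handful of components.
   Placeholder data only; link with -lm. */
#include <stdio.h>
#include <math.h>

#define NMACH 4
#define NCOMP 3

int main(void)
{
    /* rows = machines, columns = components (made-up numbers) */
    double x[NMACH][NCOMP] = {
        { 1100.0,  950.0, 3000.0 },
        { 1400.0, 1200.0, 2900.0 },
        {  900.0,  800.0, 3100.0 },
        { 1600.0, 1350.0, 2950.0 },
    };
    double mean[NCOMP] = {0}, sd[NCOMP] = {0};
    int i, j, k;

    /* per-component mean and standard deviation across machines */
    for (j = 0; j < NCOMP; j++) {
        for (i = 0; i < NMACH; i++) mean[j] += x[i][j];
        mean[j] /= NMACH;
        for (i = 0; i < NMACH; i++)
            sd[j] += (x[i][j] - mean[j]) * (x[i][j] - mean[j]);
        sd[j] = sqrt(sd[j] / NMACH);
    }

    printf("correlation matrix:\n");
    for (j = 0; j < NCOMP; j++) {
        for (k = 0; k < NCOMP; k++) {
            double c = 0.0;
            for (i = 0; i < NMACH; i++)
                c += (x[i][j] - mean[j]) * (x[i][k] - mean[k]);
            c /= NMACH * sd[j] * sd[k];
            printf(" %6.3f", c);
        }
        printf("\n");
    }
    return 0;
}

Components whose scores rise and fall together across machines show correlations near 1; a component that stands apart (the way 179.art does above) shows up as a row of small numbers.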
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From mathiasbrito at yahoo.com.br Fri Oct 31 16:38:52 2003
From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=)
Date: Fri, 31 Oct 2003 18:38:52 -0300 (ART)
Subject: sum of matrices
Message-ID: <20031031213852.87539.qmail@web12206.mail.yahoo.com>

Hi,

In the last few days I wrote a program (in C) that computes the sum of 2 matrices. Let me say a little about how it works. I send 1 row of the 1st matrix and 1 row of the 2nd matrix to each process; when a process finishes its job, if there are more rows I send more to it and it computes the sum of those new 2 rows. The problem is, the program works fine with 100x100 (or smaller) matrices, but when I increase the size to something like 10000x10000 I receive the following message:

p0_8467: p4_error: Child process exited while making connection to remote process on node2: 0

Is this an MPI problem or is it my code? What can I do to fix this problem? (A minimal sketch of this kind of row distribution appears below, after this batch of messages.)

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

Yahoo! Mail - o melhor webmail do Brasil
http://mail.yahoo.com.br

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From xyzzy at speakeasy.org Fri Oct 31 15:52:12 2003
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Fri, 31 Oct 2003 12:52:12 -0800 (PST)
Subject: opteron VS Itanium 2
In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
Message-ID:

On Fri, 31 Oct 2003, Greg Lindahl wrote:
> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote:
> > This would place the Big Mac in the 3rd place on the
> > top500 list
>
> Except that there are several other new large clusters that will
> likely place higher -- LANL announced a 2,048 cpu Opteron cluster a
> while back, and LLNL has something new, too, I think. Comparing
> yourself to the obsolete list in multiple press releases isn't very
> clever.

I thought that the 3rd place was in the new preliminary top500 list that included all the big machines that will be there when the official list comes out. But there's been so much poor and conflicting information about Big Mac, who knows? I'd like to know how much they paid for the infiniband hardware.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From roger at ERC.MsState.Edu Fri Oct 31 16:14:35 2003
From: roger at ERC.MsState.Edu (Roger L. Smith)
Date: Fri, 31 Oct 2003 15:14:35 -0600
Subject: opteron VS Itanium 2
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2003, Trent Piepho wrote:

> I thought that the 3rd place was in the new preliminary top500 list that
> included all the big machines that will be there when the official list
> comes out. But there's been so much poor and conflicting information
> about Big Mac, who knows? I'd like to know how much they paid for the
> infiniband hardware.

Yeah, me too. As someone who just ponied up for a rather large IB installation, I'm not sure that most people realize what a substantial percentage of the cost of the cluster the IB might be.
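Back to Mathias's matrix-sum question a couple of messages up: without seeing his code it is hard to say more than "debug and benchmark your own code", but note that at 10000x10000 each matrix of doubles is about 800 MB, so memory exhaustion on the master (or the per-row message bookkeeping) is a likely suspect. Below is a minimal sketch of a row-distributed matrix sum; it uses a simple block scatter/gather rather than his row-at-a-time master/worker scheme, assumes N is divisible by the number of processes, and is an illustration only, not a fix for his actual program.

/* Row-distributed matrix sum sketch: C = A + B.
   Build with mpicc; N is an arbitrary choice and must divide evenly
   by the number of MPI processes. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 1000

int main(int argc, char **argv)
{
    int rank, nproc, rows, i;
    double *A = NULL, *B = NULL, *C = NULL, *a, *b, *c;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    rows = N / nproc;              /* rows handled by each process */

    if (rank == 0) {               /* only the root holds the full matrices */
        A = malloc((size_t)N * N * sizeof(double));
        B = malloc((size_t)N * N * sizeof(double));
        C = malloc((size_t)N * N * sizeof(double));
        for (i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }
    a = malloc((size_t)rows * N * sizeof(double));
    b = malloc((size_t)rows * N * sizeof(double));
    c = malloc((size_t)rows * N * sizeof(double));
    /* error checking of the mallocs omitted for brevity */

    /* hand each process its block of rows from A and B */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, a, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(B, rows * N, MPI_DOUBLE, b, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < rows * N; i++)
        c[i] = a[i] + b[i];        /* local part of the sum */

    /* collect the result rows back on the root */
    MPI_Gather(c, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0] = %g, C[last] = %g\n", C[0], C[N * N - 1]);

    free(a); free(b); free(c);
    if (rank == 0) { free(A); free(B); free(C); }
    MPI_Finalize();
    return 0;
}

With the full matrices only on rank 0 and N/nproc rows per worker, the per-node memory footprint stays bounded as the matrix grows, and there is one message per matrix per process instead of one per row.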
_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From weideng at uiuc.edu Fri Oct 31 15:37:45 2003 From: weideng at uiuc.edu (Wei Deng) Date: Fri, 31 Oct 2003 14:37:45 -0600 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031203745.GU1408@aminor.cs.uiuc.edu> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > - OSCAR / Rocks / etc... - generally installed on top of another > distribution. We still have to pick a base distribution. >From what I heard from Rocks mailing list, they will release 3.1.0 the next Month, which will be based on RHEL 3.0, compiled from source code that is publicly available, and free of charge. Even though Rocks is based on RedHat distribution, it is complete, which means you only need to download Rocks ISOs to accomplish your installation. -- Wei Deng Pablo Research Group Department of Computer Science University of Illinois 217-333-9052 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Fri Oct 31 16:17:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Fri, 31 Oct 2003 14:17:35 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <3FA2D16F.4030807@lanl.gov> Vann H. Walke wrote: > On Fri, 2003-10-31 at 12:38, John Hearns wrote: >>On Fri, 31 Oct 2003, Robert G. Brown wrote: >> >>>It is also very interesting to me that RH (for example) thinks that >>>there is something that they are going to provide that is worth e.g. >>>several hundred thousand dollars in the case of a 1000+ node cluster >>>running their "workstation" product. Fifty dollars certainly. Five >>>hundred dollars maybe. A thousand dollars possibly, but only if they >>>come up with a cluster-specific installation with some actual added >>>value. >> >>I'll second that. > > Hmm... Let's take the case of a 1000 node system. If we assume a > $3000/node cost (probably low once rack, UPS, hardware support, and > interconnect are added in), we arrive at an approximate hardware cost of > $3,000,000. If we were to use the RHEL WS list price of $179/node, we > get $179,000 or about 6% of the hardware cost. That is assuming RedHat > will not provide any discount on large volume purchases (unlikely). Is > 6% unreasonable? These days, one seldom builds 1000 node systems out of basic x86 boxes. Consider a 1024 node AMD64 system instead: The list price on RHEL WS Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. This is unlikely to create any sales. RH should be paid for the valuable service they provide (patch streams etc.) but this is not worth $811K to builders of large clusters. 
There are other good alternatives, most of them *MUCH* cheaper. I fully agree with RGB that RH needs to announce a sensible pricing structure for clusters in order to participate in this market. Would a single system image (BProc) cluster constructed by recompiling the kernel w/BProc patches fit RH's legal definition of a single "installed system" and a single "platform"? If so, $792 for a 1024-node cluster would be quite acceptable... Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Oct 31 17:38:50 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 31 Oct 2003 17:38:50 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <1067639930.26872.1.camel@squash.scalableinformatics.com> On Fri, 2003-10-31 at 16:17, Josip Loncaric wrote: > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. SUSE AMD64 version of 9.0 is something like $120. It was somewhat more stable for my tests than the RH beta (GinGin64). I hope that RH will arrange for similar pricing. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 31 19:00:30 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 31 Oct 2003 16:00:30 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2E6FD.6050107@scali.com> Message-ID: On Fri, 31 Oct 2003, Steffen Persvold wrote: > Josip Loncaric wrote: > > > > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > > This is unlikely to create any sales. so download the source build and call you distro something other than redhat enterprise linux... or use debian... or cope. > > RH should be paid for the valuable service they provide (patch streams > > etc.) but this is not worth $811K to builders of large clusters. There > > are other good alternatives, most of them *MUCH* cheaper. I fully agree > > with RGB that RH needs to announce a sensible pricing structure for > > clusters in order to participate in this market. so don't use redhat. > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > claim support from RH for more than one of the systems. read the liscsense agreement for you redhat enterprise disks... 
> Regards,
> Steffen
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
--------------------------------------------------------------------------
Joel Jaeggli                       Unix Consulting
joelja at darkwing.uoregon.edu
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at keyresearch.com Fri Oct 31 18:43:42 2003
From: lindahl at keyresearch.com (Greg Lindahl)
Date: Fri, 31 Oct 2003 15:43:42 -0800
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain>
References: <1067629499.21719.73.camel@localhost.localdomain>
Message-ID: <20031031234342.GC3744@greglaptop.internal.keyresearch.com>

On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote:
> So, is 6% unreasonable?

For just the base OS? Yes. The market-place has spoken very loudly about
that, especially people building large machines.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sp at scali.com Fri Oct 31 17:49:33 2003
From: sp at scali.com (Steffen Persvold)
Date: Fri, 31 Oct 2003 23:49:33 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <3FA2D16F.4030807@lanl.gov>
References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov>
Message-ID: <3FA2E6FD.6050107@scali.com>

Josip Loncaric wrote:
>
> These days, one seldom builds 1000 node systems out of basic x86 boxes.
> Consider a 1024 node AMD64 system instead: The list price on RHEL WS
> Standard for AMD64 is $792 per node, or $811,008 for the whole cluster.
> This is unlikely to create any sales.
>
> RH should be paid for the valuable service they provide (patch streams
> etc.) but this is not worth $811K to builders of large clusters. There
> are other good alternatives, most of them *MUCH* cheaper. I fully agree
> with RGB that RH needs to announce a sensible pricing structure for
> clusters in order to participate in this market.

Who says you have to pay 1024*$792? Why not only 1 license? AFAIK you may
use that binary image as you like inside your cluster since it is covered
by GPL, but you can't claim support from RH for more than one of the
systems.

Regards,
Steffen

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From tod at gust.sr.unh.edu Fri Oct 31 18:59:16 2003
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: 31 Oct 2003 18:59:16 -0500
Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian)
In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain>
References: <1067629499.21719.73.camel@localhost.localdomain>
Message-ID: <1067644757.5702.219.camel@haze.sr.unh.edu>

On Fri, 2003-10-31 at 14:44, Vann H. Walke wrote:
> What are the alternatives?
> [snip]
> - Fedora - Planned releases 2-3 times a year.
> So, if I build a system on the Fedora release scheduled this Monday, who
> will be providing security patches for it 2 years from now (after 4-6 new
> releases have been dropped)? My guess is no-one. Again, we're in the
> do-it-yourself maintenance or frequent OS upgrade mode.
> [snip]
> - Debian - Could be a good option, but to some extent you end up in the
> same position as Fedora. How often do the releases come out? Who
> supports the old releases? What hardware / software will work on the
> platform?

If Fedora achieves 2-3 upgrades per year then it will be fairly different
from Debian, which seems to be at 2-3 years per upgrade these days (well,
almost). After a new release comes out Debian supports the old one for a
period of time (12 months?) with security updates before pulling the plug.

Debian can be upgraded in place as opposed to requiring a full reinstall;
while this is great for desktops and servers, I'm not sure if this is
important for a cluster.

As a result of the extended release cycle Debian stable tends to lack
support for the newest hardware (Opteron 64-bit, for example). This is why
Knoppix, which is based on Debian, isn't derived from Debian stable, but
rather from packages in the newer releases (testing, unstable and
experimental). But the flip side is that the stable release, while dated,
tends to work well as it's had a lot of testing.

Debian could probably use more recognition as a target platform by
commercial software vendors, but it incorporates a huge number of packages
including many open source applications pertinent to science. Breadth in
packaged applications is probably more important for workstations, since
clusters tend to use small numbers of apps very intensely.

As a distribution Debian is more oriented towards servers than the desktop
(to the point that frustrated users have spawned the "Debian Desktop"
subproject). It seems to me that clusters have more in common with servers
than with desktops, so Debian's deliberate release rate is a better match
for the cluster environment than distros which release often in order to
incorporate the latest GUI improvements.

P.S. While looking into the number of packages in Debian vs. Fedora I
stumbled across this frightening bit (gotta throw a Halloween reference in
somewhere) on the Fedora site:

http://fedora.redhat.com/participate/terminology.html

> Packages in Fedora Extras should avoid conflicts with other packages
> in Fedora Extras to the fullest extent possible. Packages in Fedora
> Extras must not conflict with packages in Fedora Core.

It seems that Fedora intends to achieve applications breadth through
"Fedora Extras" package sets in other repositories, but the prohibition of
conflicts between Extras packages isn't as strong as the absolute
prohibition of conflicts between Extras and Core packages. Could this
result in a new era of DLL hell a few years down the road?

Wow, I guess I just slung some FUD at Fedora, but maintaining a rate of
2-3 releases per year probably requires a small core, putting the bulk of
applications into the Extras category and thus increasing the chance of
conflict. (Wasn't that the original recipe for DLL hell?) Debian has
avoided this through a much larger core, which of course slows the release
cycle.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ctierney at hpti.com Fri Oct 31 17:37:36 2003
From: ctierney at hpti.com (Craig Tierney)
Date: 31 Oct 2003 15:37:36 -0700
Subject: sum of matrices
In-Reply-To: <20031031213852.87539.qmail@web12206.mail.yahoo.com>
References: <20031031213852.87539.qmail@web12206.mail.yahoo.com>
Message-ID: <1067639856.6209.211.camel@hpti10.fsl.noaa.gov>

On Fri, 2003-10-31 at 14:38, Mathias Brito wrote:
> Hi,
>
> Over the last few days I wrote a program (in C) that sums two matrices.
> Let me say a little about how it works. I send one row of the 1st matrix
> and one row of the 2nd matrix to each process; when a process finishes
> its job, if there are more rows I send more to it and it sums those two
> new rows. The problem is that the program works fine with 100x100 (or
> smaller) matrices, but when I increase the size to something like
> 10000x10000 I receive the following message:
>
> p0_8467: p4_error: Child process exited while making
> connection to remote process on node2: 0
>
> Is this an MPI problem or is it my code? What can I do to fix this
> problem?

It is probably your code. Are you allocating the matrix statically or
dynamically? Try increasing the stack size on your node(s). [A short
illustrative sketch of the static-vs-dynamic point follows at the end of
this section.]

Craig

> =====
> Mathias Brito
> Universidade Estadual de Santa Cruz - UESC
> Departamento de Ciências Exatas e Tecnológicas
> Estudante do Curso de Ciência da Computação
>
> Yahoo! Mail - o melhor webmail do Brasil
> http://mail.yahoo.com.br
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sp at scali.com Fri Oct 31 19:52:23 2003
From: sp at scali.com (Steffen Persvold)
Date: Sat, 01 Nov 2003 01:52:23 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To:
References:
Message-ID: <3FA303C7.8050600@scali.com>

Joel Jaeggli wrote:
> On Fri, 31 Oct 2003, Steffen Persvold wrote:
[]
>> Who says you have to pay 1024*$792? Why not only 1 license? AFAIK you
>> may use that binary image as you like inside your cluster since it is
>> covered by GPL, but you can't claim support from RH for more than one
>> of the systems.
>
> read the license agreement for your redhat enterprise disks...

Well, the EULA doesn't say anything about having to pay $792 for each node
in a cluster (actually it doesn't mention paying license fees at all). The
only relevant stuff I can find is item 2, "Intellectual Property Rights":

"If Customer makes a commercial redistribution of the Software, unless a
separate agreement with Red Hat is executed or other permission granted,
then Customer must modify the files identified as REDHAT-LOGOS and
anaconda-image to remove all images containing the Red Hat trademark or
the Shadowman logo. Merely deleting these files may corrupt the Software."

And I wouldn't say that installing on your cluster nodes is "making a
commercial redistribution", would you? Or have I missed something
fundamental?
Regards,
Steffen

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
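
A minimal sketch of Craig Tierney's static-vs-dynamic allocation point,
referenced above. Mathias's actual program was not posted, so the dimension
N, the variable names, and the row-at-a-time layout below are assumptions
based on his description, not his code. The idea: a 10000x10000 matrix of
doubles is roughly 800 MB, so declaring it as a local (stack) array blows
far past the default stack limit of a few MB and the process dies; under
MPICH's p4 device that can surface as a connection error like the one
quoted above rather than an obvious segfault. Allocating just the rows each
worker needs on the heap avoids the stack limit; alternatively, Craig's
other suggestion of raising the limit (e.g. ulimit -s unlimited in the
shell that launches the job) can help if large automatic arrays must stay.

/* alloc_sketch.c -- sums one pair of rows using heap allocation.
 * N and the row-at-a-time scheme are assumptions, not Mathias's code.
 * Build: cc -O2 alloc_sketch.c -o alloc_sketch
 */
#include <stdio.h>
#include <stdlib.h>

#define N 10000                 /* hypothetical matrix dimension */

int main(void)
{
    /* Declaring "double a[N][N], b[N][N];" here would need ~1.6 GB of
     * stack and would normally crash; malloc'd rows live on the heap. */
    double *a, *b, *c;
    int j;

    a = malloc(N * sizeof(double));
    b = malloc(N * sizeof(double));
    c = malloc(N * sizeof(double));
    if (a == NULL || b == NULL || c == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    for (j = 0; j < N; j++) {   /* sum one pair of rows */
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = a[j] + b[j];
    }
    printf("c[0] = %g\n", c[0]);

    free(a);
    free(b);
    free(c);
    return 0;
}

If a node really must hold a full matrix, a single malloc of
N*N*sizeof(double), checked for NULL, at least fails cleanly instead of
overflowing the stack.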