From Andrew.Cannon at nnc.co.uk Wed Oct 1 04:12:32 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Wed, 1 Oct 2003 09:12:32 +0100 Subject: RH8 vs RH9 Message-ID: Hi All, We have a small test cluster running RH8 which seems to work well. We are going to expand this cluster and I was wondering what, if any, are the advantages of installing the cluster using RH9 instead of RH8? Are there any disadvantages? Thanks Andrew Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Wed Oct 1 04:21:59 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:21:59 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Wed Oct 1 04:24:21 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:24:21 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:24:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:24:13 -0400 (EDT) Subject: RH8 vs RH9 In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Cannon, Andrew wrote: > Hi All, > > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? Many humans wonder about that, given the very short time that RH8 was around before RH9 came out. The usual rule is that major number upgrades are associated with changes in core libraries that break binary compatibility, so that binaries built for RH 8 are not guaranteed to work for RH 9. 
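One quick sanity check before committing the whole cluster: take the binaries you already run on RH 8 and ask ldd, on a single RH 9 test node, whether everything they link against still resolves. A minimal sketch in Python -- the binary paths are hypothetical placeholders for your own applications:

    #!/usr/bin/env python
    # Report RH 8-built binaries whose shared libraries no longer resolve
    # on an RH 9 test node.  The paths below are placeholders.
    import subprocess
    import sys

    BINARIES = ["/usr/local/bin/your_solver", "/usr/local/bin/your_preprocessor"]

    def missing_libs(path):
        """Return the libraries that ldd reports as 'not found' for path."""
        out = subprocess.run(["ldd", path], capture_output=True, text=True)
        return [line.split()[0] for line in out.stdout.splitlines()
                if "not found" in line]

    if __name__ == "__main__":
        broken = False
        for binary in BINARIES:
            missing = missing_libs(binary)
            if missing:
                broken = True
                print("%s: unresolved: %s" % (binary, ", ".join(missing)))
        sys.exit(1 if broken else 0)

Statically linked binaries can still break across the upgrade (see the Intel Fortran report later in this thread), so a library check like this is a useful first pass rather than a guarantee.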
I think that the easiest way for you to determine precisely what changed is to look at e.g. ftp://ftp.dulug.duke.edu/pub/redhat/linux/9/en/os/i386/RELEASE-NOTES and see if anything in there is important to your work. Beyond that, there are a few issues to consider: a) 8 will, probably fairly soon, be no longer maintained. 9 will be, at least for a while (possibly for one more year). Of course the maintenance issue right now is very cloudy for RH in general with the Fedora/RHEL situation a work in progress. However, maintenance alone is (in my opinion) a good reason to be using 9 and to move from 8 to 9 to achieve it. Fedora will likely be strongly derived from 9 and the current rawhide in any event. How the "community based" RH release will end up being maintained is the interesting question. One possibility is "as rapidly as RHEL plus a few days", the difference being the time required to download the GPL-required logo-free source rpm(s) after an update and rebuild them and insert them into the community version. Or of course you can spring for a RHEL license (set) for your cluster, which may or may not be reasonable in cost or scale well per node by the time all the University-price dickering is done. b) 9 had some fairly significant library upgrades, service upgrades, and bug fixes. That doesn't mean 8 is "bad" -- it just means that your chances of encountering trouble with 9 are in principle smaller than with 8, and one hopes that the upgrades added a bit to performance as well. c) A lot of the enhancements in 9 were more useful or relevant to userspace and LAN client operation (CUPS or Open Office, for example) than they were to cluster nodes. So in that sense perhaps it doesn't matter as much. We're using 9 on a bunch of hosts and nodes with happiness. We're also using 7.3 (still) on a bunch of hosts and nodes with happiness. We skipped 8 only because they released 9 before we finished creating a stable/tested 8 repository as RH changed their release cycle and dropped the .0, .1 and so forth "correction" releases. I don't know that we'll ever use RHEL with happiness unless RH charges something like $1 per system as their university price (which isn't insane, actually, given that an entire university can install and maintain, as Duke does, off of a single campus-local repository largely run by and debugged by and maintained by campus administrators, so RH's costs don't scale at all strongly with the number of internal campus RH systems). Fedora, quite possibly, but as noted we are fearful, uncertain, and doubtful at the moment, for once because of real issues and not just as a sort of Microsoft joke... rgb > > Thanks > > Andrew > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. 
If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at vilma.upc.es Wed Oct 1 04:01:30 2003 From: lepalom at vilma.upc.es (Leopold Palomo Avellaneda) Date: Wed, 1 Oct 2003 10:01:30 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011001.31106.lepalom@vilma.upc.es> A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > Dont overlook lm_sensors+cron > Why? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From thornton at yoyoweb.com Wed Oct 1 08:34:40 2003 From: thornton at yoyoweb.com (Thornton Prime) Date: Wed, 01 Oct 2003 05:34:40 -0700 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <1065011679.1923.16.camel@localhost.localdomain> > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? You should check out the release notes. On the whole, I'd say there isn't much advantage unless you can take advantage of NTPL. Most of the other enhancements were primarily for desktop users. The next release should be 2.6-kernel ready, so rather than 9 you may consider experimenting with Severn or Taroon. Taroon has much better support for 64-bit platforms, if you are headed there. thornton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:37:44 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:37:44 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > > Why? On a system equipped with an internal sensor, lm_sensors can often read e.g. core CPU temperature on the system itself. A polling cron script can then read this and take action, e.g. initiate a shutdown if it exceeds some threshold. There are good and bad things about this. A good thing is it addreses the real problem -- overheating in the system itself -- and not room temperature. CPU's can overheat because of a fan failure when the room remains cold, and a sensors-driven poweroff can then save your hardware on a node by node basis. 
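As a concrete illustration, the cron side of this can be very small. The sketch below just scrapes the output of the sensors command for lines of the form "temp1: +53 C (limit = +60 C ...)" -- the same format shown later in this thread; the threshold and the shutdown action are assumptions to adjust for your own hardware, and which tempN line actually tracks the CPU varies from board to board:

    #!/usr/bin/env python
    # Minimal over-temperature watchdog, intended to run from cron every minute.
    # It parses `sensors` output lines such as "temp1: +53 C (limit = +60 C ...)"
    # and powers the node off if any reading exceeds THRESHOLD degrees C.
    import os
    import re
    import subprocess

    THRESHOLD = 70.0    # degrees C; pick a sane value for your CPUs
    TEMP_LINE = re.compile(r"^temp\d+:\s*\+?(-?\d+(?:\.\d+)?)")

    def temperatures():
        # Note: some boards report garbage on unconnected tempN inputs
        # (see the bogus-reading discussion later in this thread); filter
        # those out here if your board has any.
        out = subprocess.run(["sensors"], capture_output=True, text=True).stdout
        matches = (TEMP_LINE.match(line) for line in out.splitlines())
        return [float(m.group(1)) for m in matches if m]

    if __name__ == "__main__":
        hot = [t for t in temperatures() if t > THRESHOLD]
        if hot:
            # This is the place to send a warning mail before pulling the plug.
            os.system("/sbin/shutdown -h now")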
The bad thing is that it does NOT give you any sort of measure of room temperature per se, although if you have the poweroff script send you mail first, getting deluged with N messages as the entire cluster shuts down would be a good clue that your room cooling failed:-). Also, lm_sensors has the API from hell. In fact, I would hardly call it an API. One has to pretty much craft a polling script on the basis of each supported sensor independently, which requires you to know WAY more than you ever wanted to about the particular sensor your system may or may not have. Alas, if only somebody would give the lm_sensors folks a copy of a good book on XML for christmas, and they decided to take the monumental step of converting /proc/sensors into a single xml-based file with the RELEVANT information presented in toplevel tags like <cputemp>50.4</cputemp> and the irrelevant information presented in tags like <sensor>lm78</sensor><version>1.22a</version> then we could ALL reap the fruits of their labor without needing a copy of the lm78 version 1.22a API manual and having to write an application that supports each of the sensors THROUGH THEIR INTERFACE one at a time...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Wed Oct 1 09:46:25 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Wed, 1 Oct 2003 08:46:25 -0500 (CDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Robert G. Brown wrote: > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > <cputemp>50.4</cputemp> > > and the irrelevant information presented in tags like > > <sensor>lm78</sensor><version>1.22a</version> > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) We have that. lm_sensors+cron+gmond. Nice little XML stream on every node with every other nodes temps. One can keep a range of tolerance for cpu0, cpu1, motherboard, and disk temps and shutdown whenever you need to. a netbotz would be cooler though. i'd still use the lm_sensors+cron+gmond and still have the netbotz as a toy..:) -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at upc.es Wed Oct 1 10:13:46 2003 From: lepalom at upc.es (Leopold Palomo) Date: Wed, 1 Oct 2003 16:13:46 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011613.46297.lepalom@upc.es> A Dimecres 01 Octubre 2003 14:37, Robert G. Brown va escriure: > On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > > Dont overlook lm_sensors+cron > > > > Why?
> > On a system equipped with an internal sensor, lm_sensors can often read > e.g. core CPU temperature on the system itself. A polling cron script > can then read this and take action, e.g. initiate a shutdown if it > exceeds some threshold. > > There are good and bad things about this. A good thing is it addreses > the real problem -- overheating in the system itself -- and not room > temperature. CPU's can overheat because of a fan failure when the room > remains cold, and a sensors-driven poweroff can then save your hardware > on a node by node basis. > > The bad thing is that it does NOT give you any sort of measure of room > temperature per se, although if you have the poweroff script send you > mail first, getting deluged with N messages as the entire cluster shuts > down would be a good clue that your room cooling failed:-). Also, > lm_sensors has the API from hell. In fact, I would hardly call it an > API. One has to pretty much craft a polling script on the basis of each > supported sensor independently, which requires you to know WAY more than > you ever wanted to about the particular sensor your system may or may > not have. > > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) Ok. I was a bit surprise about your sentence. I know that lmsensors is not perfect, but it does their job. Ok, I don't think that use lm_sensors to try to calculate the T of the room is a bit excesive. About the xml,... well, ok, it would be a nice feature, but as plain text, knowing your hardware it's so good, too. Best Regards. Pd How about the pdf, ps, etc? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 10:33:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 10:33:29 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011613.46297.lepalom@upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo wrote: > Ok. I was a bit surprise about your sentence. I know that lmsensors is not > perfect, but it does their job. Ok, I don't think that use lm_sensors to try > to calculate the T of the room is a bit excesive. > > About the xml,... well, ok, it would be a nice feature, but as plain text, > knowing your hardware it's so good, too. > Sorry, I tend to get distracted and rant from time to time (even though as Greg noted, sometimes the rants are of lesser quality:-). In this particular case the rant is really directed to all of /proc, but the sensors interface is the worst example of the lot. I'm "entitled" to rant because I've written two tools (procstatd and xmlsysd) that parse all sorts of data, including sensors data in procstatd, out and provide it to clients for monitoring purposes. Even my daemon wasn't the first to do this, but I think it was one of the first two that functioned as a binary without running a shell script or the like on each node. 
procstatd actually predated ganglia by a fair bit, FWIW. On the basis of this fairly extensive experience I can say that lmsensors output is very poorly organized from the perspective of somebody trying to write a general purpose parser to extract the data it provides. In particular, it uses a directory tree structure where the PARTICULAR sensors interface that you have appears as part of the path, and where what you find underneath that path depends on the particular sensor that you've got as well. Hopefully it is obvious how Evil this makes it from the point of view of somebody trying to write a general purpose tool to parse it. Basically, to write such a tool one has to go through the lmsensors sources and reverse engineer each interface it supports to determine what is produced and where, one at a time. This is more than slightly nuts. What do "most" sensors provide? Fields like cpu temperature (for cpu's 0-N), fan speed (for fans 0-N), core voltage (for lines 0-N). Sure, some provide more, some provide less, but what are we discussing? The monitoring of cpu temperature, under the reasonable assumption that either we have a sensor that provides it or we don't, and that we really don't give a rodent's furry touchis WHICH sensor we have as long as it gives us "CPU Temperature", preferrably for every CPU. So a good API is one that has a single file entitled /proc/sensors, and in that file one finds things like: <cputemp0>54.2</cputemp0> <cputemp1>51.7</cputemp1> <sensor>lm78</sensor> ... ... I can write code to parse this in a few minutes of work, literally, and the same code will work for all interfaces that lm_sensors might support, and I don't need to know the interface the system has in it beforehand (although with the knowledge I might add some advanced features if it supports them). Presenting the knowledge is also trivial -- a web interface might be as sparse as a reader/parser and/or a DTD. Compare to parsing something like (IIRC) /proc/sensors/device-with-a-bunch-of-numbers/subunit/field where the path that you find under specific devices-with-numbers depends on the toplevel value on a device by device basis and the contents of field can as well. Yech. And Rocky, hiding the problem with gmond is fine, but then it puts the burden for writing an API for the API on the poor people that have to support the gmond interface. Yes they can (and I could) do this. I personally refuse. They obviously have gritted their teeth and done so. The correct solution is clearly to redo the lm_sensors interface itself so that it is organized as the above indicates. Which criticism, by the way, applies to a LOT of /proc, which currently looks like it was organized by a bunch of wild individualists who have handled every emergent subfield by overloading its data in a single "field" line, usually with documentation only in the form of reading procps or kernel source. Just because this is actually true doesn't excuse it. Parsing the contents of /proc is maddening for just this reason, and the cost is a lot of needless complexity, pointless bugs and upgrade incompatibilities for many people. Putting the data into xml-wrapped form would be a valuable exercise in the discipline of structuring data, for the most part. rgb > Best Regards. > > Pd How about the pdf, ps, etc? I'll try to work on this as soon as I can.
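To put a number on "a few minutes of work": given a flat, tag-per-line /proc/sensors along the lines sketched above (the file and its tag names are hypothetical; nothing like it exists in stock lm_sensors), a general purpose reader is roughly a dozen lines of Python:

    #!/usr/bin/env python
    # Sketch of a general purpose reader for a flat, tag-per-line /proc/sensors.
    # The path and the tag names (cputemp0, sensor, ...) are hypothetical; the
    # point is only that such a layout needs no chip-specific knowledge to parse.
    import re

    TAG = re.compile(r"<(\w+)>([^<]*)</\1>")

    def read_sensors(path="/proc/sensors"):
        """Return a dict like {'cputemp0': '54.2', 'cputemp1': '51.7', 'sensor': 'lm78'}."""
        values = {}
        with open(path) as f:
            for match in TAG.finditer(f.read()):
                values[match.group(1)] = match.group(2)
        return values

    if __name__ == "__main__":
        for name, value in sorted(read_sensors().items()):
            print("%s = %s" % (name, value))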
My task list for the day looks something like a) debug/fix some dead nodes; b) add a requested feature/view to wulfstat (that has been on hold for a week or more:-(, c) work on a bunch of documents associated with teaching and curriculum at Duke (sigh); d) about eight more tasks, none of which I will likely get to, including work on my research. However, this is about the third or fourth time people have requested a "fix" for the ps/pdf/font issue (with acroread it can even fail altogether to read the document -- presumably some gs/acrobat incompatibility where I use gs-derived tools) so I'll try very hard to craft some sort of fix by the weekend. -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 1 12:36:26 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 1 Oct 2003 12:36:26 -0400 (EDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Rocky McGaugh wrote: > On Wed, 1 Oct 2003, Robert G. Brown wrote: > > Alas, if only somebody would give the lm_sensors folks a copy of a good > > book on XML for christmas, and they decided to take the monumental step ... > > then we could ALL reap the fruits of their labor without needing a copy > > of the lm78 version 1.22a API manual and having to write an application > > that supports each of the sensors THROUGH THEIR INTERFACE one at a > > time...;-) > > We have that. lm_sensors+cron+gmond. I think you missed RGB's point. The lm_sensors implementation sucks. Sure, any one specific implementation can be justified. But having each implementation use a different output and calibration shows that this is not an architecture, just a collection of hacks. The usual reply at this point is "just update the user-level script for the new motherboard type". Yup... and you should probably update the constants in your programs' delay loops at the same time. With lm_sensors you can get a one-off hack working, but cannot implement a general case. Compare this to IPMI, which presents the same information. IPMI has a crufty design and ugly implementations, but it is an architected system. With care you can implement and deploy code that works on a broad range of current and future machine. While I'm on the soapbox, gmond deserves its own mini-butane-torch flame. I implemented the translator from Beostat (our status/statistics subsystem) to gmond (per-machine information for Ganglia), so I have a pretty good side-by-side comparison. First, how did they choose what statistics to present? Apparently just because the numbers were there. What is the point of using a XML DTD if it is just used to package undefined data types? A wrapper around a wrapper... Example metric lines: Not only are these metric types not enumerated, they are made more confusing by abbreviations and no definition. To tie both together: What is "proc_total"? Number of processors? Number of processes? Does it count system daemons? It seems to be the useless number "ps x | wc", rather than the number of end user, application processes. Many statistics are only usable when used/presented as a set. Why split the numbers into multiple elements? It just multiplies the size and parsing load. 
____ Background: Beostat is our status/statistics interface that we published 3+ years ago. It exports interfaces at multiple levels: network protocol, shared memory table only for very performance sensitive programs, such as schedulers dynamic library the preferred interface for programs command output Thus Beostat is a infrastructure subsystem, rather than a single-purpose stack of programs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Wed Oct 1 09:59:16 2003 From: johnb at quadrics.com (John Brookes) Date: Wed, 1 Oct 2003 14:59:16 +0100 Subject: Upper bound on no. of sockets Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> I think there is a 1k per-process limit on open sockets. It's tuneable in 2.4 kernels, IIRC, but I don't remember how (off the top of my head). 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll take it past a/the kernel limit. Maybe recompile kernel? Maybe poke /proc/sys/.../...? Maybe adjust in userland? Maybe use fewer sockets ;-) Does anybody know the score? Cheers, John Brookes Quadrics > -----Original Message----- > From: Balaji Rangasamy [mailto:br66 at HPCL.CSE.MsState.Edu] > Sent: 30 September 2003 05:44 > To: beowulf at beowulf.org > Subject: Upper bound on no. of sockets > > > Hi, > Is there an upper bound on the number of sockets that can be > created by a > process? If there is one, is the limitation enforced by OS? > And what other > factors does it depend on? Can you please be specific on the > numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Wed Oct 1 13:06:40 2003 From: rokrau at yahoo.com (Roland Krause) Date: Wed, 1 Oct 2003 10:06:40 -0700 (PDT) Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <200310011504.h91F4DY02889@NewBlue.Scyld.com> Message-ID: <20031001170640.91750.qmail@web40002.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 6. Re:RH8 vs RH9 (Robert G. Brown) > From: "Robert G. Brown" > Many humans wonder about that, given the very short time that RH8 was > around before RH9 came out. The usual rule is that major number > upgrades are associated with changes in core libraries that break > binary > compatibility, so that binaries built for RH 8 are not guaranteed to > work for RH 9. Indeed some of them wont, I have first hand experience that binaries produced with the Intel Fortran compiler on RH-8, even when statically linked, will not run on a RH-9 system. Further, if you need the Intel Fortan compiler, RH-9 is not really an option for you because it is not officially supported and it will not be either. Inofficially I can confirm that it works fine if you are not using the OpenMP capabilities of the compiler. > achieve it. 
Fedora will likely be strongly derived from 9 and the > current rawhide in any event. How the "community based" RH release > will > end up being maintained is the interesting question. One possibility > is > "as rapidly as RHEL plus a few days", the difference being the time > required to download the GPL-required logo-free source rpm(s) after > an > update and rebuild them and insert them into the community version. Having used fedora in the past on a desktop client I am hopeful that it will be possible to get all necessary packages for a cluster into an 'aptable' repository, be it hosted by fedora or somewhere else (think e.g. sourceforge). If people work together, as they have in the past, I dont see why RH would succeed pushing their rediculous price policies upon cluster users. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Wed Oct 1 18:10:14 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 1 Oct 2003 15:10:14 -0700 Subject: Environment monitoring In-Reply-To: References: Message-ID: <20031001221014.GA28394@sphere.math.ucdavis.edu> I'd recommend: http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2820 For $25.00 you have a trivial to interface to temperature probe that even is smart enough to collect samples even if a machine is down (complete with time stamp). It will even build a histogram of temp samples for you. It's kinda cool that you can leave one in your luggage or send it up in a space probe and then get periodic samples when you arrive at your destination. In anyways people use them for all kinds of things, even in space: http://www.voiceofidaho.org/tvnsp/01atchrn.htm More info: http://www.ibutton.com/ibuttons/thermochron.html They can also be connected via USB, Parallel, and serial. The other cool feature is they are chainable, so we have one behind the machine (i.e. rack temp), one on top of the rack (room temp), and one at the airconditioner output all on one wire. Each button has a guarenteed unique 64 bit ID. Once you get a feel for the dynamics of the system it becomes really easy to spot anomalies. Recommended, the thermo buttons are cheaper, but IMO for most things the thermocron premium is worth it so you can have continuous sampling even if a machine crashes. The logs are very handy for fighting when facilities to combat the well it's not really getting that hot that often kinda thing. Oh, I guess I should mention I have no financial ties to any of the mentioned companies. So no I won't sell you one. 
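For the arguing-with-facilities use case, even a very small logger goes a long way. The sketch below assumes an external helper that reads one probe by its 64-bit ID (read-1wire-temp here is a hypothetical stand-in for whatever utility your 1-wire adapter actually ships with, and the probe IDs are placeholders); run it from cron every few minutes and keep the CSV:

    #!/usr/bin/env python
    # Append timestamped readings from a chain of 1-wire probes to a CSV log.
    # `read-1wire-temp` is a hypothetical helper standing in for whatever
    # utility actually reads your probes; the probe IDs are placeholders.
    import subprocess
    import time

    PROBES = {
        "rack-exhaust": "10AABBCCDDEEFF01",
        "room-ambient": "10AABBCCDDEEFF02",
        "hvac-output":  "10AABBCCDDEEFF03",
    }
    LOGFILE = "/var/log/machineroom-temps.csv"

    def read_probe(probe_id):
        out = subprocess.run(["read-1wire-temp", probe_id],
                             capture_output=True, text=True)
        return float(out.stdout.strip())

    if __name__ == "__main__":
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(LOGFILE, "a") as log:
            for name, probe_id in sorted(PROBES.items()):
                try:
                    log.write("%s,%s,%.1f\n" % (stamp, name, read_probe(probe_id)))
                except Exception:
                    log.write("%s,%s,READ-FAILED\n" % (stamp, name))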
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:36:35 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:36:35 -0700 Subject: more on structural models for clusters Message-ID: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> In regards to my recent post looking for cluster implementations for structural dynamic models, I would like to add that I'm interested in "highly distributed" solutions where the computational load for each processor is very, very low, as opposed to fairly conventional (and widely available) schemes for replacing the Cray with a N-node cluster. The number of processors would be comparable to the number of structural nodes (to a first order of magnitude) Imagine you had something like a geodesic dome with a microprocessor at each vertex that wanted to compute the loads for that vertex, communicating only with the adjacent vertices... Trivial, egregiously simplified, and demo cases are just fine, and, in fact, probably preferable.... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 1 19:19:26 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 1 Oct 2003 16:19:26 -0700 Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <20031001170640.91750.qmail@web40002.mail.yahoo.com> References: <200310011504.h91F4DY02889@NewBlue.Scyld.com> <20031001170640.91750.qmail@web40002.mail.yahoo.com> Message-ID: <20031001231926.GA2900@greglaptop.internal.keyresearch.com> On Wed, Oct 01, 2003 at 10:06:40AM -0700, Roland Krause wrote: > Inofficially I can confirm that it works fine if you are not using > the OpenMP capabilities of the compiler. Which is no surprise, as the thread library stuff changed fairly radically in RedHat 9. I have some sympathy for Intel's compiler guys on that issue. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 19:01:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 09:01:55 +1000 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <200310020901.57000.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 10:24 pm, Robert G. Brown wrote: > a) 8 will, probably fairly soon, be no longer maintained. 9 will be, > at least for a while (possibly for one more year). Updates for 7.3 ends on December 31st 2003. Updates for 8.0 ends on December 31st 2003. Updates for 9 ends on April 30th 2004. So going to 9 will only get you an extra 4 months of updates. 
http://www.redhat.com/apps/support/errata/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1zjO2KABBYQAh8RArjhAJoDUAq9xSKjz6pJ58nIvSk1GEqG2QCeJ7f3 5XYQ/rJIzUPP744CNvAOLXA= =UNIB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 18:58:21 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 08:58:21 +1000 Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> References: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: <200310020858.30401.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 06:21 pm, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > Why? Presumably because you can use it to monitor the temp and fan sensors and stuff and raise alarms if they go out of bounds. http://secure.netroedge.com/~lm78/ And from the info page: Project Mission / Background / Ethics: The primary mission for our project is to provide the best and most complete hardware health monitoring drivers for Linux. We strive to produce well organized, efficient, safe, flexible, and tested code free of charge to all Linux users using the Intel x86 hardware platform. The project attempts to support as many related devices as possible (when testing and documentation is available), especially those which are commonly included on mainboards. Our drivers provide the base software layer for utilities to acquire data on the environmental conditions of the hardware. We also provide a sample text-oriented utility to display sensor data. While this simple utility is sufficient for many users, others desire more elaborate user interfaces. We leave the development of these GUI-oriented utilities to others. See our useful addresses page for references. http://secure.netroedge.com/~lm78/info.html NB: I've used these at home from time to time, but we don't use them on our IBM cluster as we can grab the same info out of CSM. - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1wQO2KABBYQAh8RApUxAJ0V9QuvuGOLCnS7qXCkWD+9/OrOlgCfezuT QQ5wnTot9uoJCy3tRjuDKAQ= =fDWX -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:27:58 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:27:58 -0700 Subject: cluster computing for mechanical structural FEM models Message-ID: <5.2.0.9.2.20031001152545.03110070@mailhost4.jpl.nasa.gov> I'm looking for references to work on distributed computing for structural models like trusses and spaceframes. They are typically sparse/diagonalish matrices that represent the masses and springs, so distributing the work in a cluster seems a natural fit. 
Anybody done anything like this (as a demonstration, e.g.) say, using NASTRAN inputs? James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From vanw at tticluster.com Thu Oct 2 08:37:50 2003 From: vanw at tticluster.com (Kevin Van Workum) Date: Thu, 2 Oct 2003 08:37:50 -0400 (EDT) Subject: lm_sensors output Message-ID: The recent discussion on environment sensors motivated me to take the subject more seriously. I therefore installed lm_senors on one of my nodes for testing. I simply used the lm_sensors RPM from RH8.0, ran sensors-detect and did what it told me to do. It apparently worked. The problem is, I don't really know what the output means or what I should be looking for. I guess I'm a novice. Anyways, the output from sensors is shown below. What is VCore and why is mine out of range? What are all the other voltages describing? V5SB is out of range also, is that a bad thing? I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? $ sensors w83697hf-isa-0290 Adapter: ISA adapter Algorithm: ISA algorithm VCore: +1.50 V (min = +0.00 V, max = +0.00 V) +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) +5V: +5.02 V (min = +4.50 V, max = +5.48 V) +12V: +12.20 V (min = +10.79 V, max = +13.11 V) -12V: -12.85 V (min = -13.21 V, max = -10.90 V) -5V: -5.42 V (min = -5.51 V, max = -4.51 V) V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) VBat: +3.29 V (min = +2.70 V, max = +3.29 V) fan1: 4687 RPM (min = 187 RPM, div = 32) fan2: 0 RPM (min = 187 RPM, div = 32) temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor alarms: beep_enable: Sound alarm disabled Kevin Van Workum, Ph.D. www.tsunamictechnologies.com ONLINE COMPUTER CLUSTERS __/__ __/__ * / / / / / / _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From AlberT at SuperAlberT.it Thu Oct 2 03:35:57 2003 From: AlberT at SuperAlberT.it (AlberT) Date: Thu, 2 Oct 2003 09:35:57 +0200 Subject: Upper bound on no. of sockets In-Reply-To: References: Message-ID: <200310020935.58006.AlberT@SuperAlberT.it> On Tuesday 30 September 2003 06:44, Balaji Rangasamy wrote: > Hi, > Is there an upper bound on the number of sockets that can be created by a > process? If there is one, is the limitation enforced by OS? And what other > factors does it depend on? Can you please be specific on the numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > from man setrlimit: [quote] getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()): struct rlimit { rlim_t rlim_cur; /* Soft limit */ rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */ }; The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. 
A privileged process may make arbitrary changes to either limit value. The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()). [snip] RLIMIT_NOFILE Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE. [/QUOTE] -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Thu Oct 2 11:33:21 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 02 Oct 2003 08:33:21 -0700 Subject: Environment monitoring Message-ID: Robert G. Brown rgb at phy.duke.edu wrote: >The bad thing is that it does NOT give you any sort of measure of room >temperature per se, Well, no, but to be fair that's hardly lm_sensors fault. The problem is that few (any?) motherboards have a sensor positioned away from hot devices on the upstream end of the wind flow. One can sometimes acquire a fair approximation of this info using SMART from a hard drive if the airflow across the drive is good and the drive itself does not run very hot. We have not yet filled the second processor slot on the mobos of our beowulf and that temperature sensor gives a pretty good indication of the air temperature in the case (32C) vs. under a live Athlon MP 2200+ processor (no load, 40.5C). We use lm_sensors with mondo http://mondo-daemon.sourceforge.net/ to watch the systems and shut them down if they overheat. Generally this works well. Mondo can compensate for the shortcomings of the lm_sensors/motherboard combos which sometimes arise. For instance, on our ASUS A7V266 mobos (workstations, not in a beowulf!) some of the sensors tend to go whacky for one or two measurements. Fan speeds go to 0 or temps to 255C. Mondo is set to require an out of range condition for 3 seconds before triggering a shutdown, and so far we have not seen a glitch last that long. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcmoore at atipa.com Thu Oct 2 13:56:04 2003 From: jcmoore at atipa.com (Curt Moore) Date: 02 Oct 2003 12:56:04 -0500 Subject: lm_sensors output In-Reply-To: References: Message-ID: <1065117364.12473.27.camel@picard.lab.atipa.com> This is really the bad thing about lm_sensors which some have touched on previously; too much guesswork. Many times even if the drivers are present and up to date for your specific hardware, the values may be meaningless as different board manufacturers may choose to physically connect the monitoring chip(s) to different onboard devices, such as in the case with fans. You have to have a knowledge of which onboard piece of hardware is connected to which input of the monitoring chip in order to make sense of the sensors output. Don't get me wrong, when lm_sensors works, it works great but sometimes it takes a little work to get to that point. 
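That "little work" usually ends up in /etc/sensors.conf, where the chip's generic inputs get relabelled and given limits that match one particular board. A hypothetical fragment for the w83697hf chip whose output appears earlier in this thread -- the labels and limit values are made up, and the exact feature names depend on the chip driver:

    # /etc/sensors.conf -- hypothetical per-board section
    chip "w83697hf-*"
        # Map the chip's generic inputs onto what this board actually wires up.
        label temp1 "CPU Temp"
        label fan1  "CPU Fan"
        # Inputs that are not connected on this single-CPU board.
        ignore temp2
        ignore fan2
        # Alarm limits for this particular CPU/heatsink combination.
        set temp1_over 60
        set temp1_hyst 55
        set fan1_min   3000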
Even if the values are sane for your hardware, you still have to go into the sensors.conf and set max, min, and hysteresis values, if you so choose, in order to have this information make sense for your specific hardware. In recent months, vendors such as Tyan have begun to distribute customized sensors.conf files for their boards which take into account the differences between boards and how sensor chips are connected to the onboard devices for each of their boards. As Don mentioned earlier, IPMI is more generalized and is much easier to ask for "CPU 1 Temperature" and actually get "CPU 1 Temperature" instead of data from some other onboard thermistor. A mistake in this area could end up costing time and money if something overheats and it's not detected because of polling the wrong data. >From my experience, it would be very difficult to come up with a generalized set of sensors values to work across differing motherboard types. A "standard" such as IPMI makes things much easier to accurately collect and act upon as all of the "hard" work has already been done by those implementing IPMI on the hardware. One would hope that these individuals would have the in-depth knowledge of exactly which values to map to which sensor inputs and any computations needed for these values so that clean and accurate values are returned when the hardware is polled. -Curt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Curt Moore Systems Integration Engineer At?pa Technologies jcmoore at atipa.com (O) 785-813-0312 (Fax) 785-841-1809 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brbarret at osl.iu.edu Thu Oct 2 13:05:54 2003 From: brbarret at osl.iu.edu (Brian Barrett) Date: Thu, 2 Oct 2003 10:05:54 -0700 Subject: Upper bound on no. of sockets In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> References: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> Message-ID: On Oct 1, 2003, at 6:59 AM, John Brookes wrote: > I think there is a 1k per-process limit on open sockets. It's tuneable > in > 2.4 kernels, IIRC, but I don't remember how (off the top of my head). > 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll > take it > past a/the kernel limit. Maybe recompile kernel? Maybe poke > /proc/sys/.../...? Maybe adjust in userland? > > Maybe use fewer sockets ;-) > > Does anybody know the score? On linux, there is a default per-process limit of 1024 (hard and soft limits) file descriptors. You can see the per-process limit by running limit (csh/tcsh) or ulimit -n (sh). There is also a limit on the total number of file descriptors that the system can have open, which you can find by looking at /proc/sys/fs/file-max. On my home machine, the max file descriptor count is around 104K (the default), so that probably isn't a worry for you. There is the concept of a soft and hard limit for file descriptors. The soft limit is the "default limit", which is generally set to somewhere above the needs of most applications. The soft limit can be increased by a normal user application up to the hard limit. As I said before, the defaults for the soft and hard limits on modern linux machines are the same, at 1024. 
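A process can also inspect and adjust its own limits at run time with the getrlimit()/setrlimit() calls quoted from the man page earlier in this thread; in Python that looks roughly like the following (the target of 4096 is an arbitrary illustration):

    #!/usr/bin/env python
    # Query the per-process file descriptor limits and raise the soft limit
    # as far as the hard limit allows (an unprivileged process can go no higher).
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("RLIMIT_NOFILE: soft=%d hard=%d" % (soft, hard))

    wanted = 4096                    # arbitrary target, for illustration
    if hard == resource.RLIM_INFINITY:
        new_soft = wanted
    else:
        new_soft = min(wanted, hard)  # cannot exceed the hard limit unprivileged
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    print("soft limit is now %d" % resource.getrlimit(resource.RLIMIT_NOFILE)[0])

Note the caveat that follows about select(): descriptors numbered above FD_SETSIZE cannot go into an FD_SET no matter what the limit says.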
You can adjust either limit by adding the appropriate lines in /etc/security/limits.conf (at least, that seems to be the file on both Red Hat and Debian). In theory, you could set the limit up to file-max, but that probably isn't a good idea. You really don't want to run your system out of file descriptors. There is one other concern you might want to think about. If you ever use any of the created file descriptors in a call to select(), you have to ensure all the select()ed file descriptors fit in an FD_SET. On Linux, the size of an FD_SET is hard-coded at 1024 (on most of the BSDs, Solaris, and Mac OS X, it can be altered at application compile time). So you may not want to ever set the soft limit above 1024. Some applications may expect that any file descriptor that was successfully created can be put into an FD_SET. If this isn't the case, well, life could get interesting. Hope this helps, Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Thu Oct 2 11:45:40 2003 From: csmith at lnxi.com (Curtis Smith) Date: Thu, 2 Oct 2003 09:45:40 -0600 Subject: lm_sensors output References: Message-ID: <006001c388fc$3b67cb60$a423a8c0@blueberry> VCore is the voltage of the CPU #1. You can get the full definition of all values at http://www2.lm-sensors.nu/~lm78/. Curtis Smith Principle Software Engineer Linux Networx Inc. (www.lnxi.com) ----- Original Message ----- From: "Kevin Van Workum" To: Sent: Thursday, October 02, 2003 6:37 AM Subject: lm_sensors output > The recent discussion on environment sensors motivated me to take the > subject more seriously. I therefore installed lm_senors on one of my nodes > for testing. I simply used the lm_sensors RPM from RH8.0, ran > sensors-detect and did what it told me to do. It apparently worked. The > problem is, I don't really know what the output means or what I should be > looking for. I guess I'm a novice. Anyways, the output from sensors is > shown below. > > What is VCore and why is mine out of range? > What are all the other voltages describing? > V5SB is out of range also, is that a bad thing? > I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? > > $ sensors > w83697hf-isa-0290 > Adapter: ISA adapter > Algorithm: ISA algorithm > VCore: +1.50 V (min = +0.00 V, max = +0.00 V) > +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) > +5V: +5.02 V (min = +4.50 V, max = +5.48 V) > +12V: +12.20 V (min = +10.79 V, max = +13.11 V) > -12V: -12.85 V (min = -13.21 V, max = -10.90 V) > -5V: -5.42 V (min = -5.51 V, max = -4.51 V) > V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) > VBat: +3.29 V (min = +2.70 V, max = +3.29 V) > fan1: 4687 RPM (min = 187 RPM, div = 32) > fan2: 0 RPM (min = 187 RPM, div = 32) > temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor > temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor > alarms: > beep_enable: > Sound alarm disabled > > Kevin Van Workum, Ph.D. 
> www.tsunamictechnologies.com > ONLINE COMPUTER CLUSTERS > > __/__ __/__ * > / / / > / / / > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 2 17:25:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 2 Oct 2003 17:25:20 -0400 (EDT) Subject: Power Supply: Supermicro P4DL6 Board? In-Reply-To: Message-ID: > > disks are much, much cooler than they used to be, probably dropping > > below power consumed by ram on most clusters. > > Note that most performance-oriented RAM types now have metal cases and > heat sinks. They didn't add the metal because it _looks_ cool. I'm not so sure. I looked at the spec for a current samsung pc333 ddr 512Mb chip, and it works out to about 16W per GB. I think most people still have 512MB dimms, and probably pc266 (13.6W/GB). I don't really see why a dimm would have trouble dissipating ~20W, considering its size. I suspect dimm heatsinks are actually a fashion statement inspired by the heat-spreaders found on some rambus rimms (which were *spreaders*, a consequence of how rambus does power management...) personally, I'm waiting till I can invest in peltier-cooled dimms ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Oct 2 19:08:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 3 Oct 2003 09:08:39 +1000 Subject: more on structural models for clusters In-Reply-To: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <200310030908.41322.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 2 Oct 2003 08:36 am, Jim Lux wrote: > Imagine you had something like a geodesic dome with a microprocessor at > each vertex that wanted to compute the loads for that vertex, communicating > only with the adjacent vertices... The nearest I can remember to something like that (which sounds like an excellent idea) was for a fault tolerant model built around processors connected in a grid where each monitored the neighbours and if one was seen to go bad it could be sent a kill signal and the grid would logically reform without that processor. I think I read it in New Scientist between 1-4 years ago, but this abstract from the IEEE Transactions on Computers sounds similar (you've got to pay for the full article apparently): http://csdl.computer.org/comp/trans/tc/1988/11/t1414abs.htm A Multiple Fault-Tolerant Processor Network Architecture for Pipeline Computing Good luck! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/fK/3O2KABBYQAh8RArPyAKCCoaQXbywrq9h+3geGOVCE97dhgQCeKzV0 B94q2Yd0yPYFwDbcVINl/4w= =rbMB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Oct 2 20:39:33 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 02 Oct 2003 17:39:33 -0700 Subject: more on structural models for clusters In-Reply-To: <20031003002932.GA5984@sphere.math.ucdavis.edu> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031002173001.0310ce38@mailhost4.jpl.nasa.gov> At 05:29 PM 10/2/2003 -0700, Bill Broadley wrote: >On Wed, Oct 01, 2003 at 03:36:35PM -0700, Jim Lux wrote: > > In regards to my recent post looking for cluster implementations for > > structural dynamic models, I would like to add that I'm interested in > > "highly distributed" solutions where the computational load for each > > processor is very, very low, as opposed to fairly conventional (and widely > > available) schemes for replacing the Cray with a N-node cluster. > > > > The number of processors would be comparable to the number of structural > > nodes (to a first order of magnitude) > >Er, why bother? Is there some reason to distribute those things so >thinly? Your average dell can do 1-4 Billion floating point ops/sec, >why bother with so few per CPU? Am I missing something? Your average Dell isn't suited to inclusion as a MCU core in an ASIC at each node and would cost more than $10/node... I'm looking at Z80/6502/low end DSP kinds of computational capability in a mesh containing, say, 100,000 nodes. Sure, we'd do algorithm development on a bigger machine, but in the end game, you're looking at zillions of fairly stupid nodes. The commodity cluster aspect would only be in the development stages, and because it's much more likely that someone has solved the problem for a Beowulf (which is fairly loosely coupled and coarse grained) than for a big multiprocessor with tight coupling like a Cray. Haven't fully defined the required performance yet, but, as a starting point, I'd need to "solve the system" in something like 100 microseconds. The key is that I need an algorithm for which the workload scales roughly linearly as a function of the number of nodes, because the computational power available also scales as the number of loads. Clearly, I'm not going to do a brute force inversion or LU decomposition of a 100,000x100,000 matrix... However, inverting 100,000 matrices, each, say, 10x10, is reasonable. >Bill Broadley >Mathematics >UC Davis James Lux, P.E. 
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From he.94 at osu.edu Thu Oct 2 20:59:55 2003 From: he.94 at osu.edu (Hao He) Date: Thu, 02 Oct 2003 20:59:55 -0400 Subject: NFS Problem Message-ID: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Hi, there. I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P chipsets and Intel CSA Gigabit NIC. The distribution is RedHat 9. I have some experience before but I still got some problem in NFS. Problem 1: When I just use 'rw' and 'intr' as the parameters used in /etc/fstab, I got following problem when startup clients (while the server with NFS daemon is running): Mount: RPC: Remote system error -- no route to host Then I added 'bg' to /etc/fstab, this time the result is better. Several minutes after the client booted up, the remote directory mounted. However, in many cases following meassage was prompted: nfs warning: mount version older than kernel Problem 2: I am mounting two remote directories from the server, however, at some nodes, only one directory even no directory got mounted. If only one directory mounted successfully, it differs from one client to another, and to the same node, it changes from time to time at system booting up, like dicing. This really confused me. Problem 3: Sometimes I got the message at the server node like this: (scsi 0:A:0:0): Locking max tag count at 33. However, seems it does not make trouble to mounted directories. I think it must be related with NFS. I have a further question: Since there may be 16 or 32 or even more clients try to mount the remote directory at the same time, can the NFS server really handle so much requests simultaneously? Is there any effective alternate method to share data, besides NFS? How to solve these problems? Any suggestion? Thank you very much. I will appreciate your response. Best wishes, Hao He _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 3 01:13:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 3 Oct 2003 07:13:37 +0200 Subject: NFS Problem In-Reply-To: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> References: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Message-ID: <20031003051337.GA6263@unthought.net> On Thu, Oct 02, 2003 at 08:59:55PM -0400, Hao He wrote: > Hi, there. > > I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P > chipsets and Intel CSA Gigabit NIC. > The distribution is RedHat 9. > I have some experience before but I still got some problem in NFS. > > Problem 1: When I just use 'rw' and 'intr' as the parameters used in > /etc/fstab, I got following problem when startup clients (while the server > with NFS daemon is running): > Mount: RPC: Remote system error -- no route to host That's a network problem or a network configuration problem. Usually this would be a name resolution problem. Check that the hostname in your fstab can be resolved in early boot (add it to your hosts file if necessary), or use the IP address of the server instead. 
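In practice that means something like the following on every client; the address, hostname and export path are made-up placeholders, and the hard,intr and rsize/wsize options are the ones recommended further down in this reply:

    # /etc/hosts on each client -- make the server resolvable before NFS starts
    192.168.1.1     master

    # /etc/fstab on each client -- or simply use the IP address in place of the name
    master:/home    /home    nfs    rw,hard,intr,rsize=8192,wsize=8192    0 0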
But the error message seems to indicate that it's not resolution but routing - very odd... Is the network up? Do you have any special networking setup? Try checking your init-scripts to see that the network is really started before the NFS filesystems are mounted. > Then I added 'bg' to /etc/fstab, this time the result is better. Several > minutes after the client booted up, the remote directory mounted. So you NFS mount depends on something (network related) that isn't up at the time when the system tries to mount your NFS filesystems. Either you have a special (and wrong) setup, or RedHat messed up good :) Check the order in which things are started in your /etc/rc3.d/ directory. Network should go before NFS. > However, in many cases following meassage was prompted: > nfs warning: mount version older than kernel Most likely this is not really a problem - I've had systems with that message work just fine. You could check to see if RedHat has updates to mount. > > Problem 2: I am mounting two remote directories from the server, however, at > some nodes, only one directory even no directory got mounted. > If only one directory mounted successfully, it differs from one client to > another, and to the same node, it changes from time to time at system > booting up, like dicing. > This really confused me. Isn't this problem 1 over again? > > Problem 3: Sometimes I got the message at the server node like this: > (scsi 0:A:0:0): Locking max tag count at 33. That's a SCSI diagnostic. You can ignore it. > However, seems it does not make trouble to mounted directories. > I think it must be related with NFS. It's not related to NFS. > > I have a further question: Since there may be 16 or 32 or even more clients > try to mount the remote directory at the same time, > can the NFS server really handle so much requests simultaneously? Is there > any effective alternate method to share data, besides NFS? That should be no problem at all. NFS should be up to the task with no special tuning at all. Once you have all your nodes mounting NFS properly, you can start looking into tuning for performance - but it really should work 'out of the box' with no special tweaking. > > How to solve these problems? Any suggestion? > Thank you very much. I will appreciate your response. Use the following options to the NFS mounts in your fstab: hard,intr You can add rsize=8192,wsize=8192 for tuning. You should not need 'bg' - although it may be convenient if you need to be able to boot your nodes when the NFS server is down. One thing you should make sure: never use host-names or netgroups in your exports file on the server (!) *Only* use IP addresses or wildcards - *Never* use names. Using names in your 'exports' file on the server can cause *all* kinds of weird sporadic irreproducible problems - it's a long-standing and extremely annoying problem, but fortunately one that has an easy workaround. Check: *) Server: Your exports file (only IP or wildcard exports) *) Clients: Your fstab (use server IP or name in hosts file) *) Clients: Is network started before NFS mount? Please write to the list about your progress :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Andrew.Cannon at nnc.co.uk Fri Oct 3 05:30:07 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Fri, 3 Oct 2003 10:30:07 +0100 Subject: Filesystem question (sort of newbie) Message-ID: Hi All, I am going to be setting up a 16 node cluster in the near future. I have only set up a 4 node cluster before and I am a little unsure about how to sort out the disk space. Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, any advice is still appreciated), and I was wondering how to best organise the disks on each node. I am thinking (only started wondering about this today) of installing the cluster software on the master node (pvm, MPI and the actual calculation software, MCNP) and mounting the disk on each of the other nodes, so that all they have on their hard drives is the minimal install of RH. The question I am asking is, will this work and what sort of performance hit will there be? Would I be better installing the software on each computer? TIA (sorry for being so stoopid, I'm still very much a learner at linux and clustering) Andy Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 05:32:34 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 05:32:34 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D4232.3070900@lmco.com> Andrew, Let me recommend Warewulf (warewulf-cluster.org). It boots the nodes using RH 7.3 (although it should work with 8 or but I haven't tested it), but it boots into a small Ram Disk (about 70 megs depending upon what you need on the nodes). It's very easy to setup, configure and use, plus you don't need to install RH on each node. Warewulf will use a hard disk in the nodes if available for swap and local scratch space. However, it will also work with diskless nodes (although you don't get swap or scratch space). Warewulf will also take /home from the master node and NFS mount it throughout the cluster. So you can install your code on /home for all of the nodes. Good Luck! Jeff > Hi All, > > I am going to be setting up a 16 node cluster in the near future. 
I have > only set up a 4 node cluster before and I am a little unsure about how to > sort out the disk space. > > Each computer will be running Red Hat (either 8 or 9 I haven't decided > yet, > any advice is still appreciated), and I was wondering how to best > organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each > computer? > > TIA (sorry for being so stoopid, I'm still very much a learner at > linux and > clustering) > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC > Limited (no. 1120437), National Nuclear Corporation Limited (no. > 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited > (no. 235856). The registered office of each company is at Booths > Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for > Technica-NNC Limited whose registered office is at 6 Union Row, > Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by > the relevant UK operating company and are confidential and intended > for the use of the individual or entity to whom they are addressed. > If you have received this e-mail in error please notify the NNC system > manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 08:59:52 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 08:59:52 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D72C8.3050409@lmco.com> Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > While I don't prefer the nfs-root approach, Warewulf can do that as well (haven't tried it personally). What kind of network do you use for the 48-node cluster? 
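A rough sketch of the two pieces of configuration behind the PXE/nfs-root approach described above -- every address, MAC, path and kernel version below is a placeholder, and the node kernel must have NFS-root and DHCP IP autoconfiguration compiled in:

    # /etc/dhcpd.conf on the master (ISC dhcpd): tell the node where to fetch pxelinux
    host node01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address    192.168.0.101;
        next-server      192.168.0.1;       # TFTP server holding pxelinux.0
        filename         "pxelinux.0";
    }

    # /tftpboot/pxelinux.cfg/default: boot a kernel whose root lives on the NFS server
    default linux
    label linux
        kernel vmlinuz-2.4.20
        append ip=dhcp root=/dev/nfs nfsroot=192.168.0.1:/export/nodes

The node's own fstab (inside that NFS root) can then point swap and /tmp at the local disk, which is the split described above.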
Anybody else use the nfs-root approach? The 70 megs used in the ram disk is pretty well thought out. There are some basic things to boot the node, but it also includes glibc and you can easily add MPICH, LAM, Ganglia, SGE, etc. The developer has thought out these packages very well so that only the pieces of each of these packages that needs to be on the nodes actually gets installed on the nodes. Very well thought out. Oh, one other thing. The image that goes to the nodes via TFTP (over PXE) is compressed so it's about half the size of the final ram disk. This really helps cut down on network traffic (even works over my poor rtl8139 network). One of the things I'd like to experiment with is using squasfs to reduce the size of the ram disk. IMHO, 70 megs is not very big, but reducing it to 30-40 Megs might be worth the effort. > regards, mark hahn. > Thanks! Jeff -- Dr. Jeff Layton Senior Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Oct 3 09:34:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 3 Oct 2003 09:34:30 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: <3F7D4232.3070900@lmco.com> Message-ID: > 8 or but I haven't tested it), but it boots into a small Ram > Disk (about 70 megs depending upon what you need on the alternately, it's almost trivial to PXE boot nodes, mount a simple root FS from a server/master, and use the local disk, if any, for swap and/or tmp. one nice thing about this is that you can do it with any distribution you like - mine's RH8, for instance. personally, I prefer the nfs-root approach, probably because once you boot, you won't be wasting any ram with boot-only files. for a cluster of 48 nodes, there seems to be no drawback; for a much larger cluster, I expect all the boot-time traffic would be crippling, and you might want to use some kind of multicast to distribute a ramdisk image just once... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 3 11:24:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 3 Oct 2003 11:24:48 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Cannon, Andrew wrote: > Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, > any advice is still appreciated), and I was wondering how to best organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each computer? 
> > TIA (sorry for being so stoopid, I'm still very much a learner at linux and > clustering) If the nodes have lots of memory, most of their access to non-data disk (programs and libraries) will come out of caches after the systems have been up for a while, so they won't take a HUGE performance hit, but things like loading a big program for the first time may take longer. However, if you work to master PXE and kickstart (which go together like ham and eggs) and have adequate disk, in the long run your maintenance will be minimized by putting energy into developing a node kickstart script. Then you just boot the nodes into kickstart over the network, wait a few minutes for the install and boot into production. This will take you some time to learn (there are HOWTO-like resource online, so it isn't a LOT of time) and if you got nodes with NICs that don't support PXE you'll likely want to replace them or add ones that do, but once you invest these capital costs the payback is that your marginal cost for installing additional nodes after the first node you get to install "perfectly" is so close to zero as to make no nevermind. Make a dhcp table entry. Boot node into install. Boot node. Reinstalling is exactly the same process and can be done in minutes if a hard disk crashes. It gets to be so easy that we almost routinely do a reinstall after working on a system for any reason, including ones where it probably isn't necessary. You can reinstall a system from anywhere on the internet (if your hardware is accessible and preconfigured for this to work). Finally, if you include yum on the nodes, you can automagically update the nodes from a master repository image on your server, and mirror your server image from one of the Red hat mirrors, and actually maintain a stream of updates onto the nodes with no further action on your part. At this point, if you aren't doing Scyld or one of the preconfigured cluster packages and want to roll your own cluster out of a base install plus selected RPMs (and why not?) PXE+kickstart/RH+yum forms a pretty solid low-energy paradigm for installation and maintenance once you've learned how to make it work. rgb > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Sat Oct 4 12:00:51 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Sat, 4 Oct 2003 12:00:51 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: Message-ID: Hi, All: I tried to compile a Fortran 90 MPI program by the Intel Frotran Compiler in the OSCAR cluster. I run the command: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF external function WINDOWFUNCTION external function SIGMA external function GETISTART external function GETIEND external subroutine COM_EYZ external subroutine COM_EYX external subroutine COM_EZX external subroutine COM_EZY external subroutine COM_HYZ external subroutine COM_HYX external subroutine COM_HZX external subroutine COM_HZY 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' /tmp/ifcVao851.o(.text+0x6e): In function `main': : undefined reference to `mpi_comm_rank_' /tmp/ifcVao851.o(.text+0x82): In function `main': : undefined reference to `mpi_comm_size_' /tmp/ifcVao851.o(.text+0xab): In function `main': : undefined reference to `mpi_wtime_' /tmp/ifcVao851.o(.text+0x422): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x448): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x47b): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x49e): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4c1): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' follow /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': : undefined reference to `mpi_recv_' /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': : undefined reference to `mpi_send_' " At the same time, I tried the same program in the other scyld cluster, using NAG compiler. I use command: " f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 " It works fine. So that means my fortran program in fine. Both of the cluster use the MPICH implementation. But because I have to work on that OSCAR cluster with Intel compiler, I wonder 1. why the errors happen? 2. Is the problem of cluster or the Intel compiler? 3. How I can solve it. I know there are a lot of guy with experience and experts of cluster and MPI in this mailing list. I appreciate your suggestion and advice from you. Thanks. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sun Oct 5 00:52:45 2003 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 4 Oct 2003 23:52:45 -0500 (CDT) Subject: Upper bound on no. of sockets In-Reply-To: Message-ID: Thanks a billion for all the responses. Here is another question: Is there a way to send some data to the listener when I do a connect()? I tried using sin_zero field of the sockaddr_in structure, but quite unsuccessfully. 
The problem is I want to uniquely identify the actively connecting process (IP address and port number information wont suffice). I can send() the identifier value to the listener after the connect(), but I want to cut down the cost of an additional send. Any suggestions are greatly appreciated. Thanks, Balaji. PS: I am not sure if it is appropriate to send this question to this mailing list. My sincere apologies for those who find this question annoyingly incongruous. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sat Oct 4 15:15:08 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sat, 4 Oct 2003 12:15:08 -0700 (PDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: <20031004191508.27391.qmail@web60306.mail.yahoo.com> This is by far my favorite approach however I tend to tweak it with a very large initrd and custom kernel. I am using older hardware with its max ram so I use it as best I can. with no local harddisk I am always looking at the best method of network file access and have gone so far as to try wget with http. --- Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 04:21:50 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 10:21:50 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Sat, 4 Oct 2003, Ao Jiang wrote: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 Try with : ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 Btw, a cleaner way to compile mpi programs is to use the mpif90 (mpif77 for fortran77) command (which is a wrapper for the real compiler). You should be able to make it use the ifc by setting the MPICH_F90 (MPICH_F77 for fortran77) and MPICH_F90LINKER environment variables to choose which compiler to use, e.g. 
let's say you want to use the ifc compiler, and you're using bash, you would have to do: export MPICH_F90=ifc export MPICH_F90LINKER=ifc and then, in order to compile your mpi program you should issue the command: mpif90 -o p_wg3 p_fdtd3dwg3_pml.f90 > 2. Is the problem of cluster or the Intel compiler? Neither. Intel works fine with Oscar. Have a good day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +390250317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 07:22:35 2003 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Mon, 06 Oct 2003 12:22:35 +0100 Subject: Copy files between Nodes via NFS In-Reply-To: References: <003f01c37d29$47d7ec10$0e01010a@hpcncd.cpe.ku.ac.th> Message-ID: <5.0.2.1.0.20031006121907.03a25120@hermes.cam.ac.uk> Morning I have basic node PC that NFS mount directories from the master node. When I try to copy files using 'cp' from the node to NFS mounted directory the node PC just hang. Have any comes across this problem? How best move/copy files across nodes? Regards Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 6 07:21:02 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Mon, 06 Oct 2003 13:21:02 +0200 Subject: Intel compilers and libraries Message-ID: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Hello: We are thinking about purchasing the Intel C++ compiler for linux, mainly for getting the most of our harware (Xeon 2.4Gz processors), we are also interested in the Intel MKL (Math Kernel Library), I would like to know if the performance gain using Intel compiler+libraries, which exploit SSE2 and make other optimizations for P4/Xeon, are as good as Intel claims, anyone in the list using those products? On the other hand, isn't MKL just as good as any other good math library compiled with Xeon/P4 optimization and extensions (using Intel C++ compiler for example). Another question, the only difference I can see reading Intel docs between P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), does it really makes a big difference taking into account the much more expensive Xeons are. Any one having experience with both platforms. Greetings: Jose M. P?rez. Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 10:05:07 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 10:05:07 -0400 Subject: undefined references to pthread related calls Message-ID: <3F817693.9040001@larc.nasa.gov> Hi group, I have a user of my software (a f90 based CFD code using mpich) that is haveing trouble installing my code on their system. They are using mpich and the Intel version 7.1 ifc compiler. The problem occurs at the link step. 
They are getting undefined references to what appear to be system calls to pthread related functions such as pthread_self, pthread_equal, pthread_mutex_lock. Does any one else encountered and know how to fix this problem? Thanks, Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Mon Oct 6 03:23:32 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Mon, 6 Oct 2003 08:23:32 +0100 Subject: Help: About Intel Fortran Compiler calling mpich In-Reply-To: References: Message-ID: <200310060823.32239.daniel.kidger@quadrics.com> Tom, this is the standard old chestnut about Fortran and trailing underscores on function names. if you do say ' nm -a /opt/mpich-1.2.5/lib/libmpi.a |grep -i mpi_comm_rank' I expect you will see 2 trailing underscores. Different Fortran vendors add a different number of underscores - some add 2 by default (eg g77), some one (eg ifc), and some none. Sometimes there is a a compiler option to change this. There are three solutions to this issue: 1/ (Lazy option) recompile mpich several times; once with each Fortran compiler you have. 2/ Compile your application with the option that matches your prebuilt mpich (presumably 2 underscores - but note that ifc doesn't have an option for this) 3/ rebuild mpich with '-fno-second-underscore' (using say g77) . This is the common ground. You can link code to this with all current Fortran compilers. You may also meet the 'mpi_getarg, x_argc' issue - this too is easy to fix. -- Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- On Saturday 04 October 2003 5:00 pm, Ao Jiang wrote: > Hi, All: > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. 
> I run the command: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > external function WINDOWFUNCTION > external function SIGMA > external function GETISTART > external function GETIEND > external subroutine COM_EYZ > external subroutine COM_EYX > external subroutine COM_EZX > external subroutine COM_EZY > external subroutine COM_HYZ > external subroutine COM_HYX > external subroutine COM_HZX > external subroutine COM_HZY > > 3228 Lines Compiled > > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' > > /tmp/ifcVao851.o(.text+0x6e): In function `main': > : undefined reference to `mpi_comm_rank_' > > /tmp/ifcVao851.o(.text+0x82): In function `main': > : undefined reference to `mpi_comm_size_' > > /tmp/ifcVao851.o(.text+0xab): In function `main': > : undefined reference to `mpi_wtime_' > > /tmp/ifcVao851.o(.text+0x422): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x448): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x47b): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x49e): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x4c1): In function `main': > : undefined reference to `mpi_bcast_' > > /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' > follow > > /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': > : undefined reference to `mpi_recv_' > > /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': > : undefined reference to `mpi_send_' > > " > > At the same time, I tried the same program in the other scyld cluster, > using NAG compiler. > > I use command: > " > f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 > " > > It works fine. So that means my fortran program in fine. > > Both of the cluster use the MPICH implementation. > > But because I have to work on that OSCAR cluster with Intel compiler, > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? > 3. How I can solve it. > > I know there are a lot of guy with experience and experts of cluster and > MPI in this mailing list. I appreciate your suggestion and advice from > you. > > Thanks. > > Tom > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Mon Oct 6 10:54:43 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Mon, 6 Oct 2003 14:54:43 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: Jose, Pardon me for some advertising here, but our OptimaNumerics Linear Algebra Library can very significantly outperform Intel MKL. Depending on the particular routine and platform, we have seen performance advantage of almost 32x (yes, that's 32 times!) using OptimaNumerics Linear Algebra Library! 
I can send you one of our white papers which shows performance benchmark details off-line. If anyone else is interested, please do send me an e-mail also. Best wishes, Kenneth Tan ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- On Mon, 6 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Date: Mon, 06 Oct 2003 13:21:02 +0200 > From: "[iso-8859-1] Jos? M. P?rez S?nchez" > To: beowulf at beowulf.org > Subject: Intel compilers and libraries > > Hello: > > We are thinking about purchasing the Intel C++ compiler for linux, > mainly for getting the most of our harware (Xeon 2.4Gz processors), we > are also interested in the Intel MKL (Math Kernel Library), I would like > to know if the performance gain using Intel compiler+libraries, which exploit > SSE2 and make other optimizations for P4/Xeon, are as good as Intel > claims, anyone in the list using those products? > > On the other hand, isn't MKL just as good as any other good math library compiled > with Xeon/P4 optimization and extensions (using Intel C++ compiler for > example). > > Another question, the only difference I can see reading Intel docs between > P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), > does it really makes a big difference taking into account the much more > expensive Xeons are. Any one having experience with both platforms. > > Greetings: > > Jose M. P?rez. > Madrid. Spain. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sun Oct 5 21:59:34 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Mon, 06 Oct 2003 09:59:34 +0800 Subject: Cluster2003: Call for Participation (Preliminary) Message-ID: <3F80CC86.FAFBFAD2@csis.hku.hk> ---------------------------------------------------------------------- CALL FOR PARTICIPATION 2003 IEEE International Conference on Cluster Computing 1 - 4 December 2003 Sheraton Hong Kong Hotel & Towers, Tsim Sha Tsui, Kowloon, Hong Kong URL: http://www.csis.hku.hk/cluster2003/ Cosponsored by IEEE Computer Society IEEE Computer Society Task Force on Cluster Computing IEEE Hong Kong Section Computer Chapter The University of Hong Kong Industrial Sponsors : Hewlett-Packard, Microsoft, IBM, Extreme Networks, Sun Microsystems, Intel, Dawning, and Dell. 
----------------------------------------------------------------------- Dear Friends, You are cordially invited to participate the annual international cluster computing conference to be held on Dec. 1-4, 2003 in Hong Kong, the most dynamic city in the Orient. The Cluster series of conferences is one of the flagship events sponsored by the IEEE Task Force on Cluster Computing (TFCC) since its inception in 1999. The competition among refereed papers was particularly strong this year, with 48 papers being selected as full papers from the 164 papers that were submitted, for a 29% acceptance rate. An additional 19 papers were selected for poster presentation. Besides the technical paper presentation, there will be three keynotes, four tutorials, one panel, a Grid live demo session, and a number of invited talks and exhibits to be arranged during the conference period. A preliminary program schedule is attached below. Please share this Call for Participation information with your colleagues working in the area of cluster computing. For registration, please visit our registration web page at: http://www.csis.hku.hk/cluster2003/registration.htm (The deadline for advance registration is October 22, 2003.) TCPP Awards will be granted to students members, and will partially cover the registration and travel cost to attend the conference. See : http://www.caip.rutgers.edu/~parashar/TCPP/TCPP-Awards.htm We look forward meeting you in Hong Kong! Cho-Li Wang and Daniel Katz Cluster2003, Program Co-chairs ------------------------------------------------------------------ ***************************************** Cluster 2003 Preliminary Program Schedule ***************************************** Monday, December 1 ------------------ 8:00-5:00 - Conference/Tutorial Registration 8:30-12:00: Morning Tutorials Designing Next Generation Clusters with Infiniband: Opportunities and Challenges D. Panda (Ohio State University) Using MPI-2: Advanced Features of the Message Passing Interface W. Gropp, E. Lusk, R. Ross, R. Thakur (Argonne National Lab.) 12:00-1:30 - Lunch 1:30-5:00 : Afternoon Tutorials The Gridbus Toolkit for Grid and Utility Computing R. Buyya (University of Melbourne) Building and Managing Clusters with NPACI Rocks G. Bruno, M. Katz, P. Papadopoulos, F. Sacerdoti, NPACI Rocks group at San Diego Supercomputer Center), L. Liew, N. Ninaba (Singapore Computing Systems) ************************ Tuesday, December 2 ************************ 7:00-5:00 Conference Registration 9:00-9:15 Welcome and Opening Remarks 9:15-10:15 Keynote 1 (TBA) 10:45-12:15 : Session 1A, 1B, 1C Session 1A (Room A) : Scheduling I Dynamic Scheduling of Parallel Real-time Jobs by Modelling Spare Capabilities in Heterogeneous Clusters Ligang He, Stephen A. Jarvis, Graham R. Nudd, Daniel P. Spooner (University of Warwick, UK) Parallel Job Scheduling on Multi-Cluster Computing Systems Jemal Abawajy and S. P. Dandamudi (Carleton University, Canada) Interstitial Computing: Utilizing Spare Cycles on Supercomputers Stephen Kleban and Scott Clearwater (Sandia National Laboratories, USA) Session 1B (Room B) : Applications A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, Guang R. 
Gao (University of Delaware, USA) Computing Large-scale Alignments on a Multi-cluster Chunxi Chen and Bertil Schmidt (Nanyang Technological University, Singapore) Auto-CFD: Efficiently Parallelizing CFD Applications on Clusters Li Xiao (Michigan State University, USA), Xiaodong Zhang (College of WIlliam and Mary, NSF, USA), Zhengqian Kuang, Baiming Feng, Jichang Kang (Northwestern Polytechnic University, China) Session 1C (Room C) : Performance Analysis Performance Analysis of a Large-Scale Cosmology Application on Three Cluster Systems Zhiling Lan and Prathibha Deshikachar (Illinois Institute of Technology, USA) A Performance Monitor based on Virtual Global Time for Clusters of PCs Michela Taufer (UC San Diego, USA), Thomas M. Stricker (ETH Zurich, Switzerland) A Distributed Performance Analysis Architecture for Clusters Holger Brunst, Wolfgang E. Nagel (Dresden University of Technology, Germany), Allen D. Malony (University of Oregon, USA) 12:15-2:00 Lunch 2:00-3:30 : Session 2A, 2B, 2C Session 2A (Room A) : Scheduling II Coordinated Co-scheduling in time-sharing Clusters through a Generic Framework Saurabh Agarwal (IBM India Research Labs, India), Gyu Sang Choi, Chita R. Das (Pennsylvania State University, USA), Andy B. Yoo (Lawrence Livermore National Laboratory, USA), Shailabh Nagar (IBM T.J. Watson Research Center, USA) A Robust Scheduling Strategy for Moldable Jobs Sudha Srinivasan, Savitha Krishnamoorthy, P. Sadayappan (Ohio State University, USA) Towards Load Balancing Support for I/O-Intensive Parallel Jobs in a Cluster of Workstations Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson (University of Nebraska-Lincoln, USA) Session 2B (Room B) : Java JavaSplit: A Runtime for Execution of Monolithic Java Programs on Heterogeneous Collections of Commodity Workstations Michael Factor (IBM Research Lab in Haifa, Israel), Assaf Schuster, Konstantin Shagin (Israel Institute of Technology, Israel) Performance Analysis of Java Message-Passing Libraries on Fast Ethernet, Myrinet and SCI Clusters Guillermo L. Taboada, Juan Touri?o, Ramon Doallo (University of A Coruna, Spain) Compiler Optimized Remote Method Invocation Ronald Veldema and Michael Philippsen (University of Erlangen-Nuremberg, Germany) Session 2C (Room C) : Communication I Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters Jarek Nieplocha , V. Tipparaju, M. Krishnan (Pacific Northwest National Laboratory, USA), G. Santhanaraman, D.K. 
Panda (Ohio State University, USA) Impact of Computational Resource Reservation to the Communication Performance in the Hypercluster Environment Kai Wing Tse and P.K Lun (The Hong Kong Polytechnic University, Hong Kong) Kernel Implementations of Locality-Aware Dispatching Techniques for Web Server Clusters Michele Di Santo, Nadia Ranaldo, Eugenio Zimeo (University of Sannio, Italy) 3:30-4:00 Coffee Break 4:00-4:30 Invited Talk 1 (Room C) : TBA 4:30-5:00 Invited Talk 2 (Room C) : TBA 5:30-7:30 Poster Session (Details Attached at the End) 6:00-7:30 Reception ****************************** Wednesday, December 3 ******************************* 8:30-5:00 Conference Registration 9:00-10:00 Keynote 2 (Room C) TBA 10:00-10:30 Coffee Break 10:30-12:00 Session 3A, 3B, 3C Session 3A (Room A): Middleware OptimalGrid: Middleware for Automatic Deployment of Distributed FEM Problems on an Internet-Based Computing Grid Tobin Lehman and James Kaufman (IBM Almaden Research Center, USA) Adaptive Grid Resource Brokering Abdulla Othman, Peter Dew, Karim Djemame, Iain Gourlay (University of Leeds, UK) HPCM: A Pre-compiler Aided Middleware for the Mobility of Legacy Code Cong Du, Xian-He Sun, Kasidit Chanchio (Illinois Institute of Technology, USA) Session 3B (Room B) : Cluster/Job Management I The Process Management Component of a Scalable Systems Software Environment Ralph Butler (Middle Tennessee State University, USA), Narayan Desai, Andrew Lusk, Ewing Lusk (Argonne National Laboratory,USA) Load Distribution for Heterogeneous and Non-Dedicated Clusters Based on Dynamic Monitoring and Differentiated Services Liria Sato (University of Sao Paulo, Brazil), Hermes Senger(Catholic University of Santos, Brazil) GridRM: An Extensible Resource Monitoring System Mark Baker and Garry Smith (University of Portsmouth, UK) Session 3C (Room C) : I/O I A High Performance Redundancy Scheme for Cluster File Systems Manoj Pillai and Mario Lauria (Ohio State University, USA) VegaFS: A Prototype for File-sharing Crossing Multiple Administrative Domains Wei Li, Jianmin Liang, Zhiwei Xu (Chinese Academy of Sciences, China) Design and Performance of the Dawning Cluster File System Jin Xiong, Sining Wu, Dan Men, Ninghui Sun, Guojie Li (Chinese Academy of Sciences, China) 12:00-1:30 Lunch 1:30-3:00 Session 4A, 4B, Vender Talk 1 Session 4A (Room A) Novel Systems Coordinated Checkpoint versus Message Log for Fault Tolerant MPI Aur?lien Bouteiller, Lemarinier, Krawezik, Cappello (Universit? de Paris Sud, France) A Performance Comparison of Linux and a Lightweight Kernel Ron Brightwell, Rolf Riesen, Keith Underwood (Sandia National Laboratories, USA), Trammell B. Hudson (Operating Systems Research, Inc.), Patrick Bridges, Arthur B. Maccabe (University of New Mexico, USA) Implications of a PIM Architectural Model for MPI Arun Rodrigues, Richard Murphy, Peter Kogge, Jay Brockman (University of Notre Dame, USA), Ron Brightwell, Keith Underwood (Sandia National Laboratories, USA) Session 4B (Room B) Cluster/Job Management II Reusable Mobile Agents for Cluster Computing Ichiro Satoh (National Institute of Informatics, Japan) High Service Reliability For Cluster Server Systems M. Mat Deris, M.Rabiei, A. Noraziah, H.M. Suzuri (University College of Science and Technology, Malaysia) Wide Area Cluster Monitoring with Ganglia Federico D. Sacerdoti, Mason J. Katz (San Diego Supercomputing Center, USA), Matthew L. Massie, David E. 
Culler (UC Berkeley, USA) Vender Talk 1 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 Panel Discussion 6:30-8:30 Banquet Dinner (Ballroom, Conference Hotel) **************************** Thursday, December 4 **************************** 8:30-5:00 Conference Registration Special Technical Session : Dec. 4 (9am - 4:30pm) Grid Demo - Life Demonstrations of Grid Technologies and Applications Session Chairs: Peter Kacsuk (MTA SZTAKI Research Institute, Hungary), Rajkumar Buyya (University of Melbourne, Australia) 9:00-10:00 Keynote 3 (Room C) 10:00-10:30 Coffee Break 10:30-12:00 Vender Talk 2, 5B, 5C Vender Talk 2 (Room A) Session 5B (Room B) : Novel Software Efficient Parallel Out-of-core Matrix Transposition Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P Sadayappan (Ohio State University, USA) A Case Study of Parallel I/O for Biological Sequence Search on Linux Clusters Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson (University of Nebraska-Lincoln, USA) CTFS: A New Light-weight, Cooperative Temporary File System for Cluster-based Web Server Jun Wang (University of Nebraska-Lincoln, USA) Session 5C (Room C) I/O II Efficient Structured Data Access in Parallel File Systems Avery Ching, Alok Choudhary, Wei-keng Liao (Northwestern University, USA), Robert Ross, William Gropp (Argonne National Laboratory, USA) View I/O: Improving the Performance of Non-contiguous I/O Florin Isaila and Walter F. Tichy (University of Karlsruhe, Germany) Supporting Efficient Noncontiguous Access in PVFS over InfiniBand Jiesheng Wu (Ohio State University), Pete Wyckoff (Ohio Supercomputer Center, USA), D.K. Panda (Ohio State University, USA) 12:00-2:00 Lunch 2:00-2:30 Invited Talk 3 (Room C) 2:30-3:00 Invited Talk 4 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 : Session 6A, 6B, 6C Session 6A (Room A) : Scheduling III A General Self-adaptive Task Scheduling System for Non-dedicated Heterogeneous Computing Ming Wu and Xian-He Sun (Illinois Institute of Technology, USA) Adding Memory Resource Consideration into Workload Distribution for Software DSM Systems Yen-Tso Liu, Ce-Kuen Shieh (National Chung Kung University, Taiwan), Tyng-Yeu Liang (National Kaohsiung University of Applied Sciences, Taiwan) An Energy-Based Implicit Co-scheduling Model for Beowulf Cluster Somsak Sriprayoonsakul and Putchong Uthayopas (Kasetsart University, Thailand) Session 6B (Room B) : High Availability Availability Prediction and Modeling of High Availability OSCAR Cluster Lixin Shen, Chokchai Leangsuksun, Tong Liu, Hertong Song (Louisiana Tech University, USA), Stephen L. Scott (Oak Ridge National Laboratory, USA) A System Recovery Benchmark for Clusters Ira Pramanick, James Mauro, Ji Zhu (Sun Microsystems, Inc., USA) Performance Evaluation of Routing Algorithms in RHiNET-2 Cluster Michihiro Koibuchi, Konosuke Watanabe, Kenichi Kono, Akiya Jouraku, Hideharu Amano (Keio University, Japan) Session 6C (Room C) : Communications II Application-Bypass Reduction for Large-Scale Clusters Adam Wagner, Darius Buntias, D.K. Panda (Ohio State University, USA), Ron Brightwell (Sandia National Laboratories, USA) Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost Surendra Byna (Illinois Institute of Technology, USA), William Gropp (Argonne National Laboratory, USA), Xian-He Sun (Illinois Institute of Technology, USA), Rajeev Thakur (Argonne National Laboratory, USA) Shared Memory Mirroring for Reducing Communication Overhead on Commodity Networks Jarek Nieplocha, B. Palmer, E. 
Apra (Pacific Northwest National Laboratory, USA) ************************************* 5:00 : End of the Conference ************************************* ------------------------------------------------------------------- Poster Session/Short Papers "Plug-and-Play" Cluster Computing using Mac OS X Dean Dauger (Dauger Research, Inc.) and Viktor K. Decyk (UC Los Angeles, USA) Improving Performance of a Dynamic Load Balancing System by Using Number of Effective Tasks Min Choi, Jung-Lok Yu, Seung-Ryoul Maeng (Korea Advanced Institute of Science and Technology, Korea) Dynamic Self-Adaptive Replica Location Method in Data Grids Dongsheng Li, Nong Xiao, Xicheng Lu, Kai Lu, Yijie Wang (National University of Defense Technology, China) Efficient I/O Caching in Data Grid and Cluster Management Song Jiang (College of William and Mary, USA), Xiaodong Zhang (National Science Foundation, USA) Optimized Implementation of Extendible Hashing to Support Large File System Directory Rongfeng Tang, Dan Mend, Sining Wu (Chinese Academy of Sciences, China) Parallel Design Pattern for Computational Biology and Scientific Computing Applications Weiguo Liu and Bertil Schmidt (Nanyang Technological University, Singapore) FJM: A High Performance Java Message Library Tsun-Yu Hsiao, Ming-Chun Cheng, Hsin-Ta Chiao, Shyan-Ming Yuan (National Chiao Tung University, Taiwan) Cluster Architecture with Lightweighted Redundant TCP Stacks Hai Jin and Zhiyuan Shao (Huazhong University of Science and Technology, China) >From Clusters to the Fabric: The Job Management Perspective Thomas R?oblitz, Florian Schintke, Alexander Reinefeld (Zuse Institute Berlin, Germany) Towards an Efficient Cluster-based E-Commerce Server Victoria Ungureanu, Benjamin Melamed, Michael Katehakis (Rutgers University, USA) A Kernel Running in a DSM - Design Aspects of a Distributed Operating System Ralph Goeckelmann, Michael Schoettner, Stefan Frenz, Peter Schulthess (University of Ulm, Germany) Distributed Recursive Sets: Programmability and Effectiveness for Data Intensive Applications Roxana Diaconescu (UC Irvine, USA) Run-Time Prediction of Parallel Applications on Shared Environment Byoung-Dai Lee (University of Minnesota, USA), Jennifer M. Schopf (Argonne National Laboratory, USA) An Instance-Oriented Security Mechanism in Grid-based Mobile Agent System Tianchi Ma and Shanping Li (Zhejiang University, China) A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms Soumya Sanyal, Amit Jain, Sajal Das (University of Texas at Arlington, USA), Rupak Biswas (NASA Ames Research Center, USA) BCFG: A Configuration Management Tool for Heterogeneous Clusters Narayan Desai, Andrew Lusk, Rick Bradshaw, Remy Evard (Argonne National Laboratory, USA) Communication Middleware Systems for Heterogenous Clusters: A Comparative Study Daniel Balkanski, Mario Trams, Wolfgang Rehm (Technische Universita Chemnitz, Germany) QoS-Aware Adaptive Resource Management in Distributed Multimedia System Using Server Clusters Mohammad Riaz Moghal, Mohammad Saleem Mian (University of Engineering and Technology, Pakistan) On the InfiniBand Subnet Discovery Process Aurelio Berm?dez, Rafael Casado, Francisco J. Quiles (Universidad de Castilla-La Mancha, Spain), Timothy M. Pinkston (University of Southern California, USA), Jos? 
Duato (Universidad Polit?cnica de Valencia, Spain) -------------------------------------------------------------- Chairs/Committees General Co-Chairs Jack Dongarra (University of Tennessee) Lionel Ni (Hong Kong University of Science and Technology) General Vice Chair Francis C.M. Lau (The University of Hong Kong) Program Co-Chairs Daniel S. Katz (Jet Propulsion Laboratory) Cho-Li Wang (The University of Hong Kong) Program Vice Chairs Bill Gropp (Argonne National Laboratory) -- Middleware Wolfgang Rehm (Technische Universit?t Chemnitz) -- Hardware Zhiwei Xu (Chinese Academy of Sciences, China) -- Applications Tutorials Chair Ira Pramanick (Sun Microsystems) Workshops Chair Jiannong Cao (Hong Kong Polytechnic University) Exhibits/Sponsors Chairs Jim Ang (Sandia National Lab) Nam Ng (The University of Hong Kong) Publications Chair Rajkumar Buyya (The University of Melbourne) Publicity Chair Arthur B. Maccabe (The University of New Mexico) Poster Chair Putchong Uthayopas (Kasetsart University) Finance/Registration Chair Alvin Chan (Hong Kong Polytechnic University) Local Arrangements Chair Anthony T.C. Tam (The University of Hong Kong) Programme Committee David Abramson (Monash U., Australia) Gabrielle Allen (Albert Einstein Institute, Germany) David A. Bader (U. of New Mexico, USA) Mark Baker (U. of Portsmouth, UK) Ron Brightwell (Sandia National Laboratory USA) Rajkumar Buyya (U. of Melbourne, Australia) Giovanni Chiola (Universita' di Genova Genova, Italy) Sang-Hwa Chung (Pusan National U., Korea) Toni Cortes (Universitat Politecnica de Catalunya, Spain) Al Geist (Oak Ridge National Laboratory, USA) Patrick Geoffray (Myricom Inc., USA) Yutaka Ishikawa (U. of Tokyo, Japan) Chung-Ta King (National Tsing Hua U., Taiwan) Tomohiro Kudoh (AIST, Japan) Ewing Lusk (Argonne National Laboratory, USA) Jens Mache (Lewis and Clark College, USA) Phillip Merkey (Michigan Tech U., USA) Matt Mutka (Michigan State U., USA) Charles D. Norton (JPL, California Institute of Technology, USA) D.K. Panda (Ohio State U., USA) Philip Papadopoulos (UC San Diego, USA) Myong-Soon Park (Korea U., Korea) Neil Pundit (Sandia National Laboratory, USA) Thomas Rauber (U. Bayreuth, Germany) Alexander Reinefeld (ZIB, Germany) Rob Ross (Argonne National Laboratory, USA) Gudula Ruenger (Chemnitz U. of Technology, Germany) Jennifer Schopf (Argonne National Laboratory, USA) Peter Sloot (U. of Amsterdam, Netherlands) Thomas Stricker (Institut fur Computersysteme, Switzerland) Ninghui Sun (Chinese Academy of Sciences, China) Xian-He Sun (Illinois Institute of Technology, USA) Rajeev Thakur (Argonne National Laboratory, USA) Putchong Uthayopas (Kasetsart U., Thailand) David Walker (U. of Wales Cardiff, UK) Xiaodong Zhang (NSF, USA) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Florent.Calvayrac at univ-lemans.fr Mon Oct 6 11:54:09 2003 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Mon, 06 Oct 2003 17:54:09 +0200 Subject: undefined references to pthread related calls In-Reply-To: <3F817693.9040001@larc.nasa.gov> References: <3F817693.9040001@larc.nasa.gov> Message-ID: <3F819021.2050605@univ-lemans.fr> Jeffery A. White wrote: > Hi group, > > I have a user of my software (a f90 based CFD code using mpich) that is > haveing trouble installing my code on > their system. They are using mpich and the Intel version 7.1 ifc > compiler. 
The problem occurs at the link step. > They are getting undefined references to what appear to be system calls > to pthread related functions such as > pthread_self, pthread_equal, pthread_mutex_lock. Does any one else > encountered and know how to fix this problem? > > Thanks, > > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > is the compiler installed on a Redhat 8.0 ? Besides, maybe they use OpenMP/HPF directives and options which can mess up things and are usually useless on a cluster with one CPU per node. -- Florent Calvayrac | Tel : 02 43 83 26 26 Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18 UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 12:56:37 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 12:56:37 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: In-Reply-To: Message-ID: On Mon, 6 Oct 2003, Franz Marini wrote: > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > Try with : > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > (mpif77 for fortran77) command (which is a wrapper for the real > compiler). Acckkk!! This is one of the horribly broken things about most MPI implementations. It's not reasonable to say "to use this library you must use our magic compile script" A MPI library should be just that -- a library conforming to system standards. You should be able to link it with just "-lmpi". Most of the Fortran underscore issues may be hidden from the user with weak linker aliases. Similarly, it's not reasonable to say "to run this program, you must use our magic script" You should be able to just run the program, by name, in the usual way. Our BeoMPI implementation demonstrated how to do it right many years ago, and we provided the code back to the community. Many people on this list seem to take the attitude "I've already learned the crufty way, therefore the improvements don't matter." One element of a high-quality library is ease of use, and in the long run that matters more than a few percent faster for a specific function call. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Mon Oct 6 13:24:57 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Mon, 06 Oct 2003 10:24:57 -0700 Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <3F81A569.80805@cert.ucr.edu> Jos? M. 
P?rez S?nchez wrote: >Hello: > >We are thinking about purchasing the Intel C++ compiler for linux, >mainly for getting the most of our harware (Xeon 2.4Gz processors), we >are also interested in the Intel MKL (Math Kernel Library), I would like >to know if the performance gain using Intel compiler+libraries, which exploit >SSE2 and make other optimizations for P4/Xeon, are as good as Intel >claims, anyone in the list using those products? > You realize that there's a free version of the Intel compiler for Linux right? Anyway, our experience with their Fortran compiler has been that it's roughly on par with the Portland Group's compiler. However, if Pentium 4 optimizations are turned on, the code produced by the Intel compiler runs just a little bit faster. Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Wolfgang.Dobler at kis.uni-freiburg.de Mon Oct 6 12:50:06 2003 From: Wolfgang.Dobler at kis.uni-freiburg.de (Wolfgang Dobler) Date: Mon, 6 Oct 2003 18:50:06 +0200 Subject: Beowulf digest, Vol 1 #1482 - 2 msgs In-Reply-To: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> Message-ID: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Hi Ao, > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. > I run the command: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > The system failed to compile it and gave me the following information: > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' [...] > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? Looks like the infamous underscore problem. You have a library (libmpi.so or libmpi.a) that has been built using the GNU F77 compiler without the option `-fno-second-underscore' and accordingly the MPI symbols are called `mpi_init__', not `mpi_init_', etc. But the Intel compiler (and all other non-G77 compilers) expects a symbol with only one underscore appended ( `mpi_init_'), but that one is not in the library. > 3. How I can solve it. The way out is to either rebuild the library, compiling with `g77 -fno-second-underscore' or with the Intel compiler, or (the less elegant choice) to refer to the MPI functions with one underscore in you F90 code: call MPI_INIT_(ierr) There is one related question I want to ask the ld-specialists on the list: On some machines libraries like MPICH contain all symbol names with both underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same time. Does anybody know whether there are easy ways of building such a library? Is there something like `symbol aliases' and how would one create these when generating the library? W o l f g a n g _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 12:56:56 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 12:56:56 -0400 Subject: undefined references to pthread related calls Message-ID: <3F819ED8.9020002@larc.nasa.gov> Group, Thanks for your responses. 
Turns out that the problem appears to be an incompatibility between ifc 7.1 and the glibc version in the particular RH 8.0 installation being used. That RH 8.0 installation had some patches that updated glibc. I was able to fix it by removing the -static option when compiling with ifc. I have tested this with a patch-free version of 8.0 and I don't see the problem with or without the -static option specified. At runtime my code does not use any calls that seem to access pthread related system routines. I am guessing that by deferring resolution of the link until runtime I have bypassed the problem. Obviously, if I did use routines that needed pthread related code I would still have a problem, so this isn't a general fix. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From becker at scyld.com Mon Oct 6 13:29:09 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 13:29:09 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the For the Scyld Beowulf system we developed a more sophisticated "diskless administrative" approach that has better scaling and more predictable performance. We cache executable objects, libraries and executables, using a method that works unchanged with either Ramdisk (==tmpfs) or local disk cache. Keep in mind that this is just one element of making a cluster system scalable and easy to manage. Using a workstation-oriented distribution as the software base for compute nodes means generating many different kinds of configuration files, and dealing with the scheduling impact of the various daemons. > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. The obvious problems are configuration, scaling and update consistency issues. > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. They are trivial to get rid of either by explicitly erasing or switching to a new ramdisk (e.g. our old stage 3) when initialization completes. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... Multicast bulk data transfer was a good idea back when we had Ethernet repeaters. Today it should only be used for service discovery and low-rate status updates.
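For readers who want to try the PXE-plus-nfs-root recipe quoted above, the moving parts are small. The sketch below is only an illustration, with a made-up server address (192.168.1.1), node name and kernel image; check the exact syntax against your own pxelinux and ISC dhcpd versions, and note that the client kernel needs NFS-root and IP autoconfiguration support built in:

# /tftpboot/pxelinux.cfg/default on the boot server (illustrative only)
DEFAULT linux
LABEL linux
  KERNEL bzImage-2.4.20
  APPEND ip=dhcp root=/dev/nfs nfsroot=192.168.1.1:/export/nodes/node01

# matching host entry in the server's dhcpd.conf (addresses invented)
host node01 {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 192.168.1.101;
  filename "pxelinux.0";
  option root-path "/export/nodes/node01";
}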
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Oct 6 14:02:39 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 06 Oct 2003 14:02:39 -0400 Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: <3F81AE3F.9050202@lmco.com> Donald Becker wrote: > On Mon, 6 Oct 2003, Franz Marini wrote: > > > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 > p_fdtd3dwg3_pml.f90 > > > > Try with : > > > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > > (mpif77 for fortran77) command (which is a wrapper for the real > > compiler). > > Acckkk!! > > This is one of the horribly broken things about most MPI > implementations. It's not reasonable to say > "to use this library you must use our magic compile script" > A MPI library should be just that -- a library conforming to system > standards. You should be able to link it with just "-lmpi". > I don't like the mpi compiler helper scripts much either. I just want a simple makefile or a list of the libraries to link in in the correct order. I usually end up reading the helper scripts and pulling out the library order and putting it in my makefiles anyway (no offense to anyone). However, in defense of the different MPI implementations, they have somewhat different philosophies on how to get the best performance and ease of use. Sometimes this involves other libraries. Just telling the user to add '-lmpi' to the end of their link command may not tell them everything (e.g. they may need to add the pthreads library, or libdl or whatever). > One element of a high-quality library is ease of use, and in the long > run that matters more than a few percent faster for a specific function > call. > One piece of data. While we haven't looked at specific MPI calls, we have noticed up to about a 30% difference in wall clock time with our codes between the various MPI implementations using the same system (same nodes, same code, same input, same network, same nodes, etc.). I'm all for that kind of performance boost even if it's a little more cumbersome to compile/link/run (although one's mileage may vary depending upon the code) Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 14:55:57 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 20:55:57 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Mon, 6 Oct 2003, Donald Becker wrote: > Acckkk!! Ok, I shouldn't have said "a cleaner way". I don't usually use mpif77 (or f90) to compile programs requiring mpi libs, in fact. I prefer to explicitly tell the compiler which library I want, and where to find them. Btw, this is much simpler and faster if you have multiple versions/releases of the same library. 
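On the point of recovering the real link line from the wrapper scripts: MPICH's wrappers will usually print the underlying command for you, which makes it easy to freeze into an ordinary makefile. A rough sketch, reusing the /opt/mpich-1.2.5 install that appears elsewhere in this thread (library names depend on how MPICH was configured, so trust the -show output over this example; MPICH's library is normally libmpich rather than libmpi):

# ask the wrapper what it would actually run (MPICH wrappers accept -show)
/opt/mpich-1.2.5/bin/mpif90 -show -o p_wg3 p_fdtd3dwg3_pml.f90

# then hard-code the result in your own makefile, e.g. something like
ifc -w -I/opt/mpich-1.2.5/include -o p_wg3 p_fdtd3dwg3_pml.f90 \
    -L/opt/mpich-1.2.5/lib -lmpich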
Anyway, clean or not, elegant or not, mpif77 should (and I say should) work. Btw, I still can't understand why the hell each fortran compiler uses a different way to treat underscores. This, and a thousand other reasons, make me hate fortran. (erm, please, this is a *personal* pov, let's not start another flame/discussion on the fortran-vs-whatever issue ;)). Have a nice day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +39 02 50317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From lindahl at keyresearch.com Mon Oct 6 17:18:07 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 6 Oct 2003 14:18:07 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> On Mon, Oct 06, 2003 at 02:54:43PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Pardon me for some advertising here, but our OptimaNumerics Linear > Algebra Library can very significantly outperform Intel MKL. Kenneth, Welcome to the beowulf mailing list. Here are some helpful suggestions: 1) Don't top post. Answer postings like I do here, by quoting the relevant part of the posting you're replying to. 2) Don't include an 8-line confidentiality notice in a posting to a public, archived mailing list, distributed all over the world. 3) Marketing slogans and paragraphs with several !s don't work so well here. More sophisticated customers aren't drawn by a claim of a 32x performance advantage without knowing what is being measured. Is it a 100x100 matrix LU decomposition? Well, no, because Intel's MKL and the free ATLAS library run at a respectable % of peak. Is it on a 1000 point FFT? Well, no, because the free FFTW library runs at a respectable % of peak on that. 4) Put your performance whitepapers on your website, or it looks fishy. I looked and didn't see a single performance claim there. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From ds10025 at cam.ac.uk Mon Oct 6 18:41:34 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 06 Oct 2003 23:41:34 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening, I'm getting error 13 when my diskless client tries to mount its file system. How is this error 13 best resolved? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From hahn at physics.mcmaster.ca Mon Oct 6 23:55:13 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 6 Oct 2003 23:55:13 -0400 (EDT) Subject: Root-nfs error 13 while mounting In-Reply-To: Message-ID: > I'm getting error 13 when my diskless client tries to mount its file system. > How is this error 13 best resolved? it's best resolved by translating it to text: EACCES, or "permission denied". I'm guessing you should look at the logs on your fileserver, since it seems to be rejecting your clients.
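With a root-NFS client the permission denied almost always comes from the export itself, so the server side is the first thing to check. A minimal sketch, with example paths and subnet only:

# /etc/exports on the fileserver -- a diskless root generally needs
# no_root_squash, because the client mounts it as root
/export/nodes/node01  192.168.1.0/255.255.255.0(rw,no_root_squash,sync)

# re-export and watch the server logs while the node retries
exportfs -ra
tail -f /var/log/messages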
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Oct 7 05:17:36 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 7 Oct 2003 11:17:36 +0200 Subject: weak symbols [Re: Beowulf digest, Vol 1 #1482 - 2 msgs] In-Reply-To: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Message-ID: <200310071117.36507.joachim@ccrl-nece.de> Wolfgang Dobler: > On some machines libraries like MPICH contain all symbol names with both > underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same > time. Does anybody know whether there are easy ways of building such a > library? Is there something like `symbol aliases' and how would one create > these when generating the library? Yes, most linkers support "weak symbols" in one way or another (there is no common way, usually a pragma or "function attributes" (gcc) are used) which supply all required API symbols for the one real implemented function. Just take a look at a source file like mpich/src/pt2pt/send.c to see how this can be done (some preprocessing "magic"). It can also be done w/o weak symbols at the cost of a slightly bigger library. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Oct 7 07:07:19 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 07 Oct 2003 12:07:19 +0100 Subject: more on structural models for clusters Message-ID: <1A6pgV-16F-00@etnus.com> Jim, > Your average Dell isn't suited to inclusion as a MCU core in an ASIC > at each node and would cost more than $10/node... I'm looking at > Z80/6502/low end DSP kinds of computational capability in a mesh > containing, say, 100,000 nodes. Have you seen this gizmo ? (It's just so cute I had to pass it on :-) http://www.lantronix.com/products/eds/xport/ It's a 48MHz x86 with 256KB of SRAM and 512KB of flash, a 10/100Mb ethernet interface an RS232 and three bits of digital I/O and it all fits _inside_ an RJ45 socket. It comes loaded up with a web server and so on. It's on sale here in the UK for GBP 39 + VAT one off, so should come down somewhere near the price you mention above for your 100,000 off in the US. (It might also be useful to the folks who want to build their own environmental monitoring. Couple one of these up to the serial interconnect on a temperature monitoring button and you'd immediately be able to access it from the net). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Tue Oct 7 08:18:16 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Tue, 7 Oct 2003 09:18:16 -0300 (ART) Subject: Tools for debuging Message-ID: <20031007121816.67790.qmail@web12208.mail.yahoo.com> I'm having problems with a prograa, and i really need a tool for debug it. 
Are there specific debuggers for MPI programs? If there is more than one, which is the best choice? Thanks ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ciências Exatas e Tecnológicas Estudante do Curso de Ciência da Computação Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From nican at nsc.liu.se Tue Oct 7 07:50:26 2003 From: nican at nsc.liu.se (Niclas Andersson) Date: Tue, 7 Oct 2003 13:50:26 +0200 (CEST) Subject: CALL FOR PARTICIPATION: Workshop on Linux Clusters for Super Computing Message-ID: CALL FOR PARTICIPATION ================================================================ 4th Annual Workshop on Linux Clusters For Super Computing (LCSC) Clusters for High Performance Computing and Grid Solutions 22-24 October, 2003 Hosted by National Supercomputer Centre (NSC) Linköping University, SWEDEN ================================================================ The programme is in its final form. The workshop is brimful of knowledgeable speakers giving exciting talks about Linux clusters, grids and distributed applications requiring vast computational resources. Just a few samples: - Keynote: Andrew Grimshaw, University of Virginia and CTO of Avaki Inc. - Comparisons of Linux clusters with the Red Storm MPP William J. Camp, Project Leader of Red Storm, Sandia National Laboratories - The EGEE project: building a grid infrastructure for Europe Bob Jones, EGEE Technical Director, CERN - Linux on modern NUMA architectures Jes Sorensen, Wild Open Source Inc. - The AMANDA Neutrino Telescope Stephan Hundertmark, Stockholm University and many more. In addition to invited speakers there will be vendor presentations, exhibitions and tutorials. Last date for registration: October 10. For more information and registration: http://www.nsc.liu.se/lcsc _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From keith.murphy at attglobal.net Tue Oct 7 11:28:50 2003 From: keith.murphy at attglobal.net (Keith Murphy) Date: Tue, 7 Oct 2003 08:28:50 -0700 Subject: Tools for debuging References: <20031007121816.67790.qmail@web12208.mail.yahoo.com> Message-ID: <025701c38ce7$b5b64060$02fea8c0@oemcomputer> Check out Etnus's Totalview parallel debugger www.etnus.com Keith Murphy Dolphin Interconnect C: 818-292-5100 T: 818-597-2114 F: 818-597-2119 www.dolphinics.com ----- Original Message ----- From: "Mathias Brito" To: Sent: Tuesday, October 07, 2003 5:18 AM Subject: Tools for debuging > I'm having problems with a prograa, and i really need > a tool for debug it. There's specific debugers for mpi > programas, if have more than one, what is the best > choice? > > Thanks > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ciências Exatas e Tecnológicas > Estudante do Curso de Ciência da Computação > > Yahoo!
Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Tue Oct 7 13:30:19 2003 From: rokrau at yahoo.com (Roland Krause) Date: Tue, 7 Oct 2003 10:30:19 -0700 (PDT) Subject: undefined references to pthread related calls In-Reply-To: <200310071605.h97G5FV13188@NewBlue.Scyld.com.scyld.com> Message-ID: <20031007173019.8519.qmail@web40010.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 2. undefined references to pthread related calls (Jeffery A. > White) FYI, Intel has released a version of their compiler that fixes the link problem for applications that use OpenMP. Intel Fortran now supports glibc-2.3.2 which is used in RH-9 and Suse-8.2. The old compatibility hacks have become obsolete at least. I hear Intel-8 is in beta, anyone have experience with it? Roland > Subject: undefined references to pthread related calls > > Group, > > Thanks for your responses. Turns out that the problem appears to > be > an incompatiblilty between ifc 7.1 and the glibc version > in the version of RH 8.0 being used. The RH 8.0 being used had some > patches that updated glibc. I was able to fix it by removing > the -static option when compling with ifc. I have tested this with a > patch free version of 8.0 and I don't see the problem wit or without > the -static option specified. At runtime my code does not use any > calls > that seem to access pthread related system routines. I am > guessing that by deferring reolution of the link until runtime I > have > bypassed the problem. Obviously if I did use routines that > needed pthread related code I would still have a problem so this > isn't a > general fix. > > Jeff > > __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 15:28:42 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 15:28:42 -0400 Subject: updated cluster finishing script system Message-ID: <1065554922.32374.47.camel@protein.scalableinformatics.com> Folks: I updated my cluster finishing script package. This package allows you to perform post-installation configuration changes (e.g. finishing) for an RPM based cluster which maintains image state on local disks. It used to be specialized to the ROCKS distribution, but it has evolved significantly and should work with generalized RPM based distributions. Major changes: 1) No RPMs are distributed (this is a good thing, read on) 2) a build script generates customized RPMs for you after asking you 4 questions. (please, no jokes about unladen swallows, neither European nor African...) These RPMs allow you to customize the finishing server and the finishing script client as you require for your task. 
This includes choosing the server's IP address (used to be hard-coded to 10.1.1.1), the server's export directory (used to be hard-coded to /opt/finishing), the cluster's network (used to be hard-coded to 10.0.0.0), and the cluster's netmask (used to be hard-coded to 255.0.0.0). 3) Documentation (see below) Have a look at http://scalableinformatics.com/finishing/ for more details, including new/better instructions. It is licensed under the GPL for end users. Contact us offline if you want to talk about redistribution licenses. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 19:50:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 09:50:38 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065554922.32374.47.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> Message-ID: <200310080950.41343.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > It is licensed under the GPL for end users. Contact us offline if you want > to talk about redistribution licenses. Err, if it's licensed under the GPL then the "end users" who receive it under that license can redistribute it themselves under the GPL. Part 6 of the GPL v2 says: [quote] 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. [...] [/quote] Of course as the copyright holder you could also do dual licensing, so I guess this is what you mean - correct ? But whichever it is, once you have released something under the GPL you cannot prevent others from redistributing it under the GPL themselves. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g1FOO2KABBYQAh8RAu1oAJ0fLlcljVYwXj7xgnkjGFyNaoWOFwCfWM/r IC1/xPLO2ePGM2zlJF2ZHK8= =HOnr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Tue Oct 7 21:58:21 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Tue, 7 Oct 2003 21:58:21 -0400 (EDT) Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: Message-ID: Hi, First, I want to thank all of you for the answers and suggestions for my question last time. ( Last time, I tried: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' . . . 
) Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried it, the system gave me the following error: " ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 module EHFIELD program FDTD3DPML external function RISEF external subroutine COM_HZY 3228 Lines Compiled ld: cannot find -lmpi " either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the directory '/opt/mpich-1.2.5/include'. I also tried the command: " /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 " The system gave the error: " 3228 Lines Compiled /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In function `f_f77ioerr': : undefined reference to `__ctype_b' " In fact, I don't know what this error means. Of course, I don't know how to slove it either. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 22:23:04 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 12:23:04 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065579065.32368.134.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> <1065579065.32368.134.camel@protein.scalableinformatics.com> Message-ID: <200310081223.05966.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 12:11 pm, Joseph Landman wrote: > Thanks for catching the wording error. No worries, I wasn't intending to be pedantic, just curious. :-) - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g3UIO2KABBYQAh8RAgSkAJ48X7RY3ABNnYa2DlQ0z0vHfinaxACfdsMk hIZqsuVLevZqp2OBtfAafEs= =2vpF -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 22:11:05 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 22:11:05 -0400 Subject: updated cluster finishing script system In-Reply-To: <200310080950.41343.csamuel@vpac.org> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> Message-ID: <1065579065.32368.134.camel@protein.scalableinformatics.com> On Tue, 2003-10-07 at 19:50, Chris Samuel wrote: > On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > > > It is licensed under the GPL for end users. Contact us offline if you want > > to talk about redistribution licenses. > > Err, if it's licensed under the GPL then the "end users" who receive it under > that license can redistribute it themselves under the GPL. Part 6 of the GPL > v2 says: ... > Of course as the copyright holder you could also do dual licensing, so I guess > this is what you mean - correct ? Commercial redistribution ala the MySQL form of license. You are correct, it was a mis-wording on my part. Basically if someone decides to turn this into a commercial product (ok, stop laughing...), or wants support, or a warranty, then they need to speak with us first. 
As the package is mostly source code, make files and scripts, it seems odd to consider distributing it any other way. More to the point, there are some things that should be free (Libre and beer, though some keep asking me where the free beer is). Stuff like this should be free (as in Libre). RGB and I had a conversation about this I think... . I leave it to others to supply the beer. > But whichever it is, once you have released something under the GPL you cannot > prevent others from redistributing it under the GPL themselves. ... which I don't want to hinder (redistribution under GPL), rather I want to encourage ... Thanks for catching the wording error. -- Joseph Landman _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 8 13:16:03 2003 From: ctierney at hpti.com (Craig Tierney) Date: 08 Oct 2003 11:16:03 -0600 Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: References: Message-ID: <1065633362.22256.8.camel@woody> On Tue, 2003-10-07 at 19:58, Ao Jiang wrote: > Hi, > First, I want to thank all of you for the answers and suggestions > for my question last time. > ( > Last time, I tried: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' The option -L specifies the path for libraries. The option -l specifies the library to link. Your command should be: ifc -I/opt/mpich-1.2.5/include -L/opt/mpich-1.2.5/lib -lmpi -w -lm -o p_wg3 p_fdtd3dwg3_pml.f90 Craig > . > . > . > ) > > Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried > it, the system gave me the following error: > " > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > module EHFIELD > program FDTD3DPML > external function RISEF > external subroutine COM_HZY > > 3228 Lines Compiled > ld: cannot find -lmpi > " > either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the > directory '/opt/mpich-1.2.5/include'. > > I also tried the command: > " > /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 > " > > The system gave the error: > " > 3228 Lines Compiled > /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In > function `f_f77ioerr': > : undefined reference to `__ctype_b' > " > > In fact, I don't know what this error means. Of course, I don't know > how to slove it either. > > Tom > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 14:02:17 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 19:02:17 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening Have resolved the problem. It was due to setting in dhcpd.conf it require option root-path pointing to root path of the node. I get another error. 
When the diskless node boots up it cannot find the init file. Also, what is the minimum set of files that has to be transferred to /tftpboot/node/? Dan On Oct 7 2003, Mark Hahn wrote: > > I'm getting error 13 when my diskless client tries to mount its file system. > > How is this error 13 best resolved? > > it's best resolved by translating it to text: EACCES, or "permission > denied". I'm guessing you should look at the logs on your fileserver, > since it seems to be rejecting your clients. > > > _______________________________________________ Beowulf mailing list, > Beowulf at beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From joelja at darkwing.uoregon.edu Wed Oct 8 16:50:34 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 8 Oct 2003 13:50:34 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: The 3ware 6000, 7000 and 7006 cards are all gone from the marketplace; the cards you want to look at are the 3ware 7506 (parallel ATA) or the 3ware 8506 (serial ATA). The 2400 was never seriously in the running for us because it only supports 4 drives. joelja On Wed, 8 Oct 2003, Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID.
The first would give the > > best results but on the other hand becomes really expensive so we have > > in mind two ATA RAID controllers: > > > > Adaptec 2400A > > 3Ware 6000/7000 series controllers > > > > Any one of these has its strong and weak points, after seeing various > > benchmarks/comparisons/reviews these are the only candidates that > > deserve our attention. good points about ata raid - large disks storage ( 300GB drives at $300 each +/- ) - get those drives w/ 8MB buffer disk cache - cheap ... can do with software raid or $40 ata-133 ide controller - $300 more for making ata drives appear like scsi drives with 3ware raid controllers - slower rpm disks ... usually it tops out at 7200rpm - it supposedly can sustain 133MB/sec transfers - if you use software raid, you can monitor the raid status - if you use hardware raid, you are limited to the tools the hw vendor gives you tomonitor the raid status of pending failures or dead drives good points about scsi .. - some say scsi disks are faster ... - super expensive .. $200 for 36 GB .. at 15000rpm - it supposedly can sustain 320MB/sec transfers if the disks does transfer at its full speed ... 320MB/sec or 133MB/sec does the rest of the system get to keep up with processing the data spewing off and onto the disks independent of which raid system is built, you wil need 2 or 3 more backup systems to backup your Terabyte sized raid systems more raid fun http://www.1U-Raid5.net c ya alvin > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > harddisk, all home directories will be stored under the main server, > > main workload (compilation and edition) would be done on the local > > machines tough, server only takes care of file sharing. > > > > Also parallel MPI executions will be done between the clients. > > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? good p > > > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 8 15:46:59 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 08 Oct 2003 21:46:59 +0200 Subject: building a RAID system Message-ID: <1065642419.9483.55.camel@qeldroma.cttc.org> Hi, I would like to know some advice about what kind of technology apply into a RAID file server ( through NFS ) . We started choosing hardware RAID to reduce cpu usage. We have two options , SCSI RAID and ATA RAID. The first would give the best results but on the other hand becomes really expensive so we have in mind two ATA RAID controllers: Adaptec 2400A 3Ware 6000/7000 series controllers Any one of these has its strong and weak points, after seeing various benchmarks/comparisons/reviews these are the only candidates that deserve our attention. 
The server has a dozen of client workstations connected through a switched 100Mbit LAN , all of these equipped with it's own OS and harddisk, all home directories will be stored under the main server, main workload (compilation and edition) would be done on the local machines tough, server only takes care of file sharing. Also parallel MPI executions will be done between the clients. Considering that not all the workstantions would be working full time and with cost in mind ? it's worth an ATA RAID solution ? -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Oct 8 06:33:13 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 08 Oct 2003 06:33:13 -0400 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <1065609193.28674.32.camel@squash.scalableinformatics.com> On Wed, 2003-10-08 at 18:17, D. Scott wrote: > Hi > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? Lots of possibilities, though I am not sure you have supplied enough information to hazard a guess (unless someone ran into this before and already knows the answer). An operation on an NFS mounted file system can hang when: 1) the nfs server becomes unresponsive (crash, overload, file system full, ...) 2) the client becomes unresponsive ... 3) the network becomes unresponsive ... ... If you could indicate more details, it is likely someone might be able to tell you where to look next. > > > Dan -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 18:17:02 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 23:17:02 +0100 Subject: Why NFS hang when copying files of 6MB? Message-ID: Hi On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 19:39:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 19:39:41 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. that's unfortunate, since the main way HW raid saves CPU usage is by running slower ;) seriously, CPU usage is NOT a problem with any normal HW raid, simply because a modern CPU and memory system is *so* much better suited to performing raid5 opterations than the piddly little controller in a HW raid card. the master/fileserver for my cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and it can *easily* saturate its gigabit connection. after all, ram runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! 
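Mark's checksum figure is easy to sanity-check: the 2.4 md driver benchmarks its RAID-5 XOR routines at boot and logs the winner, and /proc/mdstat shows what the arrays are doing. The output below is purely illustrative, not a measurement of any particular box:

# XOR benchmark printed by the md driver at boot (example output)
dmesg | grep raid5
  raid5: measuring checksumming speed
  raid5: using function: pIII_sse (2034.000 MB/sec)

# current array state, rebuild progress, failed members
cat /proc/mdstat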
concern for PCI congestion is a much more serious issue. finally, why do you care at all? are you fileserving through a fast (>300 MB/s) network like quadrics/myrinet/IB? most people limp along at a measly gigabit, which even a two-ide-disk raid0 can saturate... > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and jeez, since your limited to 10 MB/s, you could do raid5 on a 486 and still saturate the net. seriously, CPU consumption is NOT an issue at 10 MB/s. > machines tough, server only takes care of file sharing. so excess cycles on the fileserver will be wasted unless used. > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? you should buy a single promise sata150 tx4 and four big sata disks (7200 RPM 3-year models, please). regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 19:28:37 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 16:28:37 -0700 (PDT) Subject: Why NFS hang when copying files of 6MB? In-Reply-To: <1065609193.28674.32.camel@squash.scalableinformatics.com> Message-ID: On Wed, 8 Oct 2003, Joe Landman wrote: > On Wed, 2003-10-08 at 18:17, D. Scott wrote: > > Hi > > > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? > > Lots of possibilities, though I am not sure you have supplied enough > information to hazard a guess (unless someone ran into this before and > already knows the answer). > > An operation on an NFS mounted file system can hang when: > > 1) the nfs server becomes unresponsive (crash, overload, file system > full, ...) not enough memory, too much swap spce > 2) the client becomes unresponsive ... > 3) the network becomes unresponsive ... > ... - bad hub, bad switch, bad cables - bad nic cards, bad motherboard, - bad kernel, bad drivers - bad dhcp config, waiting for machines that went offline c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:41:12 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:41:12 +1000 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <200310090941.13302.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 08:17 am, D. Scott wrote: > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? You need to give a lot more detail on that, try having a quick read of: http://www.catb.org/~esr/faqs/smart-questions.html#beprecise Basically there are all sorts of possible problems from kernel bugs, node hardware problems through to various network problems... Useful information would be things like: /etc/fstab from the nodes output of the mount command the output of strace when you try and do the 'cp': strace -o cp.log -e trace=file cp /path/to/file /path/to/destination good luck! 
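Alongside the strace output, a few quick checks usually narrow down which of the usual suspects (server, client or network) is at fault; 'master' below is just a placeholder for the NFS server's hostname:

rpcinfo -p master      # portmapper, mountd and nfsd should all be registered
showmount -e master    # is the filesystem really exported to this client?
nfsstat -c             # a climbing retrans count points at the network
grep nfs /proc/mounts  # confirm the rsize/wsize and hard/soft options in use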
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKCYO2KABBYQAh8RAqltAJ4/R91yD0KKVA6wB3+UDZxZcAOsFwCbBZn1 DeaCjkFO8bwGLhhSkxB20yE= =d7Gz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:27:35 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:27:35 -0700 (PDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: hi ya On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. tell them to break the drawing up into itty-bitty pieces and work on a real autocad drawing .. :-) - separate the item into separate pieces so it can be bent, welded, drilled, etc or get a 3Ghz cpu and load up 4GB or 8GB of memory and nope ... beowulf or any other cluster will not help autocad c ya alvin - part time autocad me ..but i cant draw a line .. :-) - easier to contract out the 1u chassis design "drawings" :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:47:32 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:47:32 +1000 Subject: PocketPC Cluster Message-ID: <200310090947.33601.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) IrDA for the networking, 11 compute + 1 management, slower than "a mainstream Pentium II-class desktop PC" (they don't specify what spec). http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Twelve Pocket PC devices have been joined in a cluster to perform distributed calculations - the devices share the load of a complex calculation. The concept was to compare the performance of several Pocket PC devices linked into a cluster with the performance of a typical Pentium II-class desktop computer. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKIUO2KABBYQAh8RAvJvAJoDNqZ/2m8cIqo02Hbbwzpm2DWeMQCeOltt 3LuUp1Kkoc4jnmwVNgoDoFI= =+abL -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Wed Oct 8 20:03:57 2003 From: mg_india at sancharnet.in (Manoj Gupta) Date: Thu, 09 Oct 2003 05:33:57 +0530 Subject: CAD Message-ID: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Hello, One of my clients has asked me to provide a solution for his AutoCAD work. 
The minimum file size on which he works is nearly of 400 MB and it takes 15-20 minutes to load on his single system. Can Beowulf be used to solve this problem and minimize the time required so as to improve productivity? Sawan Gupta || mg_india at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 20:23:28 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 20:23:28 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: > - get those drives w/ 8MB buffer disk cache what reason do you have to regard 8M as other than a useless marketing feature? I mean, the kenel has a cache that's 100x bigger, and a lot faster. > - slower rpm disks ... usually it tops out at 7200rpm unless your workload is dominated by tiny, random seeks, the RPM of the disk isn't going to be noticable. > - it supposedly can sustain 133MB/sec transfers it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE disks in raid0. interestingly, the chipset controller is normally not competing for the same bandwidth as the PCI, so even with entry-level hardware, it's not hard to break 133. > - if you use software raid, you can monitor the raid status this is the main and VERY GOOD reason to use sw raid. > - some say scsi disks are faster ... usually lower-latency, often not higher bandwidth. interestingly, ide disks usually fall off to about half peak bandwidth on inner tracks. scsi disks fall off too, but usually less so - they don't push capacity quite as hard. > - it supposedly can sustain 320MB/sec transfers that's silly, of course. outer tracks of current disks run at between 50 and 100 MB/s, so that's the max sustained. you can even argue that's not really 'sustained', since you'll eventually get to slower inner tracks. > independent of which raid system is built, you wil need 2 or 3 > more backup systems to backup your Terabyte sized raid systems backup is hard. you can get 160 or 200G tapes, but they're almost as expensive as IDE disks, not to mention the little matter of a tape drive that costs as much as a server. raid5 makes backup less about robustness than about archiving or rogue-rm-protection. I think the next step is primarily a software one - some means of managing storage, versioning, archiving, etc... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Wed Oct 8 21:04:03 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed, 8 Oct 2003 21:04:03 -0400 Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> References: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: <20031008210403.F28876@www2> AutoCAD versions since R13 only run on Windows, and AFAIK no version of AutoCAD has ever been shipped for Linux. Beowulf is a Linux- (or, taken more liberally than most people intend, Unix-) specific thing. Thus, unless I misunderstand, no. --Bob Drzyzgula On Thu, Oct 09, 2003 at 05:33:57AM +0530, Manoj Gupta wrote: > > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. 
> > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 21:45:08 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 21:45:08 -0400 (EDT) Subject: building a RAID systemo In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). I totally agree with everything Mark said and second this. Although 3-year ata (lower) or scsi (higher) disks would be just fine too, depending on how much you care to spend and how much it costs you if things go down. e.g. md raid under linux works marvelously well, and one can even create a kickstart file so that it makes your raid for you on a fully automated install, which is very cool. It is also dirt cheap. My home (switched 100 Mbps, 8-9 hosts/nodes depending on what is on) has a 150 GB RAID-5 server (3x80 GB 3-year ATA 7200 RPM disks) on a 2.2 GHz Celeron server with an extra ATA controller so there is only one disk per channel. It cost about $800 total to build inside a full tower case with extra fans including one with leds in front so that it glows blue. You couldn't get the CASE of a HW raid for that price, I don't think (although I admit that it won't do hot swap and dual power supplies). The total RAID/NFS load since 9/19 is: root 11 0.0 0.0 0 0 ? SW Sep19 0:00 [mdrecoveryd] root 21 0.0 0.0 0 0 ? SW Sep19 0:00 [raid1d] root 22 0.0 0.0 0 0 ? SW Sep19 0:02 [raid5d] root 23 0.0 0.0 0 0 ? SW Sep19 5:03 [raid5d] ... root 4928 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] root 4929 0.0 0.0 0 0 ? SW Sep19 2:57 [nfsd] root 4930 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4931 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4932 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4933 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4934 0.0 0.0 0 0 ? SW Sep19 2:56 [nfsd] root 4935 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] (or less than 30 minutes of total CPU). At 1440 min/day, for 18 days (conservatively) that is about 0.1% load, on average. This is a home network load, sure (which includes gaming and a fair bit of data access, but no, we're not talking GB per day moving over the lines). In a more data-intensive environment this would increase, but there is a lot of head room. The point is that a 2.2 GHz system has a LOT of horsepower. We used to run entire departments of twenty or thirty workstations using $10-20,000 Sun servers at maybe 5 MEGAHertz on 10 Mbps thinwire networks with fair to middling satisfaction. My $800 home server has several thousand times the raw speed, about a thousand times the memory, a thousand times the disk, AND it is RAID 5 disk at that. The network has only increased in speed by a factor of maybe 10-20 (allowing for switched vs hub). Mucho headroom indeed. BTW, our current department primary server is a 1 GHz PIII, although we're adding a second CPU shortly as load dictates. 
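As a concrete illustration of the kickstart trick mentioned above, the partitioning section of a Red Hat kickstart file can declare the md device directly. The disk names and sizes here are invented, and the exact option spelling varies a little between Red Hat releases, so check the kickstart documentation for the release you actually install:

# three ATA disks combined into one RAID-5 md device at install time
part raid.01 --size=76000 --ondisk=hda
part raid.02 --size=76000 --ondisk=hdb
part raid.03 --size=76000 --ondisk=hdc
raid /export --level=5 --device=md0 raid.01 raid.02 raid.03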
And if you are planning your server to handle something other than a small cluster or LAN where downtime isn't too "expensive" you may want to look at higher quality (rackmount) servers and disk arrays in enclosures that permit e.g. hot swap and that have redundant power. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 23:12:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 13:12:41 +1000 Subject: building a RAID system In-Reply-To: References: Message-ID: <200310091312.42544.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 10:23 am, Mark Hahn wrote: > raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... For those who haven't seen it, this is a very interesting way of doing snapshot style backups: http://www.mikerubel.org/computers/rsync_snapshots/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hNIpO2KABBYQAh8RAvXaAJ0ecv77jUJe3DWpsinqBFgs4W4JlQCfRz/z HfXF/JkFSszlvX10/JXjisM= =7lAy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 8 22:58:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 8 Oct 2003 22:58:17 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. The larger cache does provide some benefit. Disks now read and cache up to a whole track/cylinder at once, starting from when the head settles from a seek up to when the desired sector is read. You can't do that type of caching in the kernel. As disks become more dense, more memory is needed to save a cylinder's worth of data, so we should expect the cache size to increase. But you point is likely "disk cache is mostly legacy superstition". MS-Windows 98 and earlier had such horrible caching behavior that a few MB of on-disk cache could triple the performance. This was also why MS-Windows would run much faster under Linux+VMWare than on the raw hardware. > > - it supposedly can sustain 133MB/sec transfers Normal disks top out at 70MB/sec read, 50MB/sec write on the outer tracks. These numbers drop significantly on the inner tracks. You might get 10MB/sec better with 10K or 15K RPM SCSI drives, but it's certainly not linear with the speed. BTW, 2.5" laptop drives are _far_ worse. Typical for a modern fast drive is 20MB/sec read and 10MB/sec write. Older drivers were worse. > > - some say scsi disks are faster ... 
> > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. Look at the shape of the transfer performance curve -- the shape is sometimes the same as the similar IDE drive, but sometimes has a much different curve. Wider tracks mean faster seek settling but lower density. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:33:49 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:33:49 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: hi ya mark On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. for those squeezing the last 1MB/sec transfer out of their disks ... 8MB did seem to make a difference ( streaming video apps - encoding/decoding/xmit ) > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. usually a side affect of partitioning too > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. super easy to overflow the disks and pci .. depending on apps > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. yup > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. scsi capacity doesnt seem to be an issue for them ... they're falling behind by several generations ( scsi disks used to be the highest capacity drives .. not any more ) > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. yup ... those are just marketing numbers... all averages ... and bigg differences between inner tracks and outer tracks > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost to me ... backup of terabyte sized systems is trivial ... - just give me lots of software raid subsystems ( 2 backups for each "main" system ) - lot cheaper than tape drives and 1000x faster than tapes for live backups - will never touch a tape backup again ... 
too sloow and too unreliable no matter how clean the tape heads are ( too slow being the key problem for restoring ) c ya alvin > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 22:31:50 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 22:31:50 -0400 (EDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. Load from what into what? It is hard for me to see how a 400 MB file could take this long to load into memory over any modern channel, as this is less than 0.5 MB/sec. This is roughly the bandwidth one achieves throwing floppies across a room one at a time by hand. That is, I can't imagine how this is bandwidth limited, unless the client has primitive hardware. From a local disk (even a bad one) this should take ballpark of a few seconds to load into memory. From NFS order of a minute or three (in most configurations, less on faster networks). If the load is so slow because the program is crunching the file as it loads it (reading a bit, thinking a bit, reading a bit more) then nothing can speed this up unless AutoCAD has a parallel version of their program. > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? I don't know for sure (although somebody else on the list might). I doubt it, though, unless autocad has a parallel version that can use a linux cluster to speed things up. However, your first step in answering it for yourself is going to be doing measurements to determine what the bottleneck is. If it is I/O then invest in better I/O (perhaps a better network). So measure e.g. the network load if it is getting the file from a network file server. If the problem is that the file is coming from a winXX server with too little memory on an antique CPU and with creaky old disks on a 10 Mbps hub, well, FIRST replace the winxx with linux, the old server with a new server, the old disks with new disks, the 10 BT with 1000 BT. At that point you won't have a bandwidth problem, as the server should be able to deliver files at some tens of MB/sec pretty easily. If the problem persists, try to figure out what autocad is doing when it loads. rgb > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 02:00:33 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed, 8 Oct 2003 23:00:33 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. I found a comparison of 8MB vs 2MB drives in a raid, though it's windows based and not that great: http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 Seems like the 8MB didn't really make much of a difference. > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a 100GB LTO tapes can be had for $36, that's less than half the price of the cheapest 200 GB drives. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From maurice at harddata.com Thu Oct 9 00:58:27 2003 From: maurice at harddata.com (Maurice Hilarius) Date: Wed, 08 Oct 2003 22:58:27 -0600 Subject: building a RAID system In-Reply-To: <200310090112.h991CPb24907@NewBlue.scyld.com> Message-ID: <5.1.1.6.2.20031008225509.04259800@mail.harddata.com> Where you said: >I would like to know some advice about what kind of technology apply >into a RAID file server ( through NFS ) . We started choosing hardware >RAID to reduce cpu usage. > >We have two options , SCSI RAID and ATA RAID. The first would give the >best results but on the other hand becomes really expensive so we have >in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers I would suggest using the 3Ware (current models are 7506 ( parallel ATA) and 8506 ( Serial ATA)). Use mdamd to create software RAID devices. It will yield better performance, and is much more flexible. If you are building a large array, use multiple controllers to increase throughput. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 03:52:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 00:52:39 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: hi ya On Wed, 8 Oct 2003, Trent Piepho wrote: > On Wed, 8 Oct 2003, Mark Hahn wrote: > > > - get those drives w/ 8MB buffer disk cache > > > > what reason do you have to regard 8M as other than a useless > > marketing feature? I mean, the kenel has a cache that's 100x > > bigger, and a lot faster. 
> > I found a comparison of 8MB vs 2MB drives in a raid, though it's windows > based and not that great: > http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 i dont have much data between 2MB and 8MB ... just various people's feedback ... - releasable data i do have is at http://www.Linux-1U.net/Disks/Tests/ - testing for 2MB and 8MB should be done on the same system of the same sized disks and exact same partition, distro, patchlevel and "test programs to amplify the differences" - lots of disk writes and reads ... that overflow the memory so that disk access is forced ... > Seems like the 8MB didn't really make much of a difference. > > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems -- emphasizing .. "Terabyte" sized disk subsystems > > backup is hard. you can get 160 or 200G tapes, but they're almost > > as expensive as IDE disks, not to mention the little matter of a > > 100GB LTO tapes can be had for $36, that's less than half the price of the > cheapest 200 GB drives. we/i like to build systems that backup 1TB or 2TB per 1U server ... - tapes doesn't come close ... different ballpark - a rack of 1U servers is a minimum of 40TB - 80TB of data .. - and than to turn around and simulate a disk crash and restore from backups from bare metal or how fast to get a replacement system back online ( hot swap - live backups) - i think those 200GB tape drives is something to also add into the costs of backup media .. as are restore from tape considerations before deciding on tape vs disk backup media ( all depends on the purpose of the server and data ) - last i played with tape drives was those $3K - $4K exabyte tape drives ... nice and fast (writing) .. but very slow for restore and unreliable ... and time consuming and NOT automated - people costs the mosts for doing proper backups ... ( someone has to write the backup methodology ro swap the tapes etc ) fries ( a local pc store here ) had 160GB disks 8MB buffers for $80 after rebates ... otherwise general rule is $1 per GB of raw disk storage per disk fun stuff .. have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 05:25:18 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 02:25:18 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > as expensive as IDE disks, not to mention the little matter of a > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > cheapest 200 GB drives. > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > - tapes doesn't come close ... different ballpark How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, and the largest IDE drive I've seen is 250 GB. I've got two 4U rackmount systems sitting side by side on the same shelf. One is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive server with 200GB SATA drives and two 8 port 3ware cards. The tape library has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand new, while the ADIC is around a year old. 
If the tape library were bought today, it would have an LTO-2 drive with double the capacity and could store 4.8 TB. So tapes seem to come pretty close to me. It is also quite a bit more practical to change tapes with the library than to be swapping hard drives around. The library's built-in barcode reader keeps track of them all for me. I can type a command and have it spit out all the tapes that a certain set of backups are on. They fit nicely in a box in their plastic cases and if I drop one it will be ok. I can stick them on a shelf for five years and still expect to read them. And the tapes don't take up any rackspace or power or need any cooling. I've never had a tape go bad on me either, even though I've been through a lot more of them than IDE drives. Of course the tape library was expensive. A new LTO-2 model can be had for around $11,600 on pricewatch. The 16-bay IDE case, CPUs/MB/memory and 3ware controllers were much less. But the cost of the media is a lot less for tapes than for SATA hard drives. Especially if you get models with 3-year warranties. Once you buy enough drives/tapes you'll break even on a $/GB comparison. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 06:04:20 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 10:04:20 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: Greg, > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > MKL and the free ATLAS library run at a respectable % of peak. Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. Have you tried DPPSV or DPOSV on Itanium, for example? I would be interested in the percentage of peak that you achieve with MKL and ATLAS, for up to 10000x10000 matrices. ATLAS does not have a full LAPACK implementation. > 4) Put your performance whitepapers on your website, or it looks > fishy. Our white papers are not on the Web because they contain performance data, and particularly, performance data comparing against our competitors. It may expose us to libel legal issues. Putting the legitimacy of any legal issues aside, it is not good for any business to be engulfed in legal squabbles. We are in the process of clearing this with our legal department at the moment. As I have noted in my previous e-mail, anyone who wants to get hold of the white papers is welcome to send me an e-mail. > I looked and didn't see a single performance claim there. There is one on the front page! Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd.
E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 06:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 03:13:21 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Trent Piepho wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > > as expensive as IDE disks, not to mention the little matter of a > > > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > > cheapest 200 GB drives. > > > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > > - tapes doesn't come close ... different ballpark > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, > and the largest IDE drive I've seen is 250 GB. 8 drives ... 250GB or 300GB each .. > I've got two 4U rackmount systems sitting side by side on the same shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand > new, while the ADIC is around a year old. If the tape library were bought > today, it would have a LTO-2 drive with double the capacity and could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a bit more > practical changes tapes with the library than to be swapping hard drives nobody swaps disks around ... unless one is using those 5.25" drive bay thingies in which case ... thats a different ball game i/we claim that if the drives fail, something is wrong ... its not necessary for the disks to be removable > around. The libraries built in barcode reader keeps track of them all for me. > I can type a command and have it spit out all the tapes that a certain set of > backups are on. They fit nicely in a box in their plastic cases and if I drop > one it will be ok. I can stick them on a shelf for five years and still i prefer hands off backups and restore .... esp if the machine is not within your hands reach ... > expect to read them. And the tapes don't take up any rackspace or power or > need any cooling. I've never had a tape go bad on me either, even though I've > been though a lot more of them than IDE drives. > > Of course the tape library was expensive. A new LTO-2 model can be had for > around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware for $11.6K ... i can build two 2TB servers or more ... 8 * $400 --> $3200 in drives ... for 2.4TB each ... + $700 for misc cpu/mem/1u case and it'd be 2 live backups of the primary 2TB system or about 2-3 months of weekly full backups depending ondata > controllers were much less. But the cost of the media is a lot less for tapes > than for SATA hard drives. Especially if you get models with 3 year > warranties. Once you buy enough drives/tapes you'll break even on a $/GB > comparison. i dont want to be baby sitting tapes ... 
on a daily basis and cleaning its heads or assume that someone else did c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Thu Oct 9 06:38:54 2003 From: seth at hogg.org (Simon Hogg) Date: Thu, 09 Oct 2003 11:38:54 +0100 Subject: Intel compilers and libraries In-Reply-To: References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> At 10:04 09/10/03 +0000, C J Kenneth Tan -- Heuchera Technologies wrote: >Our white papers are not on the Web they contain performance data, and >particularly, performance data comparing against our competitors. It >may expose us to libel legal issues. Putting legitimacy of any legal >issues aside, it is not good for any business to be engulf in legal >squabbles. We are in the process of clearing this with our legal >department at the moment. > >As I have noted in my previous e-mail, anyone who wants to get a hold >of the white papers are welcome to please send me an e-mail. I would just like to comment that if you are releasing the white papers by email, what difference is that to putting it on the web? They are both still publishing. Although IANAL, I would doubt that these figures expose you legally, as long as they are correct and truthful in the figures you claim (and probabily the methodology would be pretty handy, too). Simon Hogg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 06:31:00 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 06:31:00 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F8538E4.9020400@lmco.com> > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > IDE bays, > > and the largest IDE drive I've seen is 250 GB. > > 8 drives ... 250GB or 300GB each .. > Cool. Do you have pictures? How do you get the other 4 drives out? I assume they're not accessible from the front so do you have to pull the unit out, pop the cover and replace the drive? > > I've got two 4U rackmount systems sitting side by side on the same > shelf. One > > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is > a 16 drive > > server with 200GB SATA drives and two 8 port 3ware cards. The tape > library > > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server > is brand > > new, while the ADIC is around a year old. If the tape library were > bought > > today, it would have a LTO-2 drive with double the capacity and > could store > > 4.8 TB. So tapes seem to come pretty close to me. It also quite a > bit more > > practical changes tapes with the library than to be swapping hard > drives > > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game > > i/we claim that if the drives fail, something is wrong ... its not > necessary for the disks to be removable > Are you saying that it's not necessary to have hot-swappable drives? (I'm just trying to undertand your point). 
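For Linux software RAID, at least, the hot-swap question is mostly about how long the array runs degraded, not whether it survives. A rough sketch of the non-hot-swap procedure, with hypothetical names (array /dev/md0, failing member /dev/hdg1):

  # mark the sick member failed and drop it from the array
  mdadm /dev/md0 --fail /dev/hdg1
  mdadm /dev/md0 --remove /dev/hdg1

  # shut down, swap the physical disk, boot, then re-add it;
  # the rebuild runs in the background and shows up in /proc/mdstat
  mdadm /dev/md0 --add /dev/hdg1
  cat /proc/mdstat

If a spare was configured when the array was created (--spare-devices=1), md starts rebuilding onto it the moment a member fails, so the dead disk only needs to come out at the next convenient shutdown.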
Does everyone remember this: http://www.tomshardware.com/storage/20030425/index.html My only problem with this approach is off-site storage of backups. Do you pull a huge number of drives and move them off-site? (I still love the idea of using inexpensive drives for backup instead of tape though). Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 07:07:26 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 11:07:26 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> Message-ID: Simon, > I would just like to comment that if you are releasing the white papers by > email, what difference is that to putting it on the web? They are both > still publishing. I am not a lawyer, so I cannot comment on the legal aspects of things. What if an e-mail and its attachments have a confidentiality clause attached? Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 07:26:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 07:26:56 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - it supposedly can sustain 320MB/sec transfers > > > > that's silly, of course. outer tracks of current disks run at > > between 50 and 100 MB/s, so that's the max sustained. you can even > > argue that's not really 'sustained', since you'll eventually get > > to slower inner tracks. > > yup ... those are just marketing numbers... all averages ... It probably refers to burst delivery out of its 8 MB cache. The actual sustained bps speed is a pure matter of N*2*\pi*\R*f/S, where N = number of heads, 2piRf is the linear/tangential speed of the platter at R the read radius, and S is the linear length per bit. This is an upper bound. Similarly average latency (seek time) is something like 1/2f, the time the platter requires to move half a rotation. > and bigg differences between inner tracks and outer tracks Well, proportional to R, at any rate. Given the physical geometry of the platters (which I get to look at when I rip open old drives to salvage their magnets) about a factor of two. > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > to me ... backup of terabyte sized systems is trivial ... 
> - just give me lots of software raid subsystems > ( 2 backups for each "main" system ) > > - lot cheaper than tape drives and 1000x faster than tapes > for live backups > > - will never touch a tape backup again ... too sloow > and too unreliable no matter how clean the tape heads are > ( too slow being the key problem for restoring ) C'mon, Alvin. Sometimes this is a workable solution, sometimes it just plain is not. What about archival storage? What about offsite storage? What about just plain moving certain data around (where networks of ANY sort might be held to be untrustworthy). What about due diligence if you were are corporate IT exec held responsible for protecting client data against loss where the data was worth real money (as in millions to billions) compared to the cost of archival media and mechanism? "never touch a tape backup again" is romantic and passionate, but not necessarily sane or good advice for the vast range of humans out there. To backup a terabyte scale system, one needs a good automated tape changer and a pile of tapes. These days, this will (as Mark noted) cost more than your original RAID, in all probability, although this depends on how gold-plated your RAID is and whether or not you install two of them and use one to backup the other. I certainly don't have a tape changer in my house as it would cost more than my server by a factor of two or three to set up. I backup key data by spreading it around on some of the massive amounts of leftover disk that accumulates in any LAN of systems in days where the smallest drives one can purchase are 40-60 GB but install images take at most a generous allotment of 5 GB including swap. In the physics department, though, we are in the midst of a perpetual backup crisis, because it IS so much more expensive than storage and our budget is limited. Our primary department servers are all RAID and total (IIRC) over a TB and growing. We do actually back up to disk several times a day so that most file restores for dropped files take at most a few seconds to retrieve (well, more honestly a few minutes of FTE labor between finding the file and putting it back in a user's home directory). However, we ALSO very definitely make tape backups using a couple of changers, keep offsite copies and long term archives, and give users tapes of special areas or data on request. The tape system is expensive, but a tiny fraction of the cost of the loss of data due to (say) a server room fire, or a monopole storm, or a lightning strike on the primary room feed that fries all the servers to toast. I should also point out that since we've been using the RAIDs we have experienced multidisk failures that required restoring from backup on more than one occasion. The book value probability for even one occasion is ludicrously low, but the book value assumes event independence and lies. Disks are often bought in batches, and batches of disk often fail (if they fail at all) en masse. Failures are often due to e.g. overheating or electrical problems, and these are often common to either all the disks in an enclosure or all the enclosures in a server room. I don't think a sysadmin is ever properly paranoid about data loss until they screw up and drop somebody's data for which they were responsible because of inadequate backups. 
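The disk-to-disk layer of such a scheme can be as simple as the rotating hard-link snapshots from the rsync page Chris pointed to earlier in the thread. A sketch only, with made-up paths (/home being protected, /backup on the backup server):

  # rotate old snapshots; unchanged files are shared via hard links
  rm -rf /backup/home.3
  mv /backup/home.2 /backup/home.3
  mv /backup/home.1 /backup/home.2
  cp -al /backup/home.0 /backup/home.1

  # refresh the current snapshot; rsync only copies what changed, and its
  # rename-into-place behaviour leaves the older hard-linked copies intact
  rsync -a --delete /home/ /backup/home.0/

Restoring last Friday's version of a file is then just a cp out of the right home.N directory -- the same sort of near-instant restore described above, whatever tool actually maintains the snapshots.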
Our campus OIT just dropped a big chunk of courseware developed for active courses this fall because they changed the storage system for the courseware without verifying their backup, experienced a crash during the copy over, and discovered that the backup was corrupt. That's real money, people's effort, down the drain. Pants AND suspenders. Superglue around the waistband, actually. Who wants to be caught with their pants down in this way? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:16:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > 4) Put your performance whitepapers on your website, or it looks > > fishy. > > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Putting legitimacy of any legal Expose you to libel suits? Say what? Only if you lie about your competitor's numbers (or "cook" them so that they aren't an accurate reflection of their capabilities, as is often done in the industry) does it expose you to libel charges or more likely to the ridicule of the potential consumers (who tend to be quite knowledgeable, like Greg). One essential element to win those crafty consumers over is to compare apples to apples, not apples to apples that have been picked green, bruised, left on the ground for a while in the company of some handy worms, and then picked up so you can say "look how big and shiny and red and worm-free our apple is and how green and tiny and worm-ridden our competitor's apple is". A wise consumer is going to eschew BOTH of your "display apples" (as your competitor will often have an equally shiny and red apple to parade about and curiously bruised and sour apples from YOUR orchard) and instead insist on wandering into the various orchards to pick REAL apples from your trees for their OWN comparison. What exactly prevents you from putting your own raw numbers up, without any listing of your competitor's numbers? You can claim anything you like for your own product and it isn't libel. False advertising, possibly, but not libel. Or put the numbers up with your competitor's numbers up "anonymized" as A, B, C. And nobody will sue you for beating ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody "owns" them to sue you or cares in the slightest if you beat them. The most that might happen is that if you manipulate(d) ATLAS numbers so they aren't what real humans get on real systems, people might laugh at you or more likely just ignore you thereafter. What makes you any LESS liable to libel if you distribute the white papers to (potential) customers individually? Libel is against the law no matter how, and to who, you distribute libelous material; it is against the law even if shrouded in NDA. It is against the law if you whisper it in your somebody's ears -- it is just harder to prove. 
Benchmark comparisons, by the way, are such a common marketing tool (and so easily bent to your own needs) that I honestly think that there is a tacit agreement among vendors not to challenge competitors' claims in court unless they are openly egregious, only to put up their own competing claims. After all, no sane company WOULD actually lie, right -- they would have a testbed system on which they could run the comparisons listed right there in court and everybody knows it. Whether the parameters, the compiler, the system architecture, the tests run etc. were carefully selected so your product wins is moot -- if it ain't a lie it ain't libel, and it is caveat emptor for the rest (and the rest is near universal practice -- show your best side, compare to their worst). > issues aside, it is not good for any business to be engulf in legal > squabbles. We are in the process of clearing this with our legal > department at the moment. > > As I have noted in my previous e-mail, anyone who wants to get a hold > of the white papers are welcome to please send me an e-mail. As if your distributing them on a person by person basis is somehow less libelous? Or so that you can ask me to sign an NDA so that your competitors never learn that you are libelling them? I rather think that an NDA that was written to protect illegal activity be it libel or drug dealing or IP theft would not stand up in court. Finally, product comparisons via publically available benchmarks of products that are openly for sale don't sound like trade secrets to me as I could easily duplicate the results at home (or not) and freely publish them. Your company's apparent desire to conceal this comes across remarkably poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a watch? Come right down this alley so I can show you my watches where none of the bulls can see" compared to an open storefront with your watches on display to anyone, consumer or competitor. This is simply my own viewpoint, of course. I've simply never heard of a company shrinking away from making the statement "we are better than our competitors and here's why" as early and often as they possibly could. AMD routinely claims to be faster than Intel and vice versa, each has numbers that "prove" it -- for certain tests that just happen to be the tests that they tout in their claims, which they can easily back up. For all the rest of us humans, our mileage may vary and we know it, and so we mistrust BOTH claims and test the performance of our OWN programs on both platforms to see who wins. I'm certain that the same will prove true for your own product. I don't care about your benchmarks except as a hook to "interest" me. Perhaps they will convince me to get you to loan me access to your libraries etc to link them into my own code to see if MY code speeds up relative to the way I have it linked now, or relative to linking with a variety of libraries and compilers. Then I can do a real price/performance comparison and decide if I'm better off buying your product (and buying fewer nodes) or using an open source solution that is free (and buying more nodes). Which depends on the scaling properties of MY application, costs, and so forth, and cannot be predicted on the basis of ANY paper benchmark. Finally, don't assume that this audience is naive about benchmarking or algorithms, or at all gullible about performance numbers and vendor claims. 
A lot of people on the list (such as Greg) almost certainly have far more experience with benchmarks than your development staff; some are likely involved in WRITING benchmarks. If you want to be taken seriously, put up a full suite of benchmarks, by all means, and also carefully indicate how those benchmarks were run as people will be interested in duplicating them and irritated if they are unable to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Oct 9 09:02:11 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 9 Oct 2003 09:02:11 -0400 Subject: building a RAID system - 8 drives Message-ID: <812B16724C38EE45A802B03DD01FD5472A3BF4@exchange.concen.com> 300GB Maxtor ATA133 5400RPM drives are the largest currently available. 250GB is the largest SATA currently. You can achieve 2TB in a 1U by using a drive sled that will hold two drives. The drives are mounted opposing each other and share a backplane. This is a proprietary solution. Or, if you have a chassis with 4 external trays and a few internal 3.5" bays it could be done. I personally don't believe cramming this many drives in a 1U is a good idea. Increased heat due to lack of airflow would have to decrease the lifespan of the drives. ---------------------------------------------------- |Joey P. Sims 800.995.4274 x 242 |Sales Manager 770.442.5896 - Fax |HPC/Storage Division jsims at csiopen.com |Concentric Systems,Inc. www.csilabs.net ---------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 07:02:57 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 07:02:57 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F854061.3040208@lmco.com> Alvin Oga wrote: > > On Thu, 9 Oct 2003, Jeff Layton wrote: > > > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > > IDE bays, > > > > and the largest IDE drive I've seen is 250 GB. > > > > > > 8 drives ... 250GB or 300GB each .. > > > > > > > Cool. Do you have pictures? How do you get the other 4 drives > > out? I assume they're not accessible from the front so do you > > have to pull the unit out, pop the cover and replace the drive? > > yup.. pull the cover off and pop out the drive the hard way vs > "hot swap ide tray" > > autocad generated *.jpg file > http://linux-1u.net/Dwg/jpg.sm/c2500.jpg > > ( newer version has the mb and ps swapped for better cpu cooling) > http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > > > i/we claim that if the drives fail, something is wrong ... its not > > > necessary for the disks to be removable > > > > > > > Are you saying that it's not necessary to have hot-swappable > > drives? (I'm just trying to undertand your point). > > if the drive is dying .... 
> - find out which brand/model# it is and avoid it > - find out if others are having similar problems > - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps > > i'm not convinced that hotswap ide works w/o special ide controllers > - pull the ide disk out while its powered up > - pull the ide disk out while you're writing a 2GB file to it > > - or insert the disk while the rest of the systme is up and > running > > if you have to power down to take the ide disk out, you might as > well do a clean shutdown and replace the disk the hard way with > a screw driver instead of nice ($50 expensive) drive bay handle > $ 50 can be an extra 80GB of disk space when a good sale > is occuring at the local fries stores > We've got several NAS boxes with hot-swappable IDE drives and without it we'd be toast. Granted the controller is specialized, coming from one vendor, but it allows us to have a fail-over drive with auto-rebuild in the background. Then we just pull the bad drive, put in a new one, and designate it as the new hot spare. Works great! It's saved our bacon a few times. I've wanted to test hot-swap with 3ware controllers, but have never done it. Has anyone tested the hotswap capability of the 3ware controllers/cases? Another comment. If you have to pull the node to replace the drive, then you have to bring down the filesystem which might not be the best thing to do. Hot-swapping allows the filesystem to keep functioning, albeit at a lower performance level. > > Does everyone remember this: > > > > http://www.tomshardware.com/storage/20030425/index.html > > > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it > We can't do backups across the wire to an offsite storage facility. So we have to do backups, pull the tapes, and store them off-site. I'm just not sure how this would work with disks instead of tapes. Oh, you can full and incremental backups to disk - most backup software doesn't care what the media is anyway - but I'm just not sure if you pull a set of disks and store them. How does off-site backup recovery work? Do you pop them in, mount them as read-only, and copy them to a live filesystem? However, despite all of these questions, at some point soon, disk will be the only way to get backups of LARGE filesystems in a reasonable amount of time. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:36:40 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:36:40 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F8538E4.9020400@lmco.com> Message-ID: On Thu, 9 Oct 2003, Jeff Layton wrote: > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > IDE bays, > > > and the largest IDE drive I've seen is 250 GB. > > > > 8 drives ... 250GB or 300GB each .. > > > > Cool. Do you have pictures? How do you get the other 4 drives > out? 
I assume they're not accessible from the front so do you > have to pull the unit out, pop the cover and replace the drive? yup.. pull the cover off and pop out the drive the hard way vs "hot swap ide tray" autocad generated *.jpg file http://linux-1u.net/Dwg/jpg.sm/c2500.jpg ( newer version has the mb and ps swapped for better cpu cooling) http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > i/we claim that if the drives fail, something is wrong ... its not > > necessary for the disks to be removable > > > > Are you saying that it's not necessary to have hot-swappable > drives? (I'm just trying to undertand your point). if the drive is dying .... - find out which brand/model# it is and avoid it - find out if others are having similar problems - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps i'm not convinced that hotswap ide works w/o special ide controllers - pull the ide disk out while its powered up - pull the ide disk out while you're writing a 2GB file to it - or insert the disk while the rest of the systme is up and running if you have to power down to take the ide disk out, you might as well do a clean shutdown and replace the disk the hard way with a screw driver instead of nice ($50 expensive) drive bay handle $ 50 can be an extra 80GB of disk space when a good sale is occuring at the local fries stores > Does everyone remember this: > > http://www.tomshardware.com/storage/20030425/index.html > > My only problem with this approach is off-site storage of > backups. Do you pull a huge number of drives and move them > off-site? (I still love the idea of using inexpensive drives for > backup instead of tape though). i suppose you can do "incremental" backups across the wire ... and "inode" based backups too ... - it'd be crazy to xfer the entire 1MB file if only 1 line changed in it c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Oct 9 08:24:20 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 9 Oct 2003 14:24:20 +0200 (CEST) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. Yes, but the kernel might be dumb at times, like when splitting large requests into small pieces to be fed to the block subsystem just to be reassembled again before being sent to the disk :-) Another issue is how this memory is used by the drive firmware. I've seen tests that show some Fujitsu SCSI disks (MAN or MAP series, IIRC) perform much better than competitors in multi-user situations (lots of different files accessed by different users, supposedly scattered on the disk) while the competitors were better at streaming media (one big file used by a single user, supposedly contiguously placed on disk). > unless your workload is dominated by tiny, random seeks, Or your file-system becomes full and thus fragmented. Been there, done that! I've had a big storage device changed from ext3 to XFS because ext3 at about 50% fragmentation was horribly slow; XFS allows live (without unmounting or mounting "ro") defragmentation. 
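For what it's worth, the fragmentation situation Bogdan describes is easy to measure, and the live XFS defragmentation is a single command from the XFS userspace tools. A sketch with hypothetical device and mount-point names:

  # report fragmentation; -r opens the device read-only, so this is
  # safe to run against a mounted filesystem
  xfs_db -r -c frag /dev/sdb1

  # reorganize files in place while the filesystem stays mounted and in use
  xfs_fsr /data

ext3 offers no comparable online defragmenter, hence the switch described above.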
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:42:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:42:46 -0400 (EDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > Pants AND suspenders. Superglue around the waistband, actually. Who > > wants to be caught with their pants down in this way? > > always got bit by tapes... somebody didnt change the tape on the 13th > a couple months ago ... and critical data is now found to be missing > - people do forget to change tapes ... or clean heads... > ( thats the part i dont like about tapes .. and is the most > ( common failure mode for tapes ... easily/trivially avoided by > ( disks-to-disk backups > > - people get sick .. people go on vacations .. people forget > > > - no (similar) problems since doing disk-to-disk backups > - and i usually have 3-6 months of full backups floating around > in compressed form All agreed. And tapes aren't that permanent a medium either -- they deteriorate on a timescale of years to decades, with data bleeding through the film, dropped bits due to cosmic ray strikes, depolymerization of the underlying tape itself. Even before the tape itself is unreadable, you are absolutely certain to be unable to find a working drive to read it with. I have a small pile of obsolete tapes in my office -- tapes made with drives that no longer "exist", and that is after dumping the most egregiously useless of them. Still, I'd argue that the best system for many environments is to use all three: RAID, real backup to (separate) disk, possibly a RAID as well, and tape for offsite and archival purposes. The first two layers protect you against the TIME required to handle users accidentally deleting files (the most common reason to access a backup) as retrieval is usually nearly instantaneous and not at all labor intensive. It also protects you agains the most common single-server failures that get past the protection of RAID itself (multidisk failures, blown controllers). The tape (with periodic offsite storage) protects you against server room fire, brownouts or spikes that cause immediate data corruption or disk loss on both original and backup servers, and tapes can be saved for years -- far longer than one typically can go back on a disk backup mechanism. Users not infrequently want to get at a file version they had LAST YEAR, especially if they don't use CVS. Finally, some research groups generate data that exceeds even TB-scale disk resources -- they constantly move data in and out of their space in GB-sized chunks. They often like to create their own tape library as a virtual extension of the active space. Tapes aren't only about backup. So you engineer according to what you can afford and what you need, making the usual compromises brought about by finite resources. BTW, one point that hasn't been made in the soft vs hard RAID argument is that with hard RAID you are subject to (proprietary) HARDWARE obsolescence, which typically is more difficult to control than software. 
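Part of what makes the software case recoverable is that the md superblock travels with the member disks, so the set can be reassembled on any Linux machine with enough controller ports. A sketch, again with hypothetical device names:

  # inspect the RAID superblock that a member disk carries with it
  mdadm --examine /dev/hde1

  # reassemble the old array on the replacement machine and mount it
  mdadm --assemble /dev/md0 /dev/hde1 /dev/hdg1 /dev/hdi1
  mount /dev/md0 /mnt/rescue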
You build a RAID, populate it, use it. After a few years, the RAID controller itself dies (but the disks are still good). Can you get another? One that can actually retrieve the data on your disks? There are no guarantees. Maybe the company that made your controller is still in business (or rather, still in the RAID business). Maybe they either still carry old models, or can do depot repair, or maybe new models can still handle the raid encoding they implemented with the old model. Maybe you can AFFORD a new model, or maybe it has all sorts of new features and costs 3x as much as the first one did (which may not have been cheap). Maybe it takes you weeks to find a replacement and restore access to your data. Soft RAID can have problems of its own (if the software for example evolves to where it is no longer backwards compatible) but it is a whole lot easier to cope with these problems and they are strictly under your control. You are very unlikely to have any "event" like the death of the RAID server that prevents you from retrieving what is on the disks (at a cost likely to be quite controllable and in a timely way) as long as the disks themselves are not corrupted. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Thu Oct 9 09:07:25 2003 From: michael.worsham at mci.com (Michael.Worsham) Date: Thu, 09 Oct 2003 09:07:25 -0400 Subject: CAD Message-ID: <000201c38e66$49b2aa40$94022aa6@Wcomnet.com> My wife works for a construction/architecture firm and handles AutoCad files like this all the time (some even larger at times, depending on the client). One thing we looked at first was what platform they were running the AutoCad on. Windows XP or 95/98 can't really handle Autocad as it is a highly intensive CPU application. We had a similar 'old' layout where CAD machines were more based as a word processing workstation than as a CAD station. Given the amount of work this firm produced in a single day, we went for a Dual Xeon P4 setup w/ 4 GB ram and 36 GB SCSI hard drives loaded with Windows 2000 Pro Workstation. When deciding the P4 hardware platform, look for boards that have PCI-X slots... esp for giganet NIC cards and if needed, Hardware RAID SCSI adapters. Refrain from using ATA, esp since CAD likes to really utilize the hard drives and ATA would most likely wear out faster. (Though some might look that using Xeon is overkill, lets just say there are many times it has come in handy when the customer shows up on-site unexpectedly and wants to see a progress report or has changes to be added. Pulling up the program and the data file in a couple of seconds rather than several minutes makes a beliver out of you in an instant.) If the file is being downloaded from a file server, using standard 10/100 via a cheap hub isn't going to cut it. Best to utilize something of a 10/100/1000 switch (ie. copper giganet) and 10/100/1000 NICs in each of the machines. Make sure the card is set for FULL-DUPLEX to fully utilize the bandwidth needed esp for downloading large files from the file server. Based on the file server specs, its is similar to that of the workstations however it is running Windows 2000 Advanced Server w/ Veritas Backup... 
can't be too careful for DR measures, esp with CAD files of this caliber. -- M Michael Worsham MCI/Intermedia Communications System Administrator & Applications Engineer Phone: 813-829-6845 Vnet: 838-6845 E-mail: michael.worsham at mci.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:47:55 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:47:55 -0700 (PDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - will never touch a tape backup again ... too sloow > > and too unreliable no matter how clean the tape heads are > > ( too slow being the key problem for restoring ) > > C'mon, Alvin. Sometimes this is a workable solution, sometimes it just > plain is not. What about archival storage? What about offsite storage? > What about just plain moving certain data around (where networks of ANY > sort might be held to be untrustworthy). What about due diligence if > you were are corporate IT exec held responsible for protecting client > data against loss where the data was worth real money (as in millions to > billions) compared to the cost of archival media and mechanism? "never > touch a tape backup again" is romantic and passionate, but not > necessarily sane or good advice for the vast range of humans out there. yup .. maybe an oversimplied statement ... tapes are my (distant) 2nd choice for backups of xx-Terabyte sized servers.. disk-to-disk being my first choice ( preferrably to 2 other similar sized machines ) ( it's obviously not across a network :-) i randomly restore from backups and do a diff w/ the current servers before it dies .. > Pants AND suspenders. Superglue around the waistband, actually. Who > wants to be caught with their pants down in this way? always got bit by tapes... somebody didnt change the tape on the 13th a couple months ago ... and critical data is now found to be missing - people do forget to change tapes ... or clean heads... ( thats the part i dont like about tapes .. and is the most ( common failure mode for tapes ... easily/trivially avoided by ( disks-to-disk backups - people get sick .. people go on vacations .. people forget - no (similar) problems since doing disk-to-disk backups - and i usually have 3-6 months of full backups floating around in compressed form c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Thu Oct 9 09:26:45 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Oct 2003 09:26:45 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F854061.3040208@lmco.com> Message-ID: On Thu, 9 Oct 2003 at 7:02am, Jeff Layton wrote > spare. Works great! It's saved our bacon a few times. I've > wanted to test hot-swap with 3ware controllers, but have > never done it. Has anyone tested the hotswap capability of > the 3ware controllers/cases? Yes, and it works just as advertised. To add my $.05 to the discussion, I'm a pretty big fan of the 3wares -- I currently have 5TB of formatted space (with about 2TB of data) on them. 
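(As I describe below, I layer a plain md stripe on top of the hardware arrays, and a dirt-simple cron check keeps an eye on that software layer. The schedule and mail address here are purely an illustrative sketch, not a recommendation:

  # crontab entry: warn if any md member has dropped out
  # (a "_" shows up in the [UU] status field of /proc/mdstat)
  0 * * * * grep _ /proc/mdstat && mail -s "md array degraded" root < /proc/mdstat

3DM covers the hardware layer itself, so this is only there to catch the stripe.)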
I have two servers with 2 cards and 16 drives in them, and one with 1 card and 8 drives. On the two board servers, I run the 3wares in hardware RAID mode (R5 with a hot spare), and then do a software stripe across the two hardware arrays. With the boards on separate PCI busses, this lets the stripe go faster than the 266MB/s that the boards are limited to (these are 7500 boards, which are 64/33). 3ware's 3DM also lets you monitor the status of your arrays (it's almost too verbose, actually), and do all sorts of online maintenance. Not having used mdadm much, I can't really compare the functionality of the two. A couple of nice features of 3DM is that it lets you schedule array verification and background disk scanning, which can find problems before they affect the array. I'm not sure what cases or backplane these systems use (I bought 'em from Silicon Mechanics, who I highly recommend), but the hot swap has always just worked. If anyone's interested, I have benchmarks (bonnie++ and tiobench) of one of the 2 board systems using pure software RAID as well as the setup above. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:09:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:09:56 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it http://rdiff-backup.stanford.edu/ The name says it all. I believe it is built on top of rsync -- at any rate it is distributed in an rpm named librsync. Awesome tool -- creates a mirror, then saves incremental compressed diffs. It is the way we can restore so quickly and yet maintain a decent archival/historical backup where a user CAN request file X from last friday (or even the version between the hours of midnight and noon on last friday). Efficient enough to run several times a day on the most active part of your space and not eat a hell of a lot of either disk or network BW. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 09:48:10 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 09:48:10 -0400 Subject: building a RAID system - yup In-Reply-To: References: Message-ID: <1065707290.4708.28.camel@protein.scalableinformatics.com> On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > users tapes of special areas or data on request. 
The tape system is > expensive, but a tiny fraction of the cost of the loss of data due to > (say) a server room fire, or a monopole storm, or a lightning strike on > the primary room feed that fries all the servers to toast. Monopole storm... (smile) I seem to remember (old bad and likely wrong memory) that Max Dresden had predicted one monopole per universe as a consequence of the standard model. Not my area of (former) expertise, so reality may vary from my memory ... [...] > I don't think a sysadmin is ever properly paranoid about data loss until > they screw up and drop somebody's data for which they were responsible > because of inadequate backups. Our campus OIT just dropped a big chunk I always ask my customers a simple question: What is the cost to you to recreate all the data you lost when your disk/tape dies? That is why I tend to recommend multiple redundant systems for backup. I also like to point out that you can build a single point of failure into any system, and the cost of recovering from that failure needs to be considered when designing systems to back up the possibly failing systems. If you back up all your systems over the network, and your network dies, are you in a bad way when you need to restore? What about if you back up everything to a single tape drive, and the drive dies (and you need your backup)? Single points of failure are critical to identify. It is equally critical to estimate the impact of each one. Most folks have a backup solution of some sort. Some of them are even reasonable, though few of them are able to withstand a single failure in a critical component. My old research group has a tape changer robot and drive from a well known manufacturer. Said well known manufacturer recently told them that since the unit was EOLed about 2 years ago, there would be no more fixes available for it. They (the research group) told me that they were having trouble with it... One tape drive, one point of failure. The tape drive company is happy because you now have to drop a chunk of change on their new units, or scour eBay for old ones. -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 08:30:45 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 12:30:45 GMT Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <20031009123045.7582.qmail@houston.wolf.com> Alvin Oga writes: > > On Thu, 9 Oct 2003, Trent Piepho wrote: > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game Not quite true. We use Rare drives (one box) to move up to a TB of data around w/o having to take the time to create tapes and then download them. That takes a lot of time, even w/ LTOs.
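Rough numbers, assuming current LTO drives stream somewhere around 15-35 MB/s native (estimates, not measurements):

  1 TB to tape at 20 MB/s   ~= 1,000,000 MB / 20 MB/s ~= 50,000 s ~= 14 hours per pass
                               (and you pay it twice: once to write, once to restore/download)
  1 TB disk-to-disk over gigabit at a realistic 40-50 MB/s ~= 6-7 hours
  1 TB on a drive you carry across the machine room ~= however long the local copy takes

Fudge the drive or wire rates however you like; the ordering doesn't change much.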
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rtomek at cis.com.pl Thu Oct 9 10:22:53 2003 From: rtomek at cis.com.pl (Tomasz Rola) Date: Thu, 9 Oct 2003 16:22:53 +0200 (CEST) Subject: PocketPC Cluster In-Reply-To: <200310090947.33601.csamuel@vpac.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003, Chris Samuel wrote: > Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) > > IrDA for the networking, 11 compute + 1 management, slower than "a mainstream > Pentium II-class desktop PC" (they don't specify what spec). > > http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Yes, it's nice of course. One can also build such cluster with Linux-based devices: http://www.handhelds.org/ I myself would like to see if the performance changes after switching to Linux. One thing that should be considered is cooling. On my iPAQ, when cpu load gets too high for too long, the joy button warms itself. This means, cpu is even more heated. The other issue is power consumption. If I understand what SBP did, they run the cluster on electricity from the wall, not from the battery. My own observetion suggests, that running high load on battery consumes about 2-3 times more power than things like reading html files. - From the performance side, I wonder how this compares to the following page: http://www.applieddata.net/design_Benchmark.asp which suggests StrongARM SA 1100 @200 is 3x faster than Pentium @166? I was interested myself, so I ran the quick test on my own iPAQ 3630 (SA 1110 @206) and on AMD-k6-2 @475. On iPAQ: - -bash-2.05b# `which time` -p python /tmp/erasieve.py --limit 1000 --quiet real 0.94 user 0.91 sys 0.04 On K6: => (1020 29): /usr/bin/time -p erasieve.py --limit 1000 --quiet real 0.51 user 0.49 sys 0.02 So, how can 12 PocketPCs be slower than 1 p2 (with no clock given at all, but if I remember they were about 500MHz at best)? If I haven't misunderstood something, they probably didn't tuned their experiment too well. BTW, most PDA cpus lack fpu. So, while such claster may be nice to ad-hoc password breaking, with nanoscale simulation it will be rather the opposite, I think. bye T. - -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home ** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_rola at bigfoot.com ** -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 5.0i for non-commercial use Charset: noconv iQA/AwUBP4VvRBETUsyL9vbiEQJfvwCeLU3/270BajC74e+r2HEKs27QoXgAn0fP C8FHl6mDchvmMBr04oWioqg0 =wFOr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:32:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:32:12 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > Robert, > > You covered some of the issues that we are addressing with our lawyers > right now. It's a process which, as knowledgeable as you are, I am > sure you can understand we have to go through. The comparison, sure, go through the process. 
Putting your own numbers up, no, I cannot see why you need lawyers to tell you you can do this. How can somebody sue you for putting up the results of your own good-faith tests of your own product? There wouldn't be a manufacturer in existence not bogged down in court if you could (successfully) sue Tide for claiming that it gets clothes cleaner and removes stains when the first time you wash a shirt with it the shirt remains dirty and stains don't come out, for example. Why, I myself would quit work and live on the proceeds of my many suits, if every product out there had to strictly live up to its claims. The most recourse the consumer has is to not buy Tide (or whatever other detergent offendeth thee, nothing against Tide but there are plenty of stains NO detergent removes except maybe xylene or fuming nitric acid based ones:-). Or, if they are really irritated -- it is a GRASS stain and the Tide ad on TV last night shows Tide succeeding against GRASS stains in particular -- they can take the box back to the store and likely get their money back. But sue Tide? Only in Ralph Nader's dreams... Caveat emptor is more than a latin phrase, it is a principle of law. You have to look at the horse's teeth yourself, or don't blame the vendor for claiming that the old nag they sold you was really a young and vibrant horse. To them perhaps it was -- it is a question of just what an old nag is (opinion) vs the age of the horse as indicated by its teeth (fact). Only if the claims are egregious (this here snake oil will cause hair to grow on your head, cure erectile dysfunction, and make you smell nice all for the reasonable price of a dollar a bottle) is there any likelihood of grievance that might be addressed. Surely your claims aren't egregious. Your product doesn't slice, dice, and even eat your meatloaf for you...does it?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 11:48:02 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 11:48:02 -0400 Subject: [Fwd: [Bioclusters] 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA] Message-ID: <1065714482.4713.73.camel@protein.scalableinformatics.com> -----Forwarded Message----- > ======================================================================= > MEETING ANNOUNCEMENT / CALL FOR PRESENTERS > ======================================================================= > BIOCLUSTERS 2004 Workshop > March 30, 2004 > Hynes Convention Center, Boston MA USA > ======================================================================= > > * Speakers Wanted - Please Distribute Where Appropriate * > > Organized by several members of the bioclusters at bioinformatics.org > mailing list, the Bioclusters 2004 Workshop is a networking and > educational forum for people involved in all aspects of cluster and > grid computing within the life sciences. 
> > The motivation for organizers of this event was the cancellation of the > O'Reilly Bioinformatics Technology Conference series and the general > lack of forums for researchers and professionals involved with the > applied use of high performance IT and distributed computing techniques > in the life sciences. > > The primary focus of the workshop will be technical presentations from > experienced IT professionals and scientific researchers discussing real > world systems, solutions, use-cases and best practices. > > This event is being held onsite at the Hynes Convention Center on the > first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World > Magazine is generously providing space and logistical support for the > meeting and workshop attendees will have access to the expo floor and > keynote addresses. Registration & fees will be finalized in short > order. > > Presentations will be broken down among a few general content areas: > > 1. Researcher, Application & End user Issues > 2. Builder, Scaling & Integration Issues > 3. Future Directions > > The organizing committee is actively soliciting presentation proposals > from members of the life science and technical computing communities. > Interested parties should contact the committee at bioclusters04 at open- > bio.org. > > > Bioclusters 2004 Workshop Committee Members > > J.W Bizzaro ? Bioinformatics Organization Inc. > James Cuff - MIT/Harvard Broad Institute > Chris Dwan - The University of Minnesota > Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. > Joe Landman ? Scalable Informatics LLC > > The committee can be reached at: bioclusters04 at open-bio.org > > > About the Bioclusters Mailing List Community > > The bioclusters at bioinformatics.org mailing list is a 600+ member forum > for users, builders and programmers of distributed systems used in life > science research and bioinformatics. For more information about the > list including the public archives and subscription information please > visit http://bioinformatics.org/mailman/listinfo/bioclusters > -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 10:35:16 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 10:35:16 -0400 Subject: building a RAID system - yup - superglue In-Reply-To: References: Message-ID: <3F857224.9040801@lmco.com> Robert G. Brown wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > Pants AND suspenders. Superglue around the waistband, actually. Who > > > wants to be caught with their pants down in this way? > > > > always got bit by tapes... somebody didnt change the tape on the 13th > > a couple months ago ... and critical data is now found to be missing > > - people do forget to change tapes ... or clean heads... > > ( thats the part i dont like about tapes .. and is the most > > ( common failure mode for tapes ... easily/trivially avoided by > > ( disks-to-disk backups > > > > - people get sick .. people go on vacations .. people forget > > > > > > - no (similar) problems since doing disk-to-disk backups > > - and i usually have 3-6 months of full backups floating around > > in compressed form > > All agreed. 
And tapes aren't that permanent a medium either -- they > deteriorate on a timescale of years to decades, with data bleeding > through the film, dropped bits due to cosmic ray strikes, > depolymerization of the underlying tape itself. Even before the tape > itself is unreadable, you are absolutely certain to be unable to find a > working drive to read it with. I have a small pile of obsolete tapes in > my office -- tapes made with drives that no longer "exist", and that is > after dumping the most egregiously useless of them. > > Still, I'd argue that the best system for many environments is to use > all three: RAID, real backup to (separate) disk, possibly a RAID as > well, and tape for offsite and archival purposes. > I can say with some authority that this is what we at Lockheed Aeronautics do. And rather than extend this email by quoting Bob below, we also have an HSM system that we use for data we may need in the next couple of years. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 07:59:54 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 07:59:54 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: I would also echo most of Mark's points aside from the 8 MB cache issue. I have seen some noticeable speed improvements using 2 MB vs 8 MB drives. I would also offer one other point. No matter whether you use SCSI or IDE drives, be absolutely certain that you keep the drives cool. The "internal" 3.5 bays in most cases are normally useless because they place several drives in almost direct contact. The drive(s) sandwiched in the middle have only their edges exposed to air and have to dissipate the bulk of their heat through the neighboring drives. I like mount the drives in 5.25 bays. This at least provides an air gap for some cooling. For large raid servers, I like to use the cheap fan coolers. They can be had for $5 - $8 each and include 2 or 3 small fans that fill in the 5.25 opening and the 5.25-to-3.5 mounting brackets. Of course, that makes for a lot of fan noise. We typically build 2 identical raid servers connected by a dedicated gigabit link to do nightly backups, both to protect from raid failure and user error. I would like to ask if anyone has investigated Benjamin LaHaise netmd application yet? http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf I think there was some discussion of it a few months ago, but I haven't seen anything lately. Thanks, Mike Prinkey Aeolus Research, Inc. On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. > > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. > > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. 
interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. > > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. > > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. > > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. > > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 09:34:56 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 13:34:56 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: References: Message-ID: Robert, You covered some of the issues that we are addressing with our lawyers right now. It's a process which, as knowledgeable as you are, I am sure you can understand we have to go through. Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- On Thu, 9 Oct 2003, Robert G. Brown wrote: > Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) > From: Robert G. Brown > To: C J Kenneth Tan -- Heuchera Technologies > Cc: Greg Lindahl , beowulf at beowulf.org > Subject: Re: Intel compilers and libraries > > On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > > > 4) Put your performance whitepapers on your website, or it looks > > > fishy. > > > > Our white papers are not on the Web they contain performance data, and > > particularly, performance data comparing against our competitors. It > > may expose us to libel legal issues. Putting legitimacy of any legal > > Expose you to libel suits? Say what? 
> > Only if you lie about your competitor's numbers (or "cook" them so that > they aren't an accurate reflection of their capabilities, as is often > done in the industry) does it expose you to libel charges or more likely > to the ridicule of the potential consumers (who tend to be quite > knowledgeable, like Greg). > > One essential element to win those crafty consumers over is to compare > apples to apples, not apples to apples that have been picked green, > bruised, left on the ground for a while in the company of some handy > worms, and then picked up so you can say "look how big and shiny and red > and worm-free our apple is and how green and tiny and worm-ridden our > competitor's apple is". A wise consumer is going to eschew BOTH of your > "display apples" (as your competitor will often have an equally shiny > and red apple to parade about and curiously bruised and sour apples from > YOUR orchard) and instead insist on wandering into the various orchards > to pick REAL apples from your trees for their OWN comparison. > > What exactly prevents you from putting your own raw numbers up, without > any listing of your competitor's numbers? You can claim anything you > like for your own product and it isn't libel. False advertising, > possibly, but not libel. Or put the numbers up with your competitor's > numbers up "anonymized" as A, B, C. And nobody will sue you for beating > ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody > "owns" them to sue you or cares in the slightest if you beat them. The > most that might happen is that if you manipulate(d) ATLAS numbers so > they aren't what real humans get on real systems, people might laugh at > you or more likely just ignore you thereafter. > > What makes you any LESS liable to libel if you distribute the white > papers to (potential) customers individually? Libel is against the law > no matter how, and to who, you distribute libelous material; it is > against the law even if shrouded in NDA. It is against the law if you > whisper it in your somebody's ears -- it is just harder to prove. > Benchmark comparisons, by the way, are such a common marketing tool (and > so easily bent to your own needs) that I honestly think that there is a > tacit agreement among vendors not to challenge competitors' claims in > court unless they are openly egregious, only to put up their own > competing claims. After all, no sane company WOULD actually lie, right > -- they would have a testbed system on which they could run the > comparisons listed right there in court and everybody knows it. Whether > the parameters, the compiler, the system architecture, the tests run > etc. were carefully selected so your product wins is moot -- if it ain't > a lie it ain't libel, and it is caveat emptor for the rest (and the rest > is near universal practice -- show your best side, compare to their > worst). > > > issues aside, it is not good for any business to be engulf in legal > > squabbles. We are in the process of clearing this with our legal > > department at the moment. > > > > As I have noted in my previous e-mail, anyone who wants to get a hold > > of the white papers are welcome to please send me an e-mail. > > As if your distributing them on a person by person basis is somehow less > libelous? Or so that you can ask me to sign an NDA so that your > competitors never learn that you are libelling them? I rather think > that an NDA that was written to protect illegal activity be it libel or > drug dealing or IP theft would not stand up in court. 
Finally, product > comparisons via publically available benchmarks of products that are > openly for sale don't sound like trade secrets to me as I could easily > duplicate the results at home (or not) and freely publish them. > > Your company's apparent desire to conceal this comes across remarkably > poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a > watch? Come right down this alley so I can show you my watches where > none of the bulls can see" compared to an open storefront with your > watches on display to anyone, consumer or competitor. This is simply my > own viewpoint, of course. I've simply never heard of a company > shrinking away from making the statement "we are better than our > competitors and here's why" as early and often as they possibly could. > AMD routinely claims to be faster than Intel and vice versa, each has > numbers that "prove" it -- for certain tests that just happen to be the > tests that they tout in their claims, which they can easily back up. > For all the rest of us humans, our mileage may vary and we know it, and > so we mistrust BOTH claims and test the performance of our OWN programs > on both platforms to see who wins. > > I'm certain that the same will prove true for your own product. I don't > care about your benchmarks except as a hook to "interest" me. Perhaps > they will convince me to get you to loan me access to your libraries etc > to link them into my own code to see if MY code speeds up relative to > the way I have it linked now, or relative to linking with a variety of > libraries and compilers. Then I can do a real price/performance > comparison and decide if I'm better off buying your product (and buying > fewer nodes) or using an open source solution that is free (and buying > more nodes). Which depends on the scaling properties of MY application, > costs, and so forth, and cannot be predicted on the basis of ANY paper > benchmark. > > Finally, don't assume that this audience is naive about benchmarking or > algorithms, or at all gullible about performance numbers and vendor > claims. A lot of people on the list (such as Greg) almost certainly > have far more experience with benchmarks than your development staff; > some are likely involved in WRITING benchmarks. If you want to be taken > seriously, put up a full suite of benchmarks, by all means, and also > carefully indicate how those benchmarks were run as people will be > interested in duplicating them and irritated if they are unable to. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Oct 9 10:57:21 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 09 Oct 2003 09:57:21 -0500 Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> References: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <3F857751.4090009@tamu.edu> I've recently built a 2TB (well, a little less really) ATA RAID using a pair of HighPoint 374 controlers and 10 250-GB Maxtor 8 MB cache drives (plus a 60 GB drive for the system). It's running as 2 1TB arrays, because of disparate applications, right now. 
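(The only acceptance test I bother with on a freshly built array is a big sequential write/read with dd -- crude, and the mount point and sizes below are just an example, but it catches gross misconfiguration before any real data lands on it:

  cd /raid1
  time dd if=/dev/zero of=ddtest bs=1M count=4096   # ~4 GB write
  time dd if=ddtest of=/dev/null bs=1M              # read it back
  rm ddtest

Use a file a couple of times bigger than RAM or you're mostly timing the page cache.)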
Initially, the drivers for RH9 were not available so we started with RH7.3 and all the updates; they're there now and and allow cross-card arrays. Down the pike we might re-install and span the controllers. I've also recently done a 2-drive striped array supporting a meteorology data application with a lot of data acquisition and database work. It's mounted to a number of other systems via NFS. Uses a Promise Technologies TX2000 and a pair of 80 GB Maxtors. Both RAID systems have worked very well. I suspect the next one I build will incorporate Serial ATA instead of parallel. I doubt I'll build another SCSI RAID for my applications. Gerry Creager Texas Mesonet Texas A&M University Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID. The first would give the > best results but on the other hand becomes really expensive so we have > in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers > > Any one of these has its strong and weak points, after seeing various > benchmarks/comparisons/reviews these are the only candidates that > deserve our attention. > > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and > harddisk, all home directories will be stored under the main server, > main workload (compilation and edition) would be done on the local > machines tough, server only takes care of file sharing. > > Also parallel MPI executions will be done between the clients. > > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:39:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:39:48 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: <1065707290.4708.28.camel@protein.scalableinformatics.com> Message-ID: On Thu, 9 Oct 2003, Joseph Landman wrote: > On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > > > users tapes of special areas or data on request. The tape system is > > expensive, but a tiny fraction of the cost of the loss of data due to > > (say) a server room fire, or a monopole storm, or a lightning strike on > > the primary room feed that fries all the servers to toast. > > Monopole storm... (smile) I seem to remember (old bad and likely wrong > memory) that Max Dresden had predicted one monopole per universe as a > consequence of the standard model. Not my area of (former) expertise, > so reality may vary from my memory ... Hell, there are more than that in California alone. So far monopoles have been discovered there at least twice; once on superconducting niobium balls in a Milliken experiement (but they went away when the balls were washed and never returned, go figure) and once in a superconduction flux trap although the events MIGHT have been caused by somebody flicking a light switch down the hall...:-) Seriously, this is theory vs experiment, and as a theorist I firmly defer to experiment. 
Until we find an (isolated) monopole, they are just a very attractive, compelling even, extension of Maxwell's equations and related field theories that (as a "defect") help us understand why certain quanties are quantized, or add a certain symmetry to the theory that is otherwise broken. However, it does amuse me to think of hard disks as being "experiments" like the flux loop experiment to measure the existence of monopoles. It would be interesting to determine a "signature" of disk penetration by a cosmic ray monopole and scan a small mountain of crashed disks for the signature, if such a signature is in any way unique. Such a mountain represents a lot more event phase space than a single loop or set of loops in a California laboratory. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Oct 9 12:08:20 2003 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 9 Oct 2003 09:08:20 -0700 (PDT) Subject: Raid Deffinitions Message-ID: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Discussing a client setup the other day a cohort and I came to a different opinion on what each raid level does. Is there a guide/standard to define how it should work. Also do any vendors stray from the beaten path and add there own levels? ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Oct 9 14:24:22 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 9 Oct 2003 11:24:22 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031009182422.GB1865@greglaptop.internal.keyresearch.com> On Thu, Oct 09, 2003 at 10:04:20AM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Welcome to the Internet. In the US, that's not an issue, so we're used to being able to get our performance data without having to ask a human. BTW, in the US, your lawyers would recommend that your "Up to 32X faster" claim would need a "results not typical" disclaimer. > > I looked and didn't see a single performance claim there. > > There is one on the front page! Sorry, I should have said "didn't see a single credible performance claim there". Bogus-looking claims do not help you sell to the HPC market, either in the US or Europe. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Oct 9 11:42:53 2003 From: dag at sonsorol.org (chris dagdigian) Date: Thu, 09 Oct 2003 11:42:53 -0400 Subject: 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA Message-ID: <3F8581FD.3080404@sonsorol.org> ======================================================================= MEETING ANNOUNCEMENT / CALL FOR PRESENTERS ======================================================================= BIOCLUSTERS 2004 Workshop March 30, 2004 Hynes Convention Center, Boston MA USA ======================================================================= * Speakers Wanted - Please Distribute Where Appropriate * Organized by several members of the bioclusters at bioinformatics.org mailing list, the Bioclusters 2004 Workshop is a networking and educational forum for people involved in all aspects of cluster and grid computing within the life sciences. The motivation for organizers of this event was the cancellation of the O'Reilly Bioinformatics Technology Conference series and the general lack of forums for researchers and professionals involved with the applied use of high performance IT and distributed computing techniques in the life sciences. The primary focus of the workshop will be technical presentations from experienced IT professionals and scientific researchers discussing real world systems, solutions, use-cases and best practices. This event is being held onsite at the Hynes Convention Center on the first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World Magazine is generously providing space and logistical support for the meeting and workshop attendees will have access to the expo floor and keynote addresses. Registration & fees will be finalized in short order. Presentations will be broken down among a few general content areas: 1. Researcher, Application & End user Issues 2. Builder, Scaling & Integration Issues 3. Future Directions The organizing committee is actively soliciting presentation proposals from members of the life science and technical computing communities. Interested parties should contact the committee at bioclusters04 at open- bio.org. Bioclusters 2004 Workshop Committee Members J.W Bizzaro ? Bioinformatics Organization Inc. James Cuff - MIT/Harvard Broad Institute Chris Dwan - The University of Minnesota Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. Joe Landman ? Scalable Informatics LLC The committee can be reached at: bioclusters04 at open-bio.org About the Bioclusters Mailing List Community The bioclusters at bioinformatics.org mailing list is a 600+ member forum for users, builders and programmers of distributed systems used in life science research and bioinformatics. For more information about the list including the public archives and subscription information please visit http://bioinformatics.org/mailman/listinfo/bioclusters _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 14:40:02 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 11:40:02 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. 
Brown wrote: > It probably refers to burst delivery out of its 8 MB cache. The actual > sustained bps speed is a pure matter of N*2*\pi*R*f/S, where N = number of heads, R = read radius, f = rotation frequency, and S is the linear length per bit. This is an upper bound. A hard drive only reads from one head at a time. It's not possible to align every head with each other to such a degree that every track in a cylinder is readable at once. If you look at a given family of drives, each different sized drive is the same basic hardware with more discs/heads. For instance, Seagate's Cheetah 15K.3 family (http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah15k.3.pdf) has the exact same internal transfer rate (609-891 megabits/sec) for the 18 GB model with 2 heads, the 36 GB with 4 heads, and the 73 GB with 8. > Similarly average latency (seek time) is something like 1/2f, > the time the platter requires to move half a rotation. The average latency is indeed 1/2 the rotational period. For a 7200 RPM drive it is 4.16 ms, for a 15k RPM drive it's 2 ms. Seek time is something completely different: it's how long it takes the head to move from one track to another, and it does not include the latency. You might see track-to-track, full stroke, and average seek times in a datasheet. > I should also point out that since we've been using the RAIDs we have > experienced multidisk failures that required restoring from backup on > more than one occasion. The book value probability for even one I've had one multidisk failure in a RAID5 system. It was after moving into a new building: one array had three out of six disks fail to spin up. Of course I had anticipated this, and made a backup, to tape, just before the move. None of the tapes were damaged in transit. I've had several single drive failures. I've never seen anyone with a significant number of drive-years of experience say they've never seen a drive fail. And no manufacturer has a failure rate anywhere near 0%. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 13:47:43 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 13:47:43 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > > > On Thu, 9 Oct 2003, Trent Piepho wrote: > > > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > Not quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. Jim Gray just recommends moving the whole computer: http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 JG It's a very convenient way of distributing data. DP Are you sending them a whole PC? JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks--all for about $3,000.
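The arithmetic is hard to argue with: seven 300-GB disks is roughly 2 TB, and overnight shipping is about 24 hours, so the box "transfers" data at roughly

  2,000,000 MB / 86,400 s ~= 23 MB/s ~= 185 Mbit/s sustained

which beats any WAN link most of us can afford, and the $3,000 includes the storage at the far end. (The numbers are obviously rough, but an order of magnitude either way doesn't change the conclusion.)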
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Thu Oct 9 14:30:46 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Thu, 09 Oct 2003 11:30:46 -0700 Subject: building a RAID system In-Reply-To: Message from Daniel Fernandez of "Wed, 08 Oct 2003 21:46:59 +0200." <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <200310091830.h99IUkNr014912@pookie.nersc.gov> Daniel, We have around 50 3ware boxes with a total formated space of around 50 TB. We run all of these in HW raid mode. I would avoid using software raid if you plan to have more than a dozen or so clients. Our experience is that while software raid works great, it scales poorly. This was very noticeable when the server processors were PIII class. It may be less of an issue with newer processors, but I would still recommend HW raid if the card supports it. Also, we like the 3ware cards because they have been supported by linux for ages now. Some of the other cards have been a little dicey. With our newest systems we've seen aggregate performance for a single server of around 70 MB/s and they appear to scale quite well (handle over 50 clients). This last batch of systems have 12 250 GB drives, a 12 port 3ware card, dual Xeon, on-board gigE and cost less than $7k. Also, the 3ware systems hot swap very well. We make use of it all the time. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:20:25 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:20:25 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > Discussing a client setup the other day a cohort and I came to a different > opinion on what each raid level does. Is there a guide/standard to define how > it should work. Also do any vendors stray from the beaten path and add there > own levels? http://www.1U-Raid5.net/Differences - definitions, and pretty pictures too c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gsheppar at gpc.edu Thu Oct 9 15:43:57 2003 From: gsheppar at gpc.edu (Gene Sheppard) Date: Thu, 09 Oct 2003 15:43:57 -0400 Subject: Inquiry small system S/W In-Reply-To: Message-ID: We here are Georgia Perimeter College are planning on putting together a 5 or 6 node Beowulf system. My question: Is there any software for a system like this? What applications have been tested on a small system? If there are none, what is the smallest system out there? Thank you for your help. 
GEne ============================================== Gene Sheppard Georgia Perimeter College Computer Science 1000 University Center Lane Lawrenceville, GA 30043 678-407-5243 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 9 17:04:30 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 9 Oct 2003 14:04:30 -0700 Subject: building a RAID system - 8 drives In-Reply-To: References: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: <20031009210430.GD11051@maybe.org> On Thu, Oct 09, 2003 at 01:47:43PM -0400, Michael T. Prinkey elucidated: > > > > No quite true. We use Rare drives (one box) to move up to a TB of data > > around w/o having to take the time to create tapes and then download them. > > That takes a lot of time, even w/ LTOs. > > Jim Grey just recommends moving the whole computer: > > http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 > > > JG It's a very convenient way of distributing data. > > DP Are you sending them a whole PC? > > JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, > and seven 300-GB disks--all for about $3,000. > Kind of reminds me of a favorite fortune cookie quotes: "Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" -- Andrew S. Tannenbaum Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Thu Oct 9 14:50:17 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Thu, 09 Oct 2003 20:50:17 +0200 Subject: building a RAID system In-Reply-To: References: Message-ID: <1065725416.1136.59.camel@qeldroma.cttc.org> Hi again, Thanks for the advice, also it has started an interesting thread. On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > I would like to know some advice about what kind of technology apply > > into a RAID file server ( through NFS ) . We started choosing hardware > > RAID to reduce cpu usage. > > that's unfortunate, since the main way HW raid saves CPU usage is > by running slower ;) > I cannot get the point here, the dedicated processor should take all transfer commands and offload the CPU why it would run slower ? In some tests a raid system for a single workstation ( no networking ) it's a bit useless (slower) unless you want to transfer really big files. In a networked environment there could be a massive number of I/O commands so should be critical. > seriously, CPU usage is NOT a problem with any normal HW raid, > simply because a modern CPU and memory system is *so* much better > suited to performing raid5 opterations than the piddly little > controller in a HW raid card. the master/fileserver for my > cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and > it can *easily* saturate its gigabit connection. after all, ram > runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! > Agreed, our server would not be doing anything more than managing NFS so, there is power to spare, where talking about an Athlon XP2600+ processor. But, a really good Parallel ATA 100/133 controller is needed, and 4 channels at least... 4 HDs in 2 master/slave channels reduces drastically performance ? any controller recommended ? 
But must be noted that HW RAID offers better response time. HW raid offers hotswap capability and offload our work instead of maintaining a SW raid solution ...we'll see ;) > concern for PCI congestion is a much more serious issue. > We're limited at 32 bit PCI, we cannot get around this unless spend on a highly priced PCI 64 mainboard. > finally, why do you care at all? are you fileserving through > a fast (>300 MB/s) network like quadrics/myrinet/IB? most people > limp along at a measly gigabit, which even a two-ide-disk raid0 > can saturate... > > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > jeez, since your limited to 10 MB/s, you could do raid5 on a 486 > and still saturate the net. seriously, CPU consumption is NOT an issue > at 10 MB/s. There would not be noticeable difference between SW/HW mode here. The clients would be doing write bursts of 2-5Mb per second so there must not be any problem. > > machines tough, server only takes care of file sharing. > > so excess cycles on the fileserver will be wasted unless used. > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? > > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). > > regards, mark hahn. > In fact we have two choices: - Use an spare existing ( relatively obsolete ) computer and couple it with a HW RAID card. - Spend on a fast CPU computer and a good but cheap Parallel ATA controller. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 17:17:34 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 17:17:34 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some > tests a raid system for a single workstation ( no networking ) it's a > bit useless (slower) unless you want to transfer really big files. In a > networked environment there could be a massive number of I/O commands so > should be critical. Key word: "should" Benchmark results: "often does not" Your best bet is to try both and run your own benchmarks and do your own cost/benefit analysis. When you say things like "better response time" one is fairly naturally driven to ask "does the difference matter", for example. Given that we run over 100 workstations from a SW RAID with nearly instantaneous (entirely satisfactory) performance, you'd have to really be hammering it to perceive a difference. > In fact we have two choices: > > - Use an spare existing ( relatively obsolete ) computer and couple it > with a HW RAID card. > > - Spend on a fast CPU computer and a good but cheap Parallel ATA > controller. Or a cheap computer + PATA or SATA controller. 
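If you do the try-both-and-benchmark comparison suggested above, something as simple as the following, run identically against each candidate configuration, will usually tell you most of what you need to know (the directory, size and user are placeholders, and the test size should be at least twice the RAM in the box):

  bonnie++ -d /mnt/raidtest -s 4096 -u nobody

plus a timed copy of a representative chunk of your actual users' data, which is the only benchmark that really counts.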
Even a cheap computer has 2+ GHz CPUs and hundreds of MB of RAM these days. Spend more on what you put the disks in, power, cooling. If it is an old/obsolete computer, will it have enough power, enough cooling? Regardless, the disk cost itself will dominate your costs. rgb > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:06:44 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:06:44 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: hi ya angel On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. yes.. guess it makes sense to move disks around for moving tb of data like floppy-net or sneaker-net - done that ( moving disks around ) myself once in a while for a quickie fix c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 16:35:04 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 13:35:04 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > > I would like to know some advice about what kind of technology apply > > > into a RAID file server ( through NFS ) . We started choosing hardware > > > RAID to reduce cpu usage. > > > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some Easy, said dedicated processor and memory is quite a bit slower than the main CPU and memory. If you look at thoughput in MB/sec, the latest linux software RAID is usually much faster than hardware raid implimentations. Usually CPU usage is (stupidly) reported as just a % used during a benchmark. If you transfer fewer megabytes in second, obviously the number of CPU cycles used in that second go down as well. If CPU usage is correctly reported in units of % per MB/sec, then you get a real measure of hardware efficiency. > needed, and 4 channels at least... 
4 HDs in 2 master/slave channels > reduces drastically performance > ? any controller recommended ? It seems that most good 4-12 channel (NOT drive, channel!) IDE cards ARE hardware raid controllers. Lots of people use the 3ware RAID cards in JBOD mode with software raid, because their isn't a cheaper non-hardware raid card comparable to something like the 3ware 7508-8 or 7508-12. I know about cheaper 2 and 4 channel non-raid cards, but they're 32/33 PCI and not comparable to the 3ware. > > concern for PCI congestion is a much more serious issue. > > > We're limited at 32 bit PCI, we cannot get around this unless spend on a > highly priced PCI 64 mainboard. AMD 760MPX and Intel E7501 motherboards have high speed 64/66 PCI and PCI-X for the E7501. They're not that expensive really. An additional $100-$200 at most over a single PCI 32/33 motherboard. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Thu Oct 9 18:02:34 2003 From: rokrau at yahoo.com (Roland Krause) Date: Thu, 9 Oct 2003 15:02:34 -0700 (PDT) Subject: Experience with Omni anyone? Message-ID: <20031009220234.64852.qmail@web40010.mail.yahoo.com> Folks, I came across the Omni OpenMP compiler lately and I was wondering whether anyone here has used it and what the experience was. I.o.w., is it "industrial strength"? I know of and use Portland and Intel compilers but I am also curious. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at lathama.com Thu Oct 9 17:52:22 2003 From: lathama at lathama.com (Andrew Latham) Date: Thu, 9 Oct 2003 14:52:22 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: Message-ID: <20031009215222.68022.qmail@web60307.mail.yahoo.com> thanks. I know that all the different raid levels are here for a reason and raid5 is great but what are the benefits of the rest? --- Mark Hahn wrote: > > Discussing a client setup the other day a cohort and I came to a different > > opinion on what each raid level does. Is there a guide/standard to define > how > > it should work. Also do any vendors stray from the beaten path and add > there > > own levels? > > sure they do. IMO the only important levels are: > > raid0 - striping > raid1 - mirroring > raid5 - rotating parity-based array > > vendors who make a big deal of obvious extensions like raid 10 > (mirrored stripes or vice versa) are immediately hung up on by me... > ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Oct 9 16:24:36 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 9 Oct 2003 15:24:36 -0500 (CDT) Subject: Inquiry small system S/W In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? 
> What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne System or application software? For system software, any of the beowulf kits will work. http://warewulf-cluster.org/ http://www.scyld.com/ http://oscar.sourceforge.net/ http://rocks.npaci.edu/ http://clic.mandrakesoft.com/index-en.html and others. Most applications will run just fine on 5 or 6 nodes. To start with, i'd get HPL and PMB running to ensure everything is working fine. Then you can look at other applications to see what you might actually be able to benefit from. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 9 16:44:28 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 9 Oct 2003 16:44:28 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. Brown wrote: > Hell, there are more than that in California alone. So far monopoles Forgot to mention the California "megapoll" which just occurred on Tuesday. Sorry, I could not help myself. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Oct 9 18:08:51 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 9 Oct 2003 15:08:51 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009215222.68022.qmail@web60307.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > thanks. > > I know that all the different raid levels are here for a reason and raid5 is > great but what are the benefits of the rest? 0 is fast (interleaved chunks) but provides no redundancy. 1 is a a 1 + 1 mirror... can be faster on reads but is generally slower on writes depending on your controller/implementation... 0 + 1 or 1 + 0 striped mirror or mirrored stripe. less space efficient than raid 5 but faster in general. can survive multiple disk failures so long as both disks containing the same information don't fail at once. > --- Mark Hahn wrote: > > > Discussing a client setup the other day a cohort and I came to a different > > > opinion on what each raid level does. Is there a guide/standard to define > > how > > > it should work. Also do any vendors stray from the beaten path and add > > there > > > own levels? > > > > sure they do. IMO the only important levels are: > > > > raid0 - striping > > raid1 - mirroring > > raid5 - rotating parity-based array > > > > vendors who make a big deal of obvious extensions like raid 10 > > (mirrored stripes or vice versa) are immediately hung up on by me... > > > > > ===== > Andrew Latham > > Penguin loving, moralist agnostic. 
> > LathamA.com - (lay-th-ham-eh) > lathama at lathama.com - lathama at yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:19:16 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:19:16 GMT Subject: building a RAID system - 8 drives - drive-net In-Reply-To: References: Message-ID: <20031009231916.21008.qmail@houston.wolf.com> Alvin Oga writes: > > hi ya angel > > On Thu, 9 Oct 2003, Angel Rivera wrote: > >> Alvin Oga writes: >> >> > nobody swaps disks around ... unless one is using those 5.25" drive bay >> > thingies in which case ... thats a different ball game >> >> No quite true. We use Rare drives (one box) to move up to a TB of data >> around w/o having to take the time to create tapes and then download them. >> That takes a lot of time, even w/ LTOs. > > yes.. guess it makes sense to move disks around for moving tb of data > like floppy-net or sneaker-net > - done that ( moving disks around ) myself once in a while > for a quickie fix When you have that much data, it is easier and faster to load 8 drives into a box than tons of tapes. take out the old drives and place the new ones in, mount it, export it and voila-it is on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 19:36:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 16:36:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031009231916.21008.qmail@houston.wolf.com> Message-ID: hi ya On Thu, 9 Oct 2003, Angel Rivera wrote: .. > > yes.. guess it makes sense to move disks around for moving tb of data > > like floppy-net or sneaker-net > > - done that ( moving disks around ) myself once in a while > > for a quickie fix > > When you have that much data, it is easier and faster to load 8 drives into > a box than tons of tapes. take out the old drives and place the new ones > in, mount it, export it and voila-it is on-line. yes and a "bunch of disks" (raid5) survives the loss of one dropped disk and is relatively secure from prying eyes .... - ceo gets one disk - cfo gets one disk - hr gets one disk - eng gets one disk - sys admin gets one disk ( combine all[-1] disks together to recreate the (raid5) TB data ) - a single (raid5) disk by itself is basically worthless tape backups are insecure ... - lose a tape ( bad tape, lost tape ) and and all its data is lost - anybody can read the entire contents of the full backup ( one could tar up one disk per tape, instead of tar'ing the ( whole raid5 subsystem, to provide the ( same functionality as a raid5 offsite disk backup c ya alvin and hopefully .. the old disks are not MFM drives.. 
or ata-133 in a new sata system :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:51:05 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:51:05 GMT Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031009235105.23420.qmail@houston.wolf.com> Alvin Oga writes: >> yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... Well, let's see. We can backup the data to tapes or to disks-disks are faster. From the time the data is on the disk, 1/2-1.0 hours to get to us, a few minutes to install them and voila you are on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 21:31:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 21:31:13 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... > - ceo gets one disk > - cfo gets one disk > - hr gets one disk > - eng gets one disk > - sys admin gets one disk > ( combine all[-1] disks together to recreate the (raid5) TB data ) > > - a single (raid5) disk by itself is basically worthless Secure from prying eyes, maybe (as in casually secure). "Secure" as in your secret plans for world domination or the details of your flourishing cocaine business are safe from the feds, not at all, unless the information is encrypted. Each disk has about one fourth of the information. English is about 3:1 compressible (really more; this is using simple symbolic compression). A good cryptanalyst could probably recover "most" of what is on the disks from any one disk, depending on what kind of data is there. Numbers, possibly not, but written communications, quite possibly. Especially if it falls in the hands of somebody who really wants it and has LOTS of good cryptanalysts. > tape backups are insecure ... > - lose a tape ( bad tape, lost tape ) and and all its data is lost > - anybody can read the entire contents of the full backup Unless it is encrypted. Without strong encryption there is no data-level security. With it there is. Maybe. Depending on what is "strong" to you and what is strong to, say, the NSA, whether your systems and network is secure, depending on whether you have dual isolation power inside a faraday cage with dobermans at the door. However, there can be as much or as little physical security for the tape as you care to put there. Tape in a locked safe, tape in an armored car. Disks are far more fragile than tapes -- drop a disk one meter onto the ground and chances are quite good that it is toast and will at best cost hundreds of dollars and a trip to specialized facilities to remount and mostly recover. Drop a tape one meter onto the ground and chance are quite good that it is perfectly fine, and even if it isn't (because e.g. the case cracked) ordinary humans can generally remount the tape in a new case without needing a clean room and special tools. 
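To put the "what does one RAID-5 member actually hold" question above on concrete footing: under a rotating-parity layout the data lands on each member in chunk-sized, interleaved pieces rather than as one contiguous quarter of the text. A minimal sketch (four members and 64 KB chunks are assumptions for illustration; real md layouts differ in the rotation details but not in the overall effect):

    # Sketch of how a rotating-parity RAID-5 layout scatters data across
    # member disks.  The 4-disk, 64 KB-chunk geometry is an assumption.
    NDISKS = 4
    CHUNK = 64 * 1024

    def chunk_location(logical_chunk):
        """Map a logical data chunk number to (member_disk, stripe)."""
        stripe = logical_chunk // (NDISKS - 1)          # N-1 data chunks per stripe
        parity_disk = (NDISKS - 1) - (stripe % NDISKS)  # parity slot rotates each stripe
        slot = logical_chunk % (NDISKS - 1)             # position among the data slots
        disk = slot if slot < parity_disk else slot + 1 # skip over the parity slot
        return disk, stripe

    # Which pieces of a 1 MB file land on member disk 0?
    file_chunks = (1024 * 1024) // CHUNK
    on_disk0 = [c for c in range(file_chunks) if chunk_location(c)[0] == 0]
    print("disk 0 holds data chunks:", on_disk0)

So a single member ends up with roughly a quarter of the bytes, but as scattered chunk-sized fragments plus a rotating share of parity -- the nuance a later reply in this thread raises when it points out that RAID-5 distributes data in chunks of around 4k-128k.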
Tapes are cheap -- you can afford to send almost three tapes compared to one disk. I get the feeling that you just don't like tapes, Alvin...;-) rgb > > ( one could tar up one disk per tape, instead of tar'ing the > ( whole raid5 subsystem, to provide the > ( same functionality as a raid5 offsite disk backup > > c ya > alvin > > and hopefully .. the old disks are not MFM drives.. > or ata-133 in a new sata system :-) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Fri Oct 10 01:20:06 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Fri, 10 Oct 2003 07:20:06 +0200 Subject: Inquiry small system S/W In-Reply-To: References: Message-ID: <20031010072006.54dfd8a4.smuelas@mecanica.upm.es> I have put together an 8-node Beowulf cluster to my greatest satisfaction, with good results. You don't need anything special; if it is a Beowulf it must be Linux. If you use, for example, Red Hat 9, which is what I do, you have everything you need in the standard 3-CD distribution, which you can download at no cost. Apart from that, and in my particular case, I use Fortran 90 and the compiler from Intel, also free for non-commercial use. Perhaps the only special hardware to buy is a simple 8-port switch for your Ethernet connections. Then, what is really important is to learn to make your software really able to use the cluster. So, some time to study MPI or similar, and work, work, work... :-) Before being an 8-node cluster, mine was 4 nodes, then 6 nodes, and at 8 I stopped. But there is no difference in the work to do; just the possibilities and the speed increase. Good luck!! On Thu, 09 Oct 2003 15:43:57 -0400 Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? > What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne > > ============================================== > Gene Sheppard > Georgia Perimeter College > Computer Science > 1000 University Center Lane > Lawrenceville, GA 30043 > 678-407-5243 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S.
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Fri Oct 10 01:43:57 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 9 Oct 2003 22:43:57 -0700 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010054357.GB13480@sphere.math.ucdavis.edu> On the hardware vs software RAID thread: a friend needed a few TB and bought a high-end RAID card (several $k), multiple channels, an enclosure, and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. He needed the capacity and a minimum of 50MB/sec sequential write performance (on large sequential writes). He didn't get it. Call #1 to Dell resulted in "well, it's your fault, it's our top of the line, it should be plenty fast", bleah, bleah, bleah. Call #2 led to an escalation to someone with more of a clue: tune parameter X, tune Y, try a different raid setup, swap out X, etc. After more testing didn't help, call #3 was escalated again and someone fairly clued answered. The conversation went along the lines of: what? yeah, it's dead slow. Yeah, most people only care about the reliability. Oh, performance? We use linux + software raid on all the similar hardware we use internally at Dell. So the expensive controller was returned, and 39160's were used in its place (dual channel U160) and performance went up by a factor of 4 or so. In my personal benchmarking on a 2 year old machine with 15 drives I managed 200-320 MB/sec sustained (large sequential read or write), depending on filesystem and stripe size. I've not witnessed any "scaling problems"; I've been quite impressed with linux software raid under all conditions and have had it run significantly faster than several expensive raid cards I've tried over the years. Surviving hotswap, over 500 day uptimes, and substantial performance advantages seem to be common. Anyone have numbers comparing hardware and software raid using bonnie++ for random access, or maybe postmark (NetApp's disk benchmark)? Failures so far: * 3ware 6800 (awful, evil, slow, unreliable, terrible tech support) * quad channel scsi card from Digital/StorageWorks, rather slow, then started crashing * More recently (last 6 months) the top of the line Dell raid card (PERC?) * A few random others One alternative solution I figured I'd mention: the Apple 2.5 TB array for $10-$11k isn't a bad solution for a mostly turnkey, hotswap, redundant-power-supply setup with a warranty. Dual 2 Gigabit Fibre Channel links do make it easier to scale to 10's of TBs than some other solutions. I managed 70 MB/sec read/write to a 1/2 Xraid (on a single FC). Of course there are cheaper solutions. Oh, I also wanted to mention one gotcha for the DIY methods. I've had, I think, 4 machines now with 8-15 disks and dual 400 watt power supplies or 3x225 watt (n+1) that boot just fine for 6 months, but then start complaining at boot due to too-high power consumption. This is of course especially bad with EIDEs since they all spin up at boot (SCSI can usually be spun up one at a time). I suspect a slight decrease in lubrication and/or degradation in the power supplies, which were possibly running above 100%, to be the cause.
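For the random-access numbers being asked about, bonnie++ or postmark are the right tools, but the core of such a test is small enough to sketch. A bare-bones random-read check (the path, 4 KB read size and iteration count are arbitrary assumptions; the target file must be much larger than RAM or the page cache is all that gets measured):

    # Bare-bones random-read check in the spirit of the bonnie++/postmark
    # request above: seek to random offsets in a large existing file and
    # time small reads.  Reuse the file from the sequential sketch earlier,
    # made much larger than RAM, to avoid measuring the page cache.
    import os, random, time

    PATH = "/tmp/rawtest.dat"   # assumption: a big file already on the array
    READ = 4096
    COUNT = 2000

    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size
    t0 = time.time()
    for _ in range(COUNT):
        os.lseek(fd, random.randrange(0, size - READ), os.SEEK_SET)
        os.read(fd, READ)
    elapsed = time.time() - t0
    os.close(fd)
    print("%.0f random %d-byte reads/second" % (COUNT / elapsed, READ))

Seek-bound random I/O is typically where RAID implementations and layouts differ far more than in the large sequential transfers quoted above, which is presumably why the bonnie++/postmark comparison is being requested.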
In any case great thread, I've yet to see a performance or functionality benefit from hardware raid. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 03:24:15 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 09:24:15 +0200 Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> References: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: <20031010072415.GI17432@unthought.net> On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > Hi again, ... Others have already answered your other questions, I'll try to take one that went unanswered (as far as I can see). ... > > But must be noted that HW RAID offers better response time. In a HW RAID setup you *add* an extra layer: the dedicated CPU on the RAID card. Remember, this CPU also runs software - calling it 'hardware RAID' in itself is misleading, it could just as well be called 'offloaded SW RAID'. The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor. However, you added a layer with your offloaded-RAID. You added one extra CPU in the 'chain of command' - and an inferior CPU at that. That layer means latency even in the most expensive cards you can imagine (and bottleneck in cheap cards). No matter how you look at it, as long as the RAID code in the kernel is fairly simple and efficient (which it was, last I looked), then the extra layers needed to run the PCI commands thru the CPU and then to the actual IDE/SCSI controller *will* incur latency. And unless you pick a good controller, it may even be your bottleneck. Honestly I don't know how much latency is added - it's been years since I toyed with offload-RAID last ;) I don't mean to be handwaving and spreading FUD - I'm just trying to say that the people who advocate SW RAID here are not necessarily smoking crack - there are very good reasons why SW RAID will outperform HW RAID in many scenarios. > > HW raid offers hotswap capability and offload our work instead of > maintaining a SW raid solution ...we'll see ;) That, is probably the best reason I know of for choosing hardware RAID. And depending on who you will have administering your system, it can be a very important difference. There are certainly scenarios where you will be willing to trade a lot of performance for a blinking LED marking the failed disk - I am not kidding. Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 02:58:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 08:58:37 +0200 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010065837.GH17432@unthought.net> On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: ... > Each disk has about one fourth of the information. English is about 3:1 > compressible (really more; this is using simple symbolic compression). > A good cryptanalyst could probably recover "most" of what is on the > disks from any one disk, depending on what kind of data is there. You overlook the fact that data on a RAID-5 is distributed in 'chunks' of sizes around 4k-128k (depending...) So you would get the entire first 'Introduction to evil empire plans', but the entire 'Subverting existing banana government' chapter may be on one of the disks that you are missing. > Numbers, possibly not, but written communications, quite possibly. > Especially if it falls in the hands of somebody who really wants it and > has LOTS of good cryptanalysts. You'd probably need historians and psychologists rather than cryptographers - but of course the point remains the same. Just nit-picking here. > > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. I'm just thinking of distributing two tapes for each disk - one with 200G of random numbers, the other with 200G of data XOR'ed with the data from the first tape. Enter the one-time pad - unbreakable encryption (unless you get a hold of both tapes of course). You'd need to make sure you have good random numbers - as an extra measure of safety one should probably wear a tinfoil hat while working with the tapes, just in case... ;) Of course, if any tape is lost, everything is lost. But one bad KB on either tape will only result in one bad KB total. > > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. No no no no no! Think big! Think: cobalt bomb in own backyard - threaten anyone who steals your data, that you'll make the planet inhabitable for a few hundred decades unless they hand back your tapes. ;) (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) ... > I get the feeling that you just don't like tapes, Alvin...;-) Where did you get that idea? ;) Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Fri Oct 10 13:34:41 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Fri, 10 Oct 2003 10:34:41 -0700 Subject: building a RAID system References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> Message-ID: <3F86EDB1.6264A405@attglobal.net> You write: "The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor." Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the practical limit of SW RAID? Paul Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > Hi again, > ... > > Others have already answered your other questions, I'll try to take one > that went unanswered (as far as I can see). > > ... > > > > But must be noted that HW RAID offers better response time. > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > RAID card. Remember, this CPU also runs software - calling it > 'hardware RAID' in itself is misleading, it could just as well be called > 'offloaded SW RAID'. > > The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor. > > However, you added a layer with your offloaded-RAID. You added one extra > CPU in the 'chain of command' - and an inferior CPU at that. That layer > means latency even in the most expensive cards you can imagine (and > bottleneck in cheap cards). No matter how you look at it, as long as > the RAID code in the kernel is fairly simple and efficient (which it > was, last I looked), then the extra layers needed to run the PCI > commands thru the CPU and then to the actual IDE/SCSI controller *will* > incur latency. And unless you pick a good controller, it may even be > your bottleneck. > > Honestly I don't know how much latency is added - it's been years since > I toyed with offload-RAID last ;) > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > that the people who advocate SW RAID here are not necessarily smoking > crack - there are very good reasons why SW RAID will outperform HW RAID > in many scenarios. > > > > > HW raid offers hotswap capability and offload our work instead of > > maintaining a SW raid solution ...we'll see ;) > > That, is probably the best reason I know of for choosing hardware RAID. > And depending on who you will have administering your system, it can be > a very important difference. > > There are certainly scenarios where you will be willing to trade a lot > of performance for a blinking LED marking the failed disk - I am not > kidding. > > Cheers, > > -- > ................................................................ > : jakob at unthought.net : And I see the elder races, : > :.........................: putrid forms of man : > : Jakob ?stergaard : See him rise and claim the earth, : > : OZ9ABN : his downfall is at hand. 
: > :.........................:............{Konkhra}...............: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Oct 10 07:12:48 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 10 Oct 2003 04:12:48 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. just trying to protect the tapes ( backups ) against the casual "oops look what i found" and they go and look at the HR records or the salary records or employee reviews etc..etc.. not trying to protect the tapes against the [cr/h]ackers ( different ball game ) and even not protecting against the spies of nsa/kgb etc either ( whole new ballgame for those types of backup issues ) > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. dont forget to lock the car/safe too :-) and log who goes in and out of the "safe" area :-) > I get the feeling that you just don't like tapes, Alvin...;-) not my first choice for backups .. even offsite backups... but if "management" takes out the $$$ to do tape backups... so it shall be done ... ideally, everything works ... but unfortunately, tapes are highly prone to people's "oops i forgot to change it yesterday" or the weekly catridge have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 07:56:39 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 13:56:39 +0200 Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> <3F86EDB1.6264A405@attglobal.net> Message-ID: <20031010115639.GN17432@unthought.net> On Fri, Oct 10, 2003 at 10:34:41AM -0700, pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? In this forum, I run small storage only. Around 150G for the most busy server that I have. 
Linux has problems with >2TB devices as far as I know, so that sort of puts an upper limit to whatever you can do with SW/HW RAID there. In between, it's just one order of magnitude :) More seriously - the SW RAID code is extremely simple, and it performs two different tasks: *) Reconstruction - which has time complexity T(n) for n bytes of data *) Read/write - which has time complexity T(1) for n bytes of data In other words - the more data you have, the longer a resync is going to take - HW or SW makes no difference (except for a factor, which tends to be rediculously large on cheap HW RAID cards but acceptable on more expensive ones). Reads and writes are not affected by the amount of data, in the SW RAID layer (and hopefully not in the HW RAID layer either). The scalability limits you will run into are: *) Number of disks you can attach to your box (HW RAID may hide this from you and may thus buy you some scalability there) *) Filesystem limits/performance problems. HW/SW RAID makes no difference *) Device size limits. HW/SW RAID makes no difference *) Reconstruction time after unclean shutdown - SW performs much better than crap/cheap HW solutions, but I don't know about the expensive ones. There are others on this list with much larger servers and less antique hardware - guys, speak up - where does it begin to hurt? :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 10 07:59:22 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 10 Oct 2003 04:59:22 -0700 (PDT) Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> Message-ID: On Fri, 10 Oct 2003 pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? size-wise software raid (I'm talking specifically about linux here) scales far better than most hardware raid controllers (san subsystems are another kettle of fish entirely), among other reasons because you can spread the disks out between multiple controllers. > Paul > > Jakob Oestergaard wrote: > > > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > > Hi again, > > ... > > > > Others have already answered your other questions, I'll try to take one > > that went unanswered (as far as I can see). > > > > ... > > > > > > But must be noted that HW RAID offers better response time. > > > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > > RAID card. Remember, this CPU also runs software - calling it > > 'hardware RAID' in itself is misleading, it could just as well be called > > 'offloaded SW RAID'. 
> > > > The problem with offloading is, that while it made great sense in the > > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > > load on your typical N GHz processor. > > > > However, you added a layer with your offloaded-RAID. You added one extra > > CPU in the 'chain of command' - and an inferior CPU at that. That layer > > means latency even in the most expensive cards you can imagine (and > > bottleneck in cheap cards). No matter how you look at it, as long as > > the RAID code in the kernel is fairly simple and efficient (which it > > was, last I looked), then the extra layers needed to run the PCI > > commands thru the CPU and then to the actual IDE/SCSI controller *will* > > incur latency. And unless you pick a good controller, it may even be > > your bottleneck. > > > > Honestly I don't know how much latency is added - it's been years since > > I toyed with offload-RAID last ;) > > > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > > that the people who advocate SW RAID here are not necessarily smoking > > crack - there are very good reasons why SW RAID will outperform HW RAID > > in many scenarios. > > > > > > > > HW raid offers hotswap capability and offload our work instead of > > > maintaining a SW raid solution ...we'll see ;) > > > > That, is probably the best reason I know of for choosing hardware RAID. > > And depending on who you will have administering your system, it can be > > a very important difference. > > > > There are certainly scenarios where you will be willing to trade a lot > > of performance for a blinking LED marking the failed disk - I am not > > kidding. > > > > Cheers, > > > > -- > > ................................................................ > > : jakob at unthought.net : And I see the elder races, : > > :.........................: putrid forms of man : > > : Jakob ?stergaard : See him rise and claim the earth, : > > : OZ9ABN : his downfall is at hand. : > > :.........................:............{Konkhra}...............: > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:35:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:35:35 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: On Fri, 10 Oct 2003, Alvin Oga wrote: > > hi ya robert > > On Thu, 9 Oct 2003, Robert G. Brown wrote: > > > > tape backups are insecure ... > > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > > - anybody can read the entire contents of the full backup > > > > Unless it is encrypted. Without strong encryption there is no > > data-level security. With it there is. Maybe. 
Depending on what is > > "strong" to you and what is strong to, say, the NSA, whether your > > systems and network is secure, depending on whether you have dual > > isolation power inside a faraday cage with dobermans at the door. > > just trying to protect the tapes ( backups ) against the casual > "oops look what i found" and they go and look at the HR records > or the salary records or employee reviews etc..etc.. > > not trying to protect the tapes against the [cr/h]ackers ( different > ball game ) and even not protecting against the spies of nsa/kgb etc > either ( whole new ballgame for those types of backup issues ) Hmmm, this is morphing offtopic, but data security is a sufficiently universal problem that I'll chance one more round. Pardon me while I light up my crack pipe here...:-7... so I can babble properly. Ah, ya, NOW I'm awake...:-) The point is that you cannot do this with precisely HR records or salary records or employee reviews. If anybody gets hold of the data (casually or not) be it on tape or disk or the network while it is in transit then you are liable to the extent that you failed to take adequate measures to ensure the data's security. By the numbers: 1) Tapes even more than disk are most unlikely to be viewed casually. Disks have street value, tapes (really) don't. There is a high entry investment required before one can even view the contents of an e.g. LTO tape, plus a fair degree of expertise. A disk can be pulled from a box and remounted in any system by a pimple-faced kid with a screwdriver and an attitude. The net can be snooped by anyone, with a surprisingly low entry level of expertise (or rather a high level expertise encapsulated in openly distributed rootkits and exploits so anybody can do it). 2) All three are clearly vulnerable to someone (e.g. a private investigator, an insurance company, a competitor, an identity thief, the government) seeking to snoop and violate the privacy of the individuals who have entrusted their data to you. HR records contain SSNs, bank numbers (to facilitate direct deposit), names addresses, health records, employment records, CVs and/or transcripts, disciplinary records: they are basically everything you never wanted the world to know in one compact and efficient package. Federal and state laws regulate the handling of this data in quite rigorous ways. 3) An IT officer who was responsible for holding sensitive data secure according to law and who failed to employ reasonable measures for maintaining it secure and who subsequently had it stolen (violating his trust) would be publically eviscerated. Career ruined, bankrupted by suits, tormented by guilt, possibly even put in jail, driven to suicide kind of stuff in the worst case. The company that employed that officer would be right behind -- suits, clean sweep firings of the entire management team in the chain of responsibility, plunging stock prices, public recriminations and humiliation. EVEN IF reasonable measures were employed there would likely be trouble and recrimination, but careers might survive, damages would be limited, jail might be avoided, and one wouldn't feel so irresponsibly guilty. 4) Strong encryption of the data to protect it in transit is an obvious, inexpensive, knee-jerk sort of reasonable measure (again, independent of the means of transport presuming only that the data passes out of your fortress keep where you keep the cobalt bomb and dobermans and make all of your staff wear tinfoil caps while looking at the data). 
It might even be mandated by law for certain forms of data -- the federal government just passed a sweeping right to privacy measure for health data, for example, that may well have highly explicit provisions for data transport and security. 5) Therefore... only someone with a death wish would send sensitive, valuable data for which they are responsible for security, through any transport layer not under their direct control and deemed secure of its own right, between secure sites, without encrypting it first (and otherwise complying with relevant federal and state laws, if any apply to the case at hand). Properly paranoid ITvolken would likely consider ALL transport layers including their own internal LAN not to be secure and would use ssh/ssl/vpn bidirectional encryption of all network traffic period. If it weren't for the fact that there is less motivation to encrypt the data on the physically secured actual server disks (so the only means of access are through dobermans and locked doors or by cracking the servers from outside, in which case you've already lost the game) one would extend the data encryption to the database itself, and I'm sure that there are sites that don't trust even their own staff or the moral character of their dobermans that so do. I don't want to THINK about what one has to endure to obtain access to e.g. NSA or certain military datasites -- probably body cavity searches in and out, shaved heads and paper suits, and metal detectors, that sort of thing...:-) > > However, there can be as much or as little physical security for the > > tape as you care to put there. Tape in a locked safe, tape in an > > armored car. > > dont forget to lock the car/safe too :-) > and log who goes in and out of the "safe" area :-) Ya, precisely. It is only partly a joke, you see. If my Duke HR or my medical records turn up on the street, with somebody purporting to be me cleaning out my bank account and maxing my visa, with my applications for health insurance denied because they've learned about my heavy drinking problem and all the crack that I smoke (I don't know where Jakob got the idea that I don't sit here fuming away all day:-) and the consequent liver failure and bouts of wild-eyed babbling (like this one, strangely enough:-), my plans for a fusion generator that you can build in your garage turning up being patented by Exxon and so forth Duke had DAMN WELL better be able to show my attorney and a court logs of who had access to this data, proofs that it was never left lying around in cars (locked or unlocked), proofs that it was transmitted in encrypted form, etc. Otherwise I'm detonating the cobalt bomb in my backyard and Duke will be a radioactive wasteland for a few kiloyears...(it is only a couple of miles away). This is the kind of thing that gives IT security officers ulcers. Duke's current SO is actually a former engineering school beowulfer (and good friend of mine) whose voice is scattered through the list archives (Chris Cramer). As a former 'wulfer (and EE), he is damn smart and computer-expert (and handsome and witty, just like everybody else on this list:-). However, he sweats bullets because Duke is a huge organization with lots of architectures scattered all over campus -- Windows here (any flavor), Macs there, Suns, Linux boxen, there are likely godforsaken nooks on campus that still have IBM mainframes and VAXes. Sensitive data is routinely served across the campus backbone and beyond (e.g. 
I can see my advisees' current transcripts where I sit at this very moment). Even with SSL, this data is vulnerable in fragments to any successful exploit on any client that belongs to any partially privileged person and that runs a vulnerable operating system. Hmmm, you say -- wasn't there recently an RPC exploit on a certain very common OS that permitted crackers to put anything they wanted including snoops on all cracked clients (not to mention a steady stream of lesser but equally troublesome invasions of the viral sort)? Didn't this cost institutions all over the world thousands of FTE hours to put right before somebody actually used it to steal access to valuable data? Why yes, I believe that there was! I believe it did! However, as one who got slammed (blush) a year ago on an unfortunately unpatched linux box and who has seen countless exploits succeed against all flavors of networked OS over many years, I avoid feeling too cocky about it. Nevertheless, Chris just keeps suckin' down the prozac and phillips cocktails dealing with crap like this and knowing that it is his butt on line should a malevolent attack succeed in compromising Duke's mountains of sensitive data (gulp) being served by minions whose primary systems expertise was developed back when knowing cobol was a part of the job description (gulp) running on servers with, um "interesting" base architectures (gulp)... > > I get the feeling that you just don't like tapes, Alvin...;-) > > not my first choice for backups .. even offsite backups... > > but if "management" takes out the $$$ to do tape backups... so it shall be > done ... > ideally, everything works ... but unfortunately, tapes > are highly prone to people's "oops i forgot to change it > yesterday" or the weekly catridge They are indeed (as the example I gave of a recent small-scale disaster at Duke clearly shows). A site run by a wise IT human would use a pretty rigorous protocol to regulate the process so that even if you have e.g. student labor doing the tape changes there is strict accountability and people checking the people checking the people who do the job, and so that tapes are randomly pulled every month and checked to be sure that the data is actually getting on the tapes in retrievable form. You can bet that Duke has such a process in place now, if they didn't before, although Universities tend to be a loose amalgamation of quasi-independent fiefdoms that accept control and adopt security measures for the common good and hire competent systems administrators and develop shared protocols for ensuring data integrity about as often and as easily as one would expect. (Sound of Chris in the background crunching another mylantin and washing it down with P&P:-) So in place or not, the risk remains. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:34:25 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:34:25 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010065837.GH17432@unthought.net> Message-ID: On Fri, 10 Oct 2003, Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: > ... 
> > Each disk has about one fourth of the information. English is about 3:1 > > compressible (really more; this is using simple symbolic compression). > > A good cryptanalyst could probably recover "most" of what is on the > > disks from any one disk, depending on what kind of data is there. > > You overlook the fact that data on a RAID-5 is distributed in 'chunks' > of sizes around 4k-128k (depending...) Overlook, hell. I'm using my usual strategy of feigning knowledge with the complete faith that my true state of ignorance will be publically displayed to the entire internet. This humiliation, in turn, will eventually cause such mental anguish that I'll be able to claim mental disability and retire to tending potted plants on a disability check for the rest of my life... You probably noticed that I used the same strategy quite recently regarding things like factors of N in disk read speed estimates, certain components in disk latency, and oh, too many other things to mention. Pardon me if I babble on a bit this morning, but my lawy... erm, "psychiatrist" insists that I need fairly clear evidence of disability to get away with this. I personally find that smoking crack cocaine induces a pleasant tendency to babble nonsense. And there is no place to babble for the record like the beowulf list archives, I always say...:-) > So you would get the entire first 'Introduction to evil empire plans', > but the entire 'Subverting existing banana government' chapter may be on > one of the disks that you are missing. ... > I'm just thinking of distributing two tapes for each disk - one with > 200G of random numbers, the other with 200G of data XOR'ed with the data > from the first tape. Or just one tape, xor'd with 200G worth of random numbers generated from a cryptographically strong generator via a relatively short key that you can (as you note) send or carry separately and which is smaller, easier to secure, and less susceptible to degradation or loss than a second tape. It's cheaper that way, and even if you use two tapes people are going to try cracking the master tape by trying to guess the key+algorithm you almost certainly used to generate it (see below), so the xor is no stronger than the key+algorithm combination.;-) > Enter the one-time pad - unbreakable encryption (unless you get a hold > of both tapes of course). Or determine the method and key you used for (oxymoronically) generating 200 Gigarands (which is NOT going to be a hardware generator, I don't think, unless you are a very patient person or build/buy a quantum generator or the like -- entropy based things like /dev/random are too slow, and even quantum generators I've looked into are barely fast enough:-). > You'd need to make sure you have good random numbers - as an extra Ah, that's the rub. "Good random numbers" isn't quite an oxymoron. Why, there is even a government standard measure for cryptographic strength in the US (which many/most generators fail, by the way). Entropy based generators tend to be very slow -- order of 10-100 kbps depending on the source of entropy, last I looked. Quantum generators IIRC that rely on e.g. single photon transmission events at half-silvered mirrors have to run at light intensities where single photon events are discernible (rare, that is) and STILL have to wait for an autocorrelation time or ten before opening a window for the next event because even quantum events like this have an associated correlation time due to the existence of extended correlated states in the radiating system. 
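The "single tape XOR'd with a keystream generated from a relatively short key" idea above is compact enough to sketch. A minimal illustration (SHA-256 over key+counter stands in for the "cryptographically strong generator"; this shows the principle only, is not a vetted cipher, and real data should go through a proper encryption tool):

    # Illustration of the "one tape, XOR'd with a generated keystream" idea:
    # expand a short secret into an arbitrarily long keystream and XOR it
    # with the backup stream.  Principle only -- not a vetted cipher.
    import hashlib

    def keystream(key, nbytes):
        out = bytearray()
        counter = 0
        while len(out) < nbytes:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(out[:nbytes])

    def xor_with_key(data, key):
        ks = keystream(key, len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    secret = b"short key carried separately"      # assumption: the short key
    plain = b"HR records, salary data, ..."       # stand-in for the tape stream
    cipher = xor_with_key(plain, secret)
    assert xor_with_key(cipher, secret) == plain  # the same operation decrypts

Applying the same XOR a second time with the same key recovers the plaintext, so only the short key has to be protected and carried separately -- and, as noted above, the whole scheme is then only as strong as that key and generator combination.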
Photon emission from a single atom itself is antibunched, for example, as after an emission the system requires time for the single radiating atom to regain a degree of excitation sufficient to enable re-emission. I believe that they can achieve more like 1 Mbps of randomness or at least unpredictability. As you'd need 1.6x10^12 bits to encode your tape, you'd have to wait around 1.6x10^6 seconds to generate the key. That is, hmmm, between two and three weeks, or twenty to thirty weeks with an entropy generator, unless you used a beowulf of entropy generators to shorten the time:-). Not exactly in the category of "generate a one-time pad while I go have a cup of coffee". Using a truly oxymoronic but much faster (and cryptographically strong) random number generator, e.g. the mt19937 from the GSL one can generate a respectable ballpark of 16 MBps (note B, not b) of random bytes and be done in a mere four hours. Alas, mt19937 is seeded from a long int and the seed probably doesn't have enough bits to be secure against a brute force attack, so one would likely have to fall back on one of the actual algorithms that permit the use of long keys (1024 bits or even more).

> No no no no no! Think big!
>
> Think: cobalt bomb in own backyard - threaten anyone who steals your
> data, that you'll make the planet uninhabitable for a few hundred
> decades unless they hand back your tapes. ;)
>
> (I'm drafting up 'Introduction to evil empire plans' soon by the way ;)

Hmm, I'll have to mail you some of my lithium pills, Jakob. Your own prescription obviously ran out...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Fri Oct 10 11:28:03 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Fri, 10 Oct 2003 09:28:03 -0600 Subject: Intel compilers and libraries In-Reply-To: ; from cjtan@optimanumerics.com on Thu, Oct 09, 2003 at 10:04:20AM +0000 References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031010092803.A5136@lnxi.com> On Thu, Oct 09 2003 at 04:04, C J Kenneth Tan -- Heuchera Technologies wrote:
> Greg,
>
> > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's
> > MKL and the free ATLAS library run at a respectable % of peak.
>
> Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV,
> xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI.
>
> Have you tried DPPSV or DPOSV on Itanium, for example? I would be
> interested in the percentage of peak that you achieve with MKL and
> ATLAS, for up to 10000x10000 matrices.
>
> ATLAS does not have full LAPACK implementation.
This gets ATLAS to provide its faster LAPACK routines to a full LAPACK library: http://math-atlas.sourceforge.net/errata.html#completelp Mike -- Mike Snitzer msnitzer at lnxi.com Linux Networx http://www.lnxi.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Fri Oct 10 13:55:43 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Fri, 10 Oct 2003 19:55:43 +0200 Subject: PVM errors at startup Message-ID: <3F86F29F.8A37AC5B@hmg.inpg.fr> Hi I'm new on this list so, just 2 lines about me: A small linux beowulf cluster (10 nodes) for computational fluid dynamics in the south-east of France (National Polytechnic Institute of Grenoble). I've just updated my cluster (from AMD1500+/ Eth100BT to P4 2.8G + Gigabit ethernet) and I've updated my system to Red-Hat 7.3, Kernel 2.4.20-20-7. The current version of pvm is pvm-3.4.4-2 from the RedHat 7.3. The previous system was RH7.1. Since this update I'm unable to start PVM from one node to another (with the add command). The console hangs for several tens of seconds then says OK. The pvmd3 is started on the remote node but the conf command does not show the additional node and I get these errors in the /tmp/pvml.xx file:

[t80040000] 10/10 15:58:31 craya.hmg.inpg.fr (xxx.xxx.xxx.xxx:32772) LINUX 3.4.4
[t80040000] 10/10 15:58:31 ready Fri Oct 10 15:58:31 2003
[t80040000] 10/10 16:01:46 netoutput() timed out sending to craya02 after 14, 190.000000
[t80040000] 10/10 16:01:46 hd_dump() ref 1 t 0x80000 n "craya02" a "" ar "LINUX" dsig 0x408841
[t80040000] 10/10 16:01:46 lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000] 10/10 16:01:46 sa 192.168.81.2:32770 mtu 4080 f 0x0 e 0 txq 1
[t80040000] 10/10 16:01:46 tx 2 rx 1 rtt 1.000000 id "(null)"

rsh and rexec are working (from master to nodes, from nodes to master and from nodes to nodes). The transfer speed is near 600 Mbits/s on the network (binary ftp to /dev/null). The variables are set:

PVM_ARCH=LINUX
PVM_RSH=/usr/bin/rsh
PVM_DPATH=/usr/local/pvm3/lib/LINUX/pvmd3
PVM_ROOT=/usr/local/pvm3

I've tried so many things during the last 3 days:
- trying to compile and install pvm3.4.4.tgz from the source files
- uninstalling iptables, ipchains and iplock
- removing /etc/security (to test this with root authority)
- adding .rhosts and hosts.equiv files
- on the master eth0 is 100Mbits toward the internet and eth1 is GB towards the nodes; I've tried the opposite config: eth0 becomes GB and eth1 100BT.

Always the same problem! The cluster is down and I do not know where to look for a solution now.... If someone could help me solve this problem... Thanks for your help Patrick --
===============================================================
| Equipe M.O.S.T.       | http://most.hmg.inpg.fr           |
| Patrick BEGOU         |       ------------                |
| LEGI                  | mailto:Patrick.Begou at hmg.inpg.fr |
| BP 53 X               | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX  | Fax 04 76 82 52 71                |
===============================================================
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Oct 10 21:53:34 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 10 Oct 2003 21:53:34 -0400 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010054357.GB13480@sphere.math.ucdavis.edu> References: <20031010054357.GB13480@sphere.math.ucdavis.edu> Message-ID: <1065837212.18644.0.camel@QUIGLEY.LINIAC.UPENN.EDU> On Fri, 2003-10-10 at 01:43, Bill Broadley wrote:
> On the hardware vs software RAID thread. A friend needed a few TB and
> bought a high end raid card (several $k), multiple channels, enclosure,
> and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood.
>
> He needed the capacity and a minimum of 50MB/sec sequential write
> performance (on large sequential writes). He didn't get it. Call #1 to
> dell resulted in well it's your fault, it's our top of the line, it should
> be plenty fast, bleah, bleah, bleah. Call #2 led to an escalation to
> someone with more of a clue, tune parameter X, tune Y, try a different
> raid setup, swap out X, etc. After more testing without helping, call #3
> was escalated again and someone fairly clued answered. The conversation went
> along the lines of what, yeah, it's dead slow. Yeah most people only
> care about the reliability. Oh performance? We use linux + software
> raid on all the similar hardware we use internally at Dell.
>
> So the expensive controller was returned, and 39160's were used in its
> place (dual channel U160) and performance went up by a factor of 4 or
> so.

Can you give more concrete pointers to the hardware that they ended up using? -- specifically the enclosure. Thanks! Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Fri Oct 10 13:55:14 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Fri, 10 Oct 2003 17:55:14 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031010092803.A5136@lnxi.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031010092803.A5136@lnxi.com> Message-ID: Mike,
> > Have you tried DPPSV or DPOSV on Itanium, for example? I would be
> > interested in the percentage of peak that you achieve with MKL and
> > ATLAS, for up to 10000x10000 matrices.
> >
> > ATLAS does not have full LAPACK implementation.
>
> This gets ATLAS to provide its faster LAPACK routines to a full LAPACK
> library:
> http://math-atlas.sourceforge.net/errata.html#completelp

Inserting the LU factorization code from ATLAS to publicly available LAPACK will only get you faster LU code in the rest of the publicly available LAPACK library. You will not gain from QR factorization code, Cholesky factorization code, etc.. Ken ----------------------------------------------------------------------- C.
J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Oct 11 13:01:17 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 11 Oct 2003 13:01:17 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID:
> Inserting the LU factorization code from ATLAS to publicly available
> LAPACK will only get you faster LU code in the rest of the publicly
> available LAPACK library. You will not gain from QR factorization
> code, Cholesky factorization code, etc..

oh, sure, but LU is the only important one because of top500 ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Oct 11 16:16:12 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 11 Oct 2003 15:16:12 -0500 Subject: Help in rsh In-Reply-To: ; from diego_naruto@hotmail.com on Sat, Oct 11, 2003 at 07:14:13PM +0000 References: Message-ID: <20031011151612.A22568@mikee.ath.cx> On Sat, 11 Oct 2003, diego lisboa wrote:
> Hi,
> I'm having problems with a cluster that I've mounted here, it's a small
> cluster with 3 machines, I have already installed NIS and NFS and it's
> working very well, with red hat 9.0. When I use the LAM 6.5.8 rpm it works
> beautifully, but I need to install XMPI, which needs LAM6.5.8.tar.gz (compiled)
> and with trilliun, when I install on the master it works, but on the slaves
> I have a problem with rsh, and hboot doesn't find "squema LAM" or something
> like that. Somebody can help me?
> Thanks

Try something more simple first. What happens when you do

$ rsh -l USER HOST uptime

does that work? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diego_naruto at hotmail.com Sat Oct 11 15:14:13 2003 From: diego_naruto at hotmail.com (diego lisboa) Date: Sat, 11 Oct 2003 19:14:13 +0000 Subject: Help in rsh Message-ID: Hi, I'm having problems with a cluster that I've mounted here, it's a small cluster with 3 machines, I have already installed NIS and NFS and it's working very well, with red hat 9.0. When I use the LAM 6.5.8 rpm it works beautifully, but I need to install XMPI, which needs LAM6.5.8.tar.gz (compiled) and with trilliun, when I install on the master it works, but on the slaves I have a problem with rsh, and hboot doesn't find "squema LAM" or something like that.
Somebody can help me? Thanks _________________________________________________________________ MSN Hotmail, the largest webmail service in Brazil. http://www.hotmail.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Oct 11 19:10:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 11 Oct 2003 16:10:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Fri, 10 Oct 2003, Robert G. Brown wrote:
> > not trying to protect the tapes against the [cr/h]ackers ( different
> > ball game ) and even not protecting against the spies of nsa/kgb etc
> > either ( whole new ballgame for those types of backup issues )
>
> Hmmm, this is morphing offtopic, but data security is a sufficiently
> universal problem that I'll chance one more round. Pardon me while I
> light up my crack pipe here...:-7... so I can babble properly. Ah, ya,
> NOW I'm awake...:-)

humm .. gimme some of that :-)

> The point is that you cannot do this with precisely HR records or salary
> records or employee reviews. If anybody gets hold of the data (casually
> or not) be it on tape or disk or the network while it is in transit then
> you are liable to the extent that you failed to take adequate measures
> to ensure the data's security.

By the numbers: security of clusters vs security of normal compute environments and normal users from home and/or w/ laptops requires varying degrees of security policies
- from looking at the various incoming sven virus (MS update virus stuff)
- about 75% of the incoming junk is coming from (mis-managed) clusters

80% of the security issues will be due to internal folks and not the outsiders.. and i'd hate to be the one responsible for security on a university network where there are tons of bright young and ambitious kids looking for a "trophy"

my security rules, assume the hacker is sitting in the firewall .. w/ root passwds .. now protect your data is my model ...
- if they have a keyboard sniffer installed .. game over ..
  ( there'd be no need to guess what the pass phrase was )

c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From victor_ms at brturbo.com Sun Oct 12 11:27:23 2003 From: victor_ms at brturbo.com (Victor Lima) Date: Sun, 12 Oct 2003 12:27:23 -0300 Subject: Benchmarks Message-ID: <3F8972DB.6080802@brturbo.com> Hi All. I'm new on the list. Well, I have a small linux cluster with 18 P4 2.8 GHz with FastEthernet 100Mbits. I need some benchmark software for latency, throughput on Ethernet, etc. Regards. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Sun Oct 12 19:10:07 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Sun, 12 Oct 2003 19:10:07 -0400 Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> References: <3F8972DB.6080802@brturbo.com> Message-ID: <3F89DF4F.1070500@bellsouth.net> I'm surprised no one has jumped on this yet. There are several packages for testing basic network performance from one node to another.
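(For a very rough first look before installing anything, a minimal TCP ping-pong along the following lines gives a round-trip latency number; the port, message size and repetition count are arbitrary choices for illustration, and most error checking is trimmed for brevity.)

  /* pingpong.c -- crude TCP round-trip timer; the packages below do this
   * properly, this is only a sanity-check sketch.
   *   node1$ ./pingpong -s          # server
   *   node2$ ./pingpong node1       # client, prints average round trip
   */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/time.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <netdb.h>

  #define PORT 5678
  #define MSG  4        /* bytes per message */
  #define REPS 10000

  static void xfer(int fd, int client)
  {
      char buf[MSG] = {0};
      struct timeval t0, t1;
      int i, one = 1;

      setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
      gettimeofday(&t0, NULL);
      for (i = 0; i < REPS; i++) {
          /* tiny messages arrive in a single read in practice */
          if (client) {
              if (write(fd, buf, MSG) != MSG || read(fd, buf, MSG) <= 0) break;
          } else {
              if (read(fd, buf, MSG) <= 0 || write(fd, buf, MSG) != MSG) break;
          }
      }
      gettimeofday(&t1, NULL);
      if (client) {
          double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
          printf("%d round trips, avg %.1f usec each\n", REPS, us / REPS);
      }
  }

  int main(int argc, char **argv)
  {
      struct sockaddr_in sa;
      int fd = socket(AF_INET, SOCK_STREAM, 0);

      memset(&sa, 0, sizeof sa);
      sa.sin_family = AF_INET;
      sa.sin_port = htons(PORT);
      if (argc > 1 && strcmp(argv[1], "-s") == 0) {      /* server */
          int one = 1, conn;
          setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
          sa.sin_addr.s_addr = htonl(INADDR_ANY);
          bind(fd, (struct sockaddr *)&sa, sizeof sa);
          listen(fd, 1);
          conn = accept(fd, NULL, NULL);
          xfer(conn, 0);
          close(conn);
      } else if (argc > 1) {                             /* client */
          struct hostent *h = gethostbyname(argv[1]);
          if (!h) { fprintf(stderr, "unknown host %s\n", argv[1]); return 1; }
          memcpy(&sa.sin_addr, h->h_addr, h->h_length);
          connect(fd, (struct sockaddr *)&sa, sizeof sa);
          xfer(fd, 1);
      } else {
          fprintf(stderr, "usage: %s -s | %s <serverhost>\n", argv[0], argv[0]);
          return 1;
      }
      close(fd);
      return 0;
  }
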
My personal favorite is netpipe: http://www.scl.ameslab.gov/netpipe/ The other one is netperf: http://www.netperf.org/netperf/NetperfPage.html The web pages are pretty good about explaining things. Good Luck! Jeff
> Hi All.
> I'm new on the list.
> Well, I have a small linux cluster with 18 P4 2.8 GHz with
> FastEthernet 100Mbits.
> I need some benchmark software for latency, throughput on Ethernet,
> etc.
> Regards.
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Oct 13 03:34:33 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 13 Oct 2003 09:34:33 +0200 (CEST) Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> Message-ID: On Sun, 12 Oct 2003, Victor Lima wrote:
> Hi All.
> I'm new on the list.
> Well, I have a small linux cluster with 18 P4 2.8 GHz with FastEthernet
> 100Mbits.
> I need some benchmark software for latency, throughput on Ethernet, etc.

Have a look at Pallas http://www.pallas.com/e/products/pmb/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 13 09:38:36 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Mon, 13 Oct 2003 15:38:36 +0200 Subject: Intel and GNU C++ compilers Message-ID: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Hello: I just wanna thank everybody for the responses to my last question about the Intel compiler, I tried both 'gcc' and 'icc', and got the following results for one of our work files containing 10^6 steps of calculation:

**************************
*** gcc version 2.95.4 ***
**************************
flags                                      bin-size   elapsed-time
-----                                      --------   ------------
none                                       9.5 KB     311 sec
"-O3"                                      8.7 KB     192 sec
"-O3 -ffast-math"                          8.7 KB     165 sec
********************************************

***********************
*** icc version 7.1 ***
***********************
flags                                      bin-size   elapsed-time
-----                                      --------   ------------
none                                       597 KB     100 sec
"-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"     563 KB     89 sec
****************************************************************

the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' and SSE2 extensions would improve 'gcc' results. I am running on a Dual Xeon 2.4 GHz machine, with 2 GB of RAM. I use Debian Woody with a 2.4.22 kernel compiled by myself. HyperThreading is disabled at the BIOS level. The tests were run on one processor only. Thanks, Jose M. Perez. Madrid, Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 13 12:04:47 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 13 Oct 2003 12:04:47 -0400 (EDT) Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID:
> *** gcc version 2.95.4 ***

that's god-awful ancient.
> none                  9.5 KB  311 sec
> "-O3"                 8.7 KB  192 sec
> "-O3 -ffast-math"     8.7 KB  165 sec

-fomit-frame-pointer usually helps, sometimes noticeably, since x86 is so short of registers. -O3 is often not better than -O2 or -Os, mainly because of interactions between unrolling, Intel's microscopic L1's, and the difficulty of scheduling onto a tiny reg set... I'd be surprised if 3.3 or 3.4 (pre-release) didn't perform noticeably better.

> flags                                      bin-size   elapsed-time
> -----                                      --------   ------------
> none                                       597 KB     100 sec
> "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"     563 KB     89 sec

isn't -tpp7 redundant if you have -xW?

> the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions
> respectively, I guess that using a newer 'gcc', capable of '-march=pentium4'
> and SSE2 extensions would improve 'gcc' results.

yes. '-march=pentium4 -mfpmath=sse' seems to do it. gcc doesn't have an auto-vectorizer yet, unfortunately. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From indigoneptune at yahoo.com Mon Oct 13 13:37:47 2003 From: indigoneptune at yahoo.com (stanley george) Date: Mon, 13 Oct 2003 10:37:47 -0700 (PDT) Subject: benchmarks for performance Message-ID: <20031013173747.37343.qmail@web14912.mail.yahoo.com> Hi, I have a cluster of 8 P-III machines running redhat 8. I am trying to measure combined performance in MFLOPS. I have tried using linpackd and 1000d. It gives me an error with the 'Make.inc' file while compiling. How do I get rid of this? What other benchmarking software could I use? Thank you very much Stanley George __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Oct 13 12:22:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 13 Oct 2003 18:22:17 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <200310131822.17717.joachim@ccrl-nece.de> José M. Pérez Sánchez:
> I just wanna thank everybody for the responses to my last question about
> the Intel compiler, I tried both 'gcc' and 'icc', and got the following results
> for one of our work files containing 10^6 steps of calculation:

José, thanks for the information, but you really should (also) use the latest gcc (3.3x) for such a comparison. It will be interesting to see how it performs relative to the latest icc on the one hand, and to the old gcc on the other hand. And some information on the application (or libraries used) would be helpful, too. Like: is it memory-bound or compute-bound, etc..
Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 13 15:26:55 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 13 Oct 2003 12:26:55 -0700 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031013192655.GC16033@greglaptop.internal.keyresearch.com> On Mon, Oct 13, 2003 at 12:04:47PM -0400, Mark Hahn wrote:
> -fomit-frame-pointer usually helps, sometimes noticeably,
> since x86 is so short of registers.

Actually it's a lot more of a tossup than it used to be: having a frame pointer means you have another 256 bytes accessible via a single-byte offset, and the SSE registers help relieve the register pressure problem. On the Opteron, which has more of both general purpose and SSE registers, the frame pointer is often a win. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Mon Oct 13 21:21:34 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Mon, 13 Oct 2003 18:21:34 -0700 (PDT) Subject: The Canadian Internetworked Scientific Supercomputer Message-ID: <20031014012134.21517.qmail@web11403.mail.yahoo.com> Just found an interesting paper written by Paul Lu (the author of PBSWeb): http://hpcs2003.ccs.usherbrooke.ca/papers/Lu.pdf CISS homepage: http://www.cs.ualberta.ca/~ciss/ Rayson __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:19:11 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 16:19:11 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <200310131822.17717.joachim@ccrl-nece.de> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> <200310131822.17717.joachim@ccrl-nece.de> Message-ID: <20031014141911.GA995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 06:22:17PM +0200, Joachim Worringen wrote:
> thanks for the information, but you really should (also) use the latest gcc
> (3.3x) for such a comparison. It will be interesting to see how it performs
> relative to the latest icc on the one hand, and to the old gcc on the other
> hand.
>
> And some information on the application (or libraries used) would be helpful,
> too. Like: is it memory-bound or compute-bound, etc..
>
> Joachim

I installed gcc-3.3.2 from the debian testing distribution, here is the full report including gcc-3.3.2:

**************************
*** gcc version 2.95.4 ***
**************************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        9.5 KB     311 sec
"-O3"                                       8.7 KB     192 sec
"-O3 -ffast-math"                           8.7 KB     165 sec
********************************************

*************************
*** gcc version 3.3.2 ***
*************************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        9.1 KB     245 sec
"-O3"                                       8.8 KB     161 sec
"-O2"                                       8.7 KB     157 sec
"-O2 -ffast-math -fomit-frame-pointer"      8.5 KB     127 sec
"-O2 -ffast-math"                           8.5 KB     125 sec
"-O2 -ffast-math -march=pentium4"           8.5 KB     120 sec
"-O2 -ffast-math -march=pentium4 -msse2"    8.5 KB     120 sec
"-O3 -ffast-math -march=pentium4 -msse2"    8.5 KB     120 sec
********************************************

***********************
*** icc version 7.1 ***
***********************
flags                                       bin-size   elapsed-time
-----                                       --------   ------------
none                                        597 KB     100 sec
"-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"      563 KB     89 sec
****************************************************************

For this test, we actually wrote a version of the program with many parameters hardcoded, so that we make it as compute-bound as possible; we aimed at evaluating how the different compilers took advantage of the Xeon processors. I will repeat the tests with the full version, which includes more memory usage, maybe about 80 MB per process, but it will finally depend on how big we make the files we use to split the calculations. The main calculation is the phase of a particle, we use an implementation of the MersenneTwister algorithm: http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html and have to compute sqrt(-2*log(x)/x) and sin(C*x/y) (x and y are not position, they correspond to other variables in the program), C is a constant hardcoded in the code like sin(9.7438473847*x/y). I measured how long it took to compute sqrt(-2*log(x)/x), and it was about 412 processor cycles (I used rdtscll()). I will submit other results as soon as I get them, probably using another computing algorithm which runs quite a bit faster. Regards, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:32:19 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 16:32:19 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031014143219.GB995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 04:40:31PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote:
> Jose,
>
> Can we benchmark our OptimaNumerics Linear Algebra Library with you on
> the same machine?
>
> Thank you very much!
>
>
> Best wishes,
> Kenneth Tan
> -----------------------------------------------------------------------
> C. J. Kenneth Tan, Ph.D.
> Heuchera Technologies Ltd.
> E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838
> Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015
> -----------------------------------------------------------------------

Hi Kenneth: Thank you very much for your message, unfortunately we have a pretty tight schedule here, and lots of different things to do.
Right now I cannot spend time benchmarking your library on my system, and we cannot provide access to anyone from outside. On the other hand I don't know if the calculations I am running at this moment can exploit your libraries. Thanks again and best regards, Jose M. Perez Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.fitzmaurice at ngc.com Mon Oct 13 10:52:18 2003 From: michael.fitzmaurice at ngc.com (Fitzmaurice, Michael) Date: Mon, 13 Oct 2003 07:52:18 -0700 Subject: Beowulf Users Group meeting Message-ID: <03E95480F0B2D042A7598115FB3F5D9D49F3E4@XCGVA009> Please join us at the Baltimore-Washington Beowulf Users Group meeting this Tuesday the 14th at 2:45 at the Northrop Grumman building on 7575 Colshire Drive; McLean, VA 22102. For more details please go to

Who should attend?
- Sales, marketing and Business Development people
- Pre-sales engineers
- High Performance Computing professionals
- IT generalists
- Data Center Managers
- Program and Project Managers

Beowulf cluster installations are one of the fastest growing areas within the IT market. Beowulf clusters are replacing old, slower SMP systems for half the cost and with twice the performance. Beowulf clusters will grow even faster with the introduction of easier-to-use parallel programming tools. Engineered Intelligence is leading the revolution in breakthrough parallel programming tools for the HPC market. So now applications on older SMP machines can be easily moved to cost-effective COTS Intel or AMD based servers, which have been clustered to improve performance and reduce costs. Come hear from the folks at Engineered Intelligence about how your projects can use C x C to make your applications ready to use Beowulf clusters today. This will be one of our best topics regarding the Beowulf cluster market. There is no cost for the briefing and you do not need to be a BWBUG member. As always there will be great door prizes and free parking. If you cannot make it to the meeting, pass the word to a colleague or business associate. T. Michael Fitzmaurice, Jr. Coordinator of the BWBUG 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 12:08:50 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Tue, 14 Oct 2003 18:08:50 +0200 Subject: Pentium4 vs Xeon Message-ID: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Hi: We are going to buy a second machine! :-) It will be a diskless dual processor node. We are thinking about buying the same configuration: Xeon 2.4 GHz, 533 MHz FSB, but since Xeons and the motherboards supporting them are so expensive, we have been thinking about dual normal Pentium4 instead. We don't currently have any comparable P4 processor to run some tests, and after looking at the Intel docs, the only difference we see between Xeon and P4 is the Xeon having more cache. Does anyone have any idea about the relative performance of these processors, and what about the price/performance ratio? Is it worth paying more for the Xeon?
The other point I wanna ask about is the "host bus speed" reported by the kernel at boot time, it reports 133 MHz, and our memories are supposed to run at 266 MHz; is it normal, is it just the double rate thing? Thanks in advance, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Tue Oct 14 12:53:54 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Tue, 14 Oct 2003 18:53:54 +0200 Subject: PVM errors at startup References: <3F86F29F.8A37AC5B@hmg.inpg.fr> Message-ID: <3F8C2A22.83F751A3@hmg.inpg.fr> This email is just to close the thread with the solution. The problem was not related to any PVM misconfiguration but to the ethernet driver. Looking at the ethernet communications between 2 nodes with tcpdump has shown that pvmd was started using tcp communications BUT that the pvmds were trying to talk to each other with the UDP protocol (it is also detailed in the PVM doc) and this was the problem. The UDP communication was unsuccessful between the nodes. Details: The nodes are P4 2.8 with Asustek P4P800 motherboard and on-board 3C940 (gigabit) controller. I was using the 3c2000 driver (from the cdrom). Kernel is 2.4.20-20.7bigmem from RedHat 7.3. rsh, rexec and rcp are working fine but this driver seems not to work with the UDP protocol??? The solution was to download the sk68lin driver (v6.18) and run the shell script to patch the kernel sources for the current kernel. Then correct the module.conf file and set up the gigabit interface. Now PVM is working fine between the two first nodes and the measured throughput is the same as with the 3c2000 Asustek driver. I should now set up the other nodes! I would like to thank Prof. Kenneth R. Koehler and Dr. James Arthur Kohl for their great help in checking the full PVM configuration and leading me towards a network driver problem. Patrick --
===============================================================
| Equipe M.O.S.T.       | http://most.hmg.inpg.fr           |
| Patrick BEGOU         |       ------------                |
| LEGI                  | mailto:Patrick.Begou at hmg.inpg.fr |
| BP 53 X               | Tel 04 76 82 51 35                |
| 38041 GRENOBLE CEDEX  | Fax 04 76 82 52 71                |
===============================================================
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Tue Oct 14 13:38:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Tue, 14 Oct 2003 11:38:35 -0600 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <3F8C349B.5040302@lanl.gov> José M. Pérez Sánchez wrote:
> [...] we have been thinking about dual normal Pentium4 [...]

SMP operation and larger caches appear to be threshold features in Xeons. Old Pentium III could be used in duals, but Intel's marketing has changed. Normal Pentium4 is *not* dual processor enabled: http://www.intel.com/products/desktop/processors/pentium4/index.htm?iid=ipp_browse+dsktopprocess_p4p& http://www.intel.com/products/server/processors/server/xeon/index.htm?iid=ipp_browse+srvrprocess_xeon512& If you really want a fast dual CPU machine from Intel, you'll probably have to pay for a Xeon...
Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Tue Oct 14 13:40:46 2003 From: djholm at fnal.gov (Don Holmgren) Date: Tue, 14 Oct 2003 12:40:46 -0500 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: On Tue, 14 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Hi: > > We are going to buy a second machine! :-) It will be a diskless dual > processor node. We are thinking about buying the same configuration: > Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting > them are so expensive, we have been thinking about dual normal Pentium4 > instead. We don't have now any P4 comparable processor to run some > tests, and after looking at the Intel docs, the only difference we see > between Xeon and P4 is Xeon having more cache. Does anyone has any > idea about the relative performance of these processors, what about the > price/performance ratio? Is it worth paying for more Xeon? > > The other point I wanna ask about is the "host bus speed" reported by > the kernel at boot time, it reports 133Mhz, and our memories are > supposed to run at 266Mhz, is it normal, is it just the double rate > thing? > > Thanks in advance, > > Jose M. Perez. The major difference between P4 and Xeon is that P4's are available with up to 800 MHz FSB, and Xeon's with up to 533 MHz FSB. If your code is sensitive to memory bandwidth, a P4 can be a big win. Otherwise they are essentially equivalent. P4 and standard Xeon both have 512K L2 caches. Xeon's with larger L2 caches are available, but if I'm not mistaken there's a big price difference. Pricewise (YMMV), cheap desktop P4's can be had very roughly for half the price of a comparable dual Xeon. You may very well prefer to admin half the number of boxes and so would prefer the Xeon. If you are using an expensive interconnect, you may also come out ahead with the dual processor boxes, buying only half of the PCI adapters and half the switch ports. Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit PCI. That can be a big bottleneck if your cluster application is sensitive to I/O bandwidth. Early in 2004, if the rumours are true, there will be a P4 chipset supporting 66MHz/64bit PCI-X. And in late 2004, PCI Express should be available on both P4 and Xeon motherboards, providing a big increase in I/O bandwidth if one has a network which can take advantage. Xeon's and P4's do four transfers per clock - so, a 533MHz FSB is really a 133MHz clock doing 4 transfers per cycle. The kernel on my 800 MHz FSB P4 reports a 200 MHz host bus speed. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Tue Oct 14 17:14:26 2003 From: rodmur at maybe.org (Dale Harris) Date: Tue, 14 Oct 2003 14:14:26 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <200310011613.46297.lepalom@upc.es> Message-ID: <20031014211426.GI8116@maybe.org> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > 54.2 You know... 
one problem I see with this, assuming this information is going to pass across the net (or did I miss something), is that instead of passing something like four bytes (ie "54.2"), you are going to be passing 56 bytes (just counting the cpu_temp line). So the XML blows up a little bit of data 14 times. I can't see this being a particularly efficient way of using a network. Sure, it looks pretty, but seems like a waste of bandwidth. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Oct 14 18:13:53 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 14 Oct 2003 18:13:53 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID:
> >
> >
> > 54.2
>
> You know... one problem I see with this, assuming this information is
> going to pass across the net (or did I miss something), is that instead
> of passing something like four bytes (ie "54.2"), you are going to be
> passing 56 bytes (just counting the cpu_temp line). So the XML blows up
> a little bit of data 14 times. I can't see this being a particularly
> efficient way of using a network. Sure, it looks pretty, but seems like
> a waste of bandwidth.

I'm sure some would claim that 56 bytes is not measurable overhead, especially considering the size of tcp/eth/etc headers. but it's damn ugly, to be sure. this sort of thing has been discussed several times on the linux-kernel list as well - formatting of /proc entries. it's clear that some form of human-readability is a good thing. what's not clear is that it has to be so exceptionally verbose. think of it this way: lmsensors output for a machine is a record whose type will not change (very fast, if you insist!). so why should all the metadata about the record format, units, etc be sent each time? suppose you could fetch the fully verbose record once, and then on subsequent queries, just get '54.2 56.7 40.1 3650 4150 5.0 3.3 12.0 -12.0'. the only thing you've lost is same-packet-self-description (and, incidentally, insensitivity to reordering of elements...) there *is* actually a very mind-bending binarification procedure for xml. it seems totally cracked to me, though, since afaict, it completely tosses the self-description aspect, which is almost the main point of xml... of course, the whole xml thing is a massive fraud, since it does nothing at all towards actual interoperability - there must already be thousands of different xml schemas for "SKU", each better than the last, and therefore mutually incompatible... does ASN.1 improve on this situation at all? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 14 20:45:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 14 Oct 2003 20:45:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: On Tue, 14 Oct 2003, Dale Harris wrote:
> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated:
> >
> >
> > 54.2
>
> You know... one problem I see with this, assuming this information is
> going to pass across the net (or did I miss something).
Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. Ah, an open invitation to waste a little more:-) Permit me to rant (the following can be freely skipped by the rant-averse:-). Note that this is not a flame, merely an impassioned assertion of an admittedly personal religious viewpoint. Like similar rants concerning the virtues of C vs C++ vs Fortran vs Java or Python vs Perl, it is intended to amuse or possible educate, but doubtless won't change many human minds. This is an interesting question and one I kicked around a long time when designing xmlsysd. Of course it is also a very longstanding issue -- as old as computers or just about. Binary formats (with need for endian etc translation) are obviously the most efficient but are impossible to read casually and difficult to maintain or modify. Compressed binary (or binary that only uses e.g. one bit where one bit will do) the most impossible and most difficult. Back in the old days, memory and bandwidth on all computers was a precious and rare thing. ALL programs tended to use one bit where one bit was enough. Entire formats with headers and metadata and all were created where every bit was parsimoniously allocated out of a limited pool. Naturally, those allocations proved to be inadequate in the long run so that only a few years ago lilo would complain if the boot partition had more than 1023 divisions because once upon a time somebody decided that 10 bits was all this particular field was ever going to get. In order to parse such a binary stream, it is almost essential to use a single library to both format and write the stream and to read and parse it, and to maintain both ends at the same time. Accessing the data ONLY occurs through the library calls. This is a PITA. Cosmically. Seriously. Yes, there are many computer subsystems that do just this, but they are nightmarish to use even via the library (which from a practical point of view becomes an API, a language definition of its own, with its own objects and tools for creating them and extracting them, and the need to be FULLY DOCUMENTED at each step as one goes along) and require someone with a high level of devotion and skill to keep them roughly bugfree. For example, if you write your code for single CPU systems, it becomes a major problem to add support for duals, and then becomes a major problem again to add support for N-CPU SMPs. Debugging becomes a multistep problem -- is the problem in the unit that assembles and provides the data, the encoding library, the decoding library (both of which are one-offs, written/maintained just for the base application) or is it in the client application seeking access to the data? Fortunately, in the old days, nearly all programming was done by professional programmers working for a wage for giant (or not so giant) companies. Binary interfaces were ideal -- they became Intellectual Property >>because<< they were opaque and required a special library whose source was hidden to access the actual binary, which might be entirely undocumented (except via its API library calls). BECAUSE they were so bloomin' hidden an difficult/expensive to modify, software evolved very, very slowly, breaking like all hell every time e.g. MS Word went from revision 1 to 2 to 3 to... 
because of broken binary incompatibility. ASCII, OTOH, has the advantage of being (in principle) easy to read. However, it is easy to make it as obscure and difficult to read as binary. Examples abound, but let's pull one from /proc, since the entire /proc interface is designed around the premise that ascii is good relative to binary (although that seems to be the sole thing that the many designers of different subsystems agree on). When parsing the basic status data of an application, one can work through:

rgb at lilith|T:105>cat /proc/1214/stat
1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0

(which, as you can see, contains the information on the pine application within which I am currently working on my laptop). What? You find that hard to read? Surely it is obvious that the first field is the PID, the second the application name (inside parens, introducing a second, fairly arbitrary delimiter to parse), the runtime status (which is actually NOT a single character, it can vary) and then... ooo, my. Time to check out man proc, kernel source (/usr/src/linux/fs/proc/array.c) and maybe the procps sources. One does better with:

rgb at lilith|T:106>cat /proc/1214/status
Name:   pine
State:  S (sleeping)
Tgid:   1214
Pid:    1214
PPid:   1205
TracerPid:  0
Uid:    1337  1337  1337  1337
Gid:    1337  1337  1337  1337
FDSize: 32
Groups: 1337 0
VmSize:    11752 kB
VmLck:         0 kB
VmRSS:      5652 kB
VmData:     2496 kB
VmStk:        52 kB
VmExe:      2804 kB
VmLib:      3708 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000008001003
SigCgt: 0000000040016c5c
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

This is an almost human readable view of MUCH of the same data that is in /proc/stat. Of course there is the little ASCII encoded hexadecimal garbage at the bottom that could make strong coders weep (again, without a fairly explicit guide into what every byte or even BIT in this array does, as one sort of expects that there are binary masked values stuck in here). In this case man proc doesn't help -- because this is supposedly "human readable" they don't provide a reference there. Still, some of the stuff that is output by ps aux is clearly there in a fairly easily parseable form. Mind you, there are still mysteries. What are the four UID entries? What is the resolution on the memory, and are kB x1000 or x1024? What about the rest of the data in /proc/stat (as there are a lot more fields there). What about the contents of /proc/PID/statm? (Or heavens preserve us, /proc/PID/maps)? Finally, what about other things in /proc, e.g.:

rgb at lilith|T:119>cat /proc/stat
cpu  3498 0 2122 239197
cpu0 3498 0 2122 239197
page 128909 55007
swap 1 0
intr 279199 244817 13604 0 3427 6 0 4 4 1 3 2 2 1436 0 15893 0
disk_io: (3,0):(15946,11130,257194,4816,109992)
ctxt 335774
btime 1066170139
processes 1261

Again, ASCII yes, but now (count them) there are whitespace, :, (, and ',' separators, and one piece of data (the CPU's index) is a part of a field value (cpu0) so that the entire string "cpu" becomes a sort of separator (but only in one of the lines). An impressive ratio of separators used to field labels.
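(To make the parsing chore concrete: even the "almost human readable" status file above needs a little hand-rolled parser. A minimal sketch -- it just splits each line on the first colon and picks out a couple of fields, with no claim to cover every corner case:)

  /* statusparse.c -- tiny parser for the "Key:  value" lines of
   * /proc/<pid>/status (here /proc/self/status for simplicity). */
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      FILE *fp = fopen("/proc/self/status", "r");
      char line[256];

      if (!fp) { perror("fopen"); return 1; }
      while (fgets(line, sizeof line, fp)) {
          char *colon = strchr(line, ':');
          char *value;

          if (!colon) continue;            /* not a "Key: value" line */
          *colon = '\0';                   /* line now holds just the key */
          value = colon + 1;
          while (*value == ' ' || *value == '\t') value++;
          value[strcspn(value, "\n")] = '\0';

          /* pick out a couple of fields, ignore the rest */
          if (!strcmp(line, "Name") || !strcmp(line, "State") || !strcmp(line, "VmRSS"))
              printf("%s = %s\n", line, value);
      }
      fclose(fp);
      return 0;
  }
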
If this isn't enough for you, consider /proc/net/dev, which has two separators (: and ws) but is in COLUMNS, /proc/bus/pci/devices (which I still haven't figured out) and yes, the aforementioned sensors interface in /proc. I offer all of the above as evidence of a fairly evil (did you ever notice how evil, live, vile, veil and elvi are all anagrams of one another he asks in a mindless parenthetical insertion to see if you're still awake:-) middle ground between a true binary interface accessible only through library calls (which can actually be fairly clean, if one creates objects/structs with enough mojo to hold the requisite data types so that one can then create a relatively simple set of methods for accessing them) and xml. XML is the opposite end of the binary spectrum. It asserts as its primary design principle that the objects/structs with the right kind of mojo share certain features -- precisely those that constitute the rigorous design requirements of XML (nesting, attributes, values, etc). There is a fairly obvious mapping between a C struct, a C++ object, and an XMLified table. It also asserts implicitly that whether or not the object tags are chosen to be human readable (nobody insists that the tags encapsulating CPU temperature readings be named -- they could have been just ) there MUST be some sort of dictionary created at the same time as the XML implementation. If (very) human readable tags are chosen they are nearly self-documenting, but whole layers of DTD and CSS and so forth treatment of XML compliant markup are predicated upon a clear definition of the tag rules and hierarchy. Oh, and by its very design XML is highly scalable and extensible. Just as one can easily enough add fields into a struct without breaking code that uses existing fields, one can often add tags into an XML document description without breaking existing tags or tag processing code (compare with adding a field anywhere into /proc/stat -- ooo, disaster). This isn't always the case in either case -- sometimes one converts a field in a struct into a struct in its own right, for example, which can do violence to both the struct and an XML realization of it. Still, often one can and when one can't it is usually because you've had a serious insight into the "right" way to structure your data and before the encoding was just plain wrong in some deep way. This happens, but generally only fairly early in the design and implementation process. Note that XML need not be inefficient in transit. BECAUSE it is so highly structured, it compresses very efficiently. Library calls exist to squeeze out insignificant whitespace, for example (ignored by the parser anyway). I haven't checked recently to see whether compression is making its way into the library, but either way one can certainly compress/decompress and/or encrypt/decrypt the assembled XML messages before/after transmission, if CPU is cheaper to you than network or security is an issue. I think that it then comes down to the following. XML may or may not be perfect, but it does form the basis for a highly consistent representation of data structures that is NOT OPAQUE and is EASILY CREATED AND EASILY PARSED with STANDARD TOOLS AND LIBRARIES. When designing an XMLish "language" for your data, you can make the same kind of choices that you face in any program. Do you document your code or not? Do you use lots of variable names like egrp1 or do you write out something roughly human readable like extra_group_1? 
Do you write your loops so that they correspond to the actual formulae or basic algorithm (and let the compiler do as well as it can with them) or do you block them out to be cache-friendly, insert inline assembler, and so forth to make them much faster but impossible to read or remember even yourself six months after you write them? Some choices make the code run fast and short but hard to maintain. Other choices make it run slower but be more readable and easier to maintain. In the long run, I think most programmers eventually come to a sort of state of natural economy in most of these decisions; one that expresses their personal style, the requirements of their job, the requirements of the task, and a reflection of their experience(s) coding. It is a cost/benefit problem, after all (as is so much in computing). You have to ask how much it costs you to do something X way instead of Y way, and what the payoff/benefits are, in the long run. For myself only, years of experience have convinced me that as far as things like /proc or task/hardware monitoring are concerned, the bandwidth vs ease of development and maintenance question comes down solidly in favor of ease of development and maintenance. Huge amounts of human time are wasted writing parsers and extracting de facto data dictionaries from raw source (the only place where they apparently reside). Tools that are built to collect data from a more or less arbitrary interface have to be almost completely rewritten when that interface changes signficantly (or break horribly in the meantime). So the cost is this human time (programmers'), more human time (the time and productivity lost by people who lack the many tools a better interface would doubtless spawn), and the human time and productivity lost due to the bugs the more complex and opaque and multilayered interface generates. The benefit is that you save (as you note) anywhere from a factor of 3-4 to 10 or more in the total volume of data delivered by the interface. Data organization and human readability come at a price. But what is the REAL cost of this extra data? Data on computers is typically manipulated in pages of memory, and a page is what, 4096 bytes? Data movement (especially of contiguous data) is also very rapid on modern computers -- you are talking about saving a very tiny fraction of a second indeed when you reduce the message from 54 bytes to 4 bytes. Even on the network, on a 100BT connection one is empirically limited by LATENCY on messages less than about 1000 bytes in length. So if you ask how long it takes to send a 4 byte packet or a 54 byte packet (either one of which is TCP encapsulated inside a header that is longer than the data) the answer is that they take exactly the same amount of time (within a few tens of nanoseconds). If the data in question is truly a data stream -- a more or less continuous flow of data going through a channel that represents a true bottleneck, then one should probably use a true binary representation to send the data (as e.g. PVM or MPI generally do), handling endian translation and data integrity and all that. 
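(For what the struct-to-XML mapping mentioned above looks like in practice, a minimal sketch follows; the struct fields, tag names and values are invented for illustration and are not the actual xmlsysd or lm_sensors format.)

  /* sensorxml.c -- sketch of emitting a C struct as an xmlish record. */
  #include <stdio.h>

  struct sensors_reading {        /* hypothetical per-node reading */
      double cpu_temp_c;
      double mb_temp_c;
      int    fan_rpm;
  };

  /* Format the struct as a small self-describing record in buf. */
  static int emit_xml(const struct sensors_reading *s, char *buf, int len)
  {
      return snprintf(buf, (size_t) len,
                      "<sensors>\n"
                      "  <cpu_temp units=\"C\">%.1f</cpu_temp>\n"
                      "  <mb_temp units=\"C\">%.1f</mb_temp>\n"
                      "  <fan units=\"rpm\">%d</fan>\n"
                      "</sensors>\n",
                      s->cpu_temp_c, s->mb_temp_c, s->fan_rpm);
  }

  int main(void)
  {
      struct sensors_reading r = { 54.2, 40.1, 3650 };
      char buf[256];

      emit_xml(&r, buf, sizeof buf);
      fputs(buf, stdout);   /* a client can parse this with any XML library */
      return 0;
  }
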
If the data in question is a relatively short (no matter how it is wrapped and encoded) and intermittant source -- as most things like a sensors interface, the proc interface(s) in general, the configuration file of your choice, and most net/web services are, arguably -- then working hard to compress or minimally encapsulate the data in an opaque form is hard to justify in terms of the time (if any) that it saves, especially on networks, CPUs, memory that are ever FASTER. If it doesn't introduce any human-noticeable delay, and the overall load on the system(s) in question remain unmeasurably low (as was generally the case with e.g. the top command ten Moore's Law years or more ago) then why bother? I think (again noting that this is my own humble opinion:-) that there is no point. /proc should be completely rewritten, probably by being ghosted in e.g. /xmlproc as it is ported a little at a time, to a single, consistent, well documented xmlish format. procps should similarly be rewritten in parallel with this process, as should the other tools that extract data from /proc and process it for human or software consumption. Perhaps experimentation will determine that there are a FEW places in /proc where the extra overhead of parsing xml isn't acceptable for SOME applications -- /proc/pid/stat for example. In those few cases it may be worthwhile to make the ghosting permanent -- to provide an xmlish view AND a binary or minimal ASCII view, as is done now, badly, with /proc/pid/stat and /proc/pid/status. This is especially true, BTW, in open source software, where a major component of the labor that creates and maintains both low level/back end service software and high level/front end client software is unpaid, volunteer, part time, and of a wide range of skill and experience. Here the benefits of having a documented, rigorously organized, straightforwardly parsed API layer between tools are the greatest. Finally, to give the rotting horse one last kick, xmlified documents (deviating slightly from API's per se) are ideal for archival storage purposes. Microsoft is being scrutinized now by many agencies concerned about the risks associated from having 90% of our vital services provided by an operating system that has proven in practice to be appallingly vulnerable. Their problem has barely begun. The REAL expense associated with using Microsoft-based documents is going to prove in the long run to be the expense of de-archiving old proprietary-binary-format documents long after the tools that created them have gone away. This is a problem worthy of a rant all by itself (and I've written one or two in other venues) but it hasn't quite reached maturity as it requires enough years of document accumulation and toplevel drift in the binary "standard" before it jumps out and slaps you in the face with six and seven figure expenses. XMLish documents (especially when accompanied by a suitable DTD and/or data dictionary) simply cannot cost that much to convert because their formats are intrinsically open. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Tue Oct 14 22:02:28 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Wed, 15 Oct 2003 03:02:28 +0100 Subject: Condor Problem Message-ID: Does anyone have any condor experience? im trying to submit a job which is a Borland C++ console application.. the application writes a final output to the screen... but this is not being saved to the output file i specified in the jobs configuration. When i use a simple batch file and echo some text to the screen and submit that as a job it works fine and the echoed text is in the output file. Is there a problem with condor? or is there a problem with c++ or stdout? any help would be greatly appreciated. Thanks in advance... Chris Miles, NeuralGrid, Paisley University, Scotland _________________________________________________________________ Express yourself with cool emoticons - download MSN Messenger today! http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kohlja at ornl.gov Tue Oct 14 15:08:54 2003 From: kohlja at ornl.gov (James Kohl) Date: Tue, 14 Oct 2003 15:08:54 -0400 Subject: PVM errors at startup In-Reply-To: <3F8C2A22.83F751A3@hmg.inpg.fr> References: <3F86F29F.8A37AC5B@hmg.inpg.fr> <3F8C2A22.83F751A3@hmg.inpg.fr> Message-ID: <20031014190854.GA31004@neo.csm.ornl.gov> Hey Patrick, Glad you found the problem. This is usually manifested when the networking config is off slightly, or when internal/external networks are confused, but it sounds like you had a much more interesting problem...! :-) Yes, PVM uses rsh/ssh/TCP to start a remote PVM daemon (pvmd) but then the daemons themselves use UDP to talk and route PVM messages. FYI, any PVM tasks that use the "PvmRouteDirect" will use direct TCP sockets. Again, glad you figured it out! (And you're most welcome! :) All the Best, Jim On Tue, Oct 14, 2003 at 06:53:54PM +0200, Patrick Begou wrote: > This email just to close the thread with the solution. > The problem was not related to any PVM misconfiguration but to the > ethernet driver. Looking at the ethernet communications between 2 nodes > with tcpdump has shown that pvmd was started using tcp communications > BUT that pvmd were trying to talk each other with UDP protocol (it is > also detailed in the PVM doc) and this was the problem. The UDP > communications was unsuccessfull between the nodes. > Details: > The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 > (gigabit) controler. I was using the 3c2000 driver (from the cdrom). > Kernel is 2.4.20-20.7bigmem from RedHat 7.3. > rsh, rexec and rcp are working fine but this driver seems not to work > with UDP protocol??? > The solution was to download the sk68lin driver (v6.18) and run the > shell script to patch the kernel sources for the current kernel. Then > correct the module.conf file and set up the gigabit interface. Now PVM > is working fine between the two first nodes and the measured throughput > is the same as with 3c2000 asustek driver. I should now setup the other > nodes! > I would like to thanks Pr. Kenneth R. 
Koehler and Dr James Arthur Kohl > for their great help in checking the full PVM configuration and leading > me towards a network driver problem. > Patrick (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(: James Arthur "Jeeembo" Kohl, Ph.D. "Da Blooos Brathas?! They Oak Ridge National Laboratory still owe you money, Fool!" kohlja at ornl.gov http://www.csm.ornl.gov/~kohl/ Long Live Curtis Blues!!! :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Oct 15 04:49:26 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 15 Oct 2003 10:49:26 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Robert G. Brown wrote: > On Tue, 14 Oct 2003, Dale Harris wrote: > > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > > > > > 54.2 > > > > > > > > You know... one problem I see with this, assuming this information is > > going to pass across the net (or did I miss something). Is that instead > > of passing something like four bytes (ie "54.2"), you are going to be > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > a little bit of data 14 times. I can't see this being particularly > > efficient way of using a network. Sure, it looks pretty, but seems like > > a waste of bandwidth. > > Ah, an open invitation to waste a little more:-) Isn't it a bit cynical to write a 20 KByte e-mail on the topic of saving 56 Bytes? ;-) SCNR, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Wed Oct 15 11:43:09 2003 From: djholm at fnal.gov (Don Holmgren) Date: Wed, 15 Oct 2003 10:43:09 -0500 Subject: Some application performance results on a dual G5 Message-ID: For those who might be interested, I've posted some lattice QCD application performance results on a 2.0 GHz dual G5 PowerMac. See http://lqcd.fnal.gov/benchmarks/G5/ As expected from the specifications, strong memory bandwidth, reasonable scaling, and good floating point performance. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 15 09:46:45 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 15 Oct 2003 09:46:45 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Wed, 15 Oct 2003, Felix Rauch wrote: > > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > > a little bit of data 14 times. I can't see this being particularly > > > efficient way of using a network. Sure, it looks pretty, but seems like > > > a waste of bandwidth. > > > > Ah, an open invitation to waste a little more:-) > > Isn't it a bit cynical to write a 20 KByte e-mail on the topic of > saving 56 Bytes? 
;-) Cynical? No, not really. Stupid? Probably. If only I could get SOMEBODY to pay me ten measely cents a word for my rants... Alas this is not to be. So the alternative is to see if I can extort ten cents from everybody on the list NOT to write 20K rants like this. Sort of like National Lampoon's famous "Buy this magazine or we'll shoot this dog" issue...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 15 14:16:06 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 15 Oct 2003 11:16:06 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Wed Oct 15 21:22:01 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Thu, 16 Oct 2003 02:22:01 +0100 Subject: Condor Problem Message-ID: Hi, thanks for the reply Using all this instead of condor/globus? The only thing was I need to do this on windows. What i want to do is setup a Grid but also need a cluster to run jobs on Chris >From: Andrew Wang >To: Chris Miles >CC: beowulf at beowulf.org >Subject: Re: Condor Problem >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) >MIME-Version: 1.0 >Received: from mc11-f10.hotmail.com ([65.54.167.17]) by >mc11-s20.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:13:50 -0700 >Received: from web16812.mail.tpe.yahoo.com ([202.1.236.152]) by >mc11-f10.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:11:09 -0700 >Received: from [65.49.83.96] by web16812.mail.tpe.yahoo.com via HTTP; Thu, >16 Oct 2003 09:11:03 CST >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq >Message-ID: <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> >Return-Path: andrewxwang at yahoo.com.tw >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 (UTC) >FILETIME=[62388A50:01C39382] > >If all you need is a batch system, I would suggest SGE >and Scalable PBS, which have more users and better >support. > >Both of them are free and opensource, so you can try >both and see which one you like better! > >SGE: http://gridengine.sunsource.net >SPBS: http://www.supercluster.org/projects/pbs/ > >Andrew. > >----------------------------------------------------------------- >?C???? Yahoo!?_?? >?????C???B?????????B?R?A???????A???b?H?????? 
>http://tw.promo.yahoo.com/mail_premium/stationery.html _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 21:11:03 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) Subject: Condor Problem Message-ID: <20031016011103.41833.qmail@web16812.mail.tpe.yahoo.com> If all you need is a batch system, I would suggest SGE and Scalable PBS, which have more users and better support. Both of them are free and opensource, so you can try both and see which one you like better! SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/pbs/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Oct 15 21:37:36 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 15 Oct 2003 18:37:36 -0700 Subject: Pentium4 vs Xeon In-Reply-To: References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <5.2.0.9.2.20031015183031.03c57ce8@216.82.101.6> There are single-Xeon boards using the Serverworks GC series of chipsets with 64-bit PCI, but they're just as expensive as a budget dual Xeon board (Tyan S2723 or Supermicro X5DPA-GG)... In the $280 to $310 per board price range. Seems rather silly, as the "Prestonia" Socket-604 Xeon CPUs are nothing but a P4 repackaged. There's also this board: http://www.tyan.com/products/html/trinitygcsl.html Which uses a single P4 @ 533MHz FSB, with the same Serverworks chipset. Supermicro X5-SS* series (scroll down): http://www.supermicro.com/Product_page/product-mS.htm >Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit >PCI. That can be a big bottleneck if your cluster application is >sensitive to I/O bandwidth. Early in 2004, if the rumours are true, >there will be a P4 chipset supporting 66MHz/64bit PCI-X. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 22:15:56 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 10:15:56 +0800 (CST) Subject: Condor Problem In-Reply-To: Message-ID: <20031016021556.52225.qmail@web16812.mail.tpe.yahoo.com> Unluckily, SGE has very limited Windows support. PBSPro, which supports MS-Windows (the free versions do not), does offer free licenses to .edu sites. BTW, may be there are more people with condor knowledge from the condor mailing list can answer your questions. http://www.cs.wisc.edu/~lists/archive/condor-users/ Andrew. --- Chris Miles ????> Hi, thanks for the reply > > Using all this instead of condor/globus? > > The only thing was I need to do this on windows. 
> > What i want to do is setup a Grid but also need a > cluster to run > jobs on > > Chris > > >From: Andrew Wang > >To: Chris Miles > >CC: beowulf at beowulf.org > >Subject: Re: Condor Problem > >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) > >MIME-Version: 1.0 > >Received: from mc11-f10.hotmail.com > ([65.54.167.17]) by > >mc11-s20.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:13:50 -0700 > >Received: from web16812.mail.tpe.yahoo.com > ([202.1.236.152]) by > >mc11-f10.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:11:09 -0700 > >Received: from [65.49.83.96] by > web16812.mail.tpe.yahoo.com via HTTP; Thu, > >16 Oct 2003 09:11:03 CST > >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq > >Message-ID: > <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> > >Return-Path: andrewxwang at yahoo.com.tw > >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 > (UTC) > >FILETIME=[62388A50:01C39382] > > > >If all you need is a batch system, I would suggest > SGE > >and Scalable PBS, which have more users and better > >support. > > > >Both of them are free and opensource, so you can > try > >both and see which one you like better! > > > >SGE: http://gridengine.sunsource.net > >SPBS: http://www.supercluster.org/projects/pbs/ > > > >Andrew. > > > >----------------------------------------------------------------- > >??? Yahoo!?? > >?????????????????????? > >http://tw.promo.yahoo.com/mail_premium/stationery.html > > _________________________________________________________________ > Stay in touch with absent friends - get MSN > Messenger > http://www.msn.co.uk/messenger > ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Thu Oct 16 04:47:12 2003 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Thu, 16 Oct 2003 09:47:12 +0100 Subject: XML for formatting (Re: Environment monitoring) Message-ID: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> [Hmm, and will the rants be longer or shorter after he's bought the mental lubricant?] I'm in support of the original rant, however, having had to reverse-engineer several data formats in the past. Most recently a set of molecular-orbital output data. Very frustrating trying to count through data fields and convince myself that we have mapped it correctly. Anecdote from a different field (weather models) that's related - for a while, a weather model used calibration data a bit wrong - sea temperature and sea surface wind speed were swapped. All because someone had to look at a data dump and guess which column was which. So, sure, XML is very wordy, but the time saving (when trying to decipher the data) and potential for avoiding big mistakes more than makes up for it (IMO). Graham Graham Mullier Chemoinformatics Team Leader, Chemistry Design Group, Syngenta, Bracknell, RG42 6EY, UK. direct line: +44 (0) 1344 414163 mailto:Graham.Mullier at syngenta.com -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: 15 October 2003 19:16 Cc: beowulf at beowulf.org Subject: Re: XML for formatting (Re: Environment monitoring) On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. 
Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Thu Oct 16 08:12:36 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Thu, 16 Oct 2003 14:12:36 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <20031014211426.GI8116@maybe.org> Message-ID: <20031016121236.GE8711@unthought.net> On Tue, Oct 14, 2003 at 08:45:12PM -0400, Robert G. Brown wrote: ... > rgb at lilith|T:105>cat /proc/1214/stat > 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 > 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 > 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 While this has nothing to do with your (fine as always ;) rant, I just need to add a comment (which has everything to do with /proc stupidities): > (which, as you can see, contains the information on the pine application > within which I am currently working on my laptop). > > What? You find that hard to read? Imagine I had a process with the (admittedly unlikely but entirely possible) name 'pine) S 1205 (' Your stat output would read: 1214 (pine) S 1205 () S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 Parsing the ASCII-art in /proc/mdstat is at least as fun ;) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 16 08:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 16 Oct 2003 08:08:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> Message-ID: On Wed, 15 Oct 2003, Greg Lindahl wrote: > On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > > > So the alternative is to see if I can extort > > ten cents from everybody on the list NOT to write 20K rants like this. > > Do you accept pay-pal? Do you promise to spend all the money buying > yourself beer? I do accept pay-pal, by strange chance and will cheerfully delete one word out of a 20Kword base for every dime received (and to make it clear to the list that I've done so, naturally I'll post the diff with the original as well as the modified rant:-). 
I can't promise to spend ALL of the money buying beer, because my liver is old and has already tolerated much abuse over many years and I want it to last a few more decades, but I'll certainly lift a glass t'alla yer health from time to time...:-) On the other hand, given my experiences with people sending me free money via pay-pal up to this point, it would probably be safe to promise to spend it "all" on beer. Even my aged liver can tolerate beer by the thimbleful...if I didn't end up a de facto teetotaller.;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 12:02:18 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 12:02:18 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: On Thu, 16 Oct 2003 graham.mullier at syngenta.com wrote: > [Hmm, and will the rants be longer or shorter after he's bought the mental > lubricant?] Buy the right amount and they will be eloquent enough that you won't mind, or too much and they will be short and slurred ;-o > I'm in support of the original rant, however, having had to reverse-engineer > several data formats in the past. Most recently a set of molecular-orbital > output data. Very frustrating trying to count through data fields and > convince myself that we have mapped it correctly. What you want is not XML, but a data format description language. When I first read about XML, that what I believed it was. I was expecting that file optionally described the data format as a prologue, and then had a sequence of efficently packed data structures. But the XML designers created the evil twin of that idea. The header is a schema of parser rules, and each data element had verbose syntax that conveyes little semantic information. A XML file - is difficult for humans to read, yet is even larger than human-oriented output - requires both syntax and rule checking after human editing, yet is complex for machines to parse. - is intended for large data sets, where the negative impacts are multiplied - encourages "cdata" shortcuts that bypass the few supposed advantages. > Anecdote from a different field (weather models) that's related - for a > while, a weather model used calibration data a bit wrong - sea temperature > and sea surface wind speed were swapped. All because someone had to look at > a data dump and guess which column was which. Versus looking at an XML output and guessing what "load_one" means? I see very little difference: repeating a low-content label once for each data element doesn't convey more information. The only XML adds here is avoiding miscounting fields for undocumented data structures. What we really want in both the weather code case and when reporting cluster statistics is a data format description language. That description includes the format of the packed fields, and should include what the fields mean and their units, which is what we are missing in both cases. With such an approach we can efficiently assemble, transmit and deconstruct packed data while having automatic tools to check its validity. 
And general-purpose tools can even combine a description and a compact data set to produce XML. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 16 16:12:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 16 Oct 2003 13:12:13 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: <20031016201213.GV8116@maybe.org> On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > What you want is not XML, but a data format description language. > I think the S-expression guys would say that they have one. And it is what supermon uses, FWIW. http://sexpr.sourceforge.net/ http://supermon.sourceforge.net/ (supermon pages are currently unavailable.) Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Oct 16 16:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Thu, 16 Oct 2003 16:52:03 -0400 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> <20031016201213.GV8116@maybe.org> Message-ID: <1066337523.11093.20.camel@roughneck.liniac.upenn.edu> On Thu, 2003-10-16 at 16:12, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > > > > I think the S-expression guys would say that they have one. And it is > what supermon uses, FWIW. > > > http://sexpr.sourceforge.net/ > > http://supermon.sourceforge.net/ We use supermon as the data gathering mechanism for Clubmask, and I really like it. You can mask to get just certain values, and it is _really_ fast. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 14 23:31:55 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 14 Oct 2003 22:31:55 -0500 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <1066188715.3200.120.camel@terra> On Tue, 2003-10-14 at 19:45, Robert G. Brown wrote: ... > > > ... > > > > rgb As someone who has done programming environment tools most of his reasonably long professional life, I must say you have hit the nail on the head. I have rooted through more than my share of shitty binary formats in my day, and I can honestly say that I go home happier as a result of dealing with an XML trace file in my current project. I was happily working away dealing with only XML, but then it happened. The demons of my past reared their ugly heads when I decided that it would be a good thing to get some ELF information outta some files. Being the industrious guy I am, I went and got ELF docs from Dave Anderson's stash. Did that help?
Nope, not really, as it was mangled 64-bit focused ELF. Was it documented? Nope, not really. You could look at the elfdump code to see what that does, so in a backwards way, it was documented. The alternative was to ferret out the format by bugging enough compiler geeks until they gave up the secret handshake. The alternative that I eventually took was to go lay down until the desire to have the ELF information went away. ;-) -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Thu Oct 16 17:36:25 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Thu, 16 Oct 2003 16:36:25 -0500 Subject: OT: same commands to multiple servers? Message-ID: <20031016163625.C11181@mikee.ath.cx> I now have control over many AIX servers and I know there are some programs that allow you (once configured) to send the same command to multiple nodes/servers, but do these commands exist within the AIX environment? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bryce at jfet.net Thu Oct 16 16:15:08 2003 From: bryce at jfet.net (Bryce Bockman) Date: Thu, 16 Oct 2003 16:15:08 -0400 (EDT) Subject: A Petaflop machine in 20 racks? Message-ID: Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 17:54:31 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 17:54:31 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> Message-ID: On Thu, 16 Oct 2003, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > I think the S-expression guys would say that they have one. And it is > supermon uses, FWIW. No, S-expressions are an ancient concept, developed back in the early days of computing. They were needed in Lisp to linearize tree structures so that they could be saved to, uhmm, paper tape or clay tablets. Sexprs are oriented toward "structured" data. In this context "structured" means "Lisp-like linked lists" rather than "a series of 'C' structs". More directly related concepts are XDR, part of SunRPC MPI packed data Object brokers all of which are trying to solve similar problem. But, except for a few of the "object broker" systems, they don't have the metadata language to translate between domains. 
For instance, you can't take MPI packed data and automatically convert it to (useful) XML, pass it to an object broker system, or call a non-MPI remote procedure -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Oct 16 19:21:58 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 16 Oct 2003 16:21:58 -0700 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F2816.9030606@cert.ucr.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > No idea if it would work on AIX, but you could try out pconsole: http://www.heiho.net/pconsole/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Mark at MarkAndrewSmith.co.uk Thu Oct 16 19:36:23 2003 From: Mark at MarkAndrewSmith.co.uk (Mark Andrew Smith) Date: Fri, 17 Oct 2003 00:36:23 +0100 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Comment: As each generation of this chip gets more powerful, in an exponential way, then clusters of these chips could be used to break encryption algorithms via brute force approaches. If this became anywhere near an outside chance of a possibility of succeeding, or even threat of, then I would expect Governments to carefully consider export requirements and restrictions, or even in the extreme, classify it as a military armament similar to early RSA 128bit software encryption ciphers. However it could be the dawn of a new architecture for us all..... Kindest regards, Mark Andrew Smith Tel: (01942)722518 Mob: (07866)070122 http://www.MarkAndrewSmith.co.uk/ -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Bryce Bockman Sent: 16 October 2003 21:15 To: beowulf at beowulf.org Subject: A Petaflop machine in 20 racks? Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). 
Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Oct 16 19:46:19 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu, 16 Oct 2003 16:46:19 -0700 Subject: A Petaflop machine in 20 racks? References: Message-ID: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Browsing through ClearSpeed's fairly "content thin" website, one turns up the following: http://www.clearspeed.com/downloads/overview_cs301.pdf The CS302 has an array of 64 processors and 256Kbytes of memory in the array + 128 Kbytes SRAM on chip. That's 4 Kbytes/processor (much like a cache).. It doesn't say how many bits wide each processor is, though.. 51.2 Gbyte/sec bandwidth is quoted.. that's 800 Mbyte/sec per processor, which is a reasonable sort of rate. 10 microsecond 1K complex FFTs are reasonably fast, but without knowing how many bits, it's hard to say whether it's outstanding. It also doesn't say whether the architecture is, for instance, SIMD. It could well be a systolic array, which would be very well suited to cranking out FFTs or other similar things, but probably not so hot for general purpose crunching. For all their vaunted patent and IP portfolio, they have only one patent listed in the USPTO database under their own name, and that's some sort of DRAM. ----- Original Message ----- From: "Bryce Bockman" To: Sent: Thursday, October 16, 2003 1:15 PM Subject: A Petaflop machine in 20 racks? > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 16 21:23:57 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 16 Oct 2003 21:23:57 -0400 (EDT) Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Looking at the standard "we have the solution to everyones computing needs press release" a few things are clear: "... multi-threaded array processor ..." which is further verified later in the press release: "... where the CS301 is acting as a co-processor, dynamic libraries offload an application's inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU's limited mathematical capability..." It seems to be a low power array processor which may be of some real value to some people. The real issue is can they keep pace in terms of cost and performance with the commodity CPU market. And what about code portability. 
Quite a few people have spent quite a lot of time porting and tweaking codes for architectures that seemed to have a rather short lived history. Of course, there is no hardware yet. Doug On Thu, 16 Oct 2003, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 03:48:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 09:48:17 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> References: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Message-ID: <200310170948.17224.joachim@ccrl-nece.de> Jim Lux: > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. Exactly. Such coprocessor-boards (typically DSP-based, which also achieve some GFlop/s) already exist for a long time, but obviously are not suited to change "the way we see computing" (place your marketing slogan here). One reason is the lack of portability for code making use of such hardware, but I think if the performance for a wider range of applications would effectively come anywhere close to the peak performance, this problem would be overcome by the premise of getting teraflop-performance for some 10k of $. Thus, the problem probably is that typical applications do not achieve the promised performance. All memory-bound applications will get stuck on the PCI-bus, by both, memory access latency and bandwidth. High sustained performance for real problems can, in the general case, only be achieved in a balanced system. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 04:23:46 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 10:23:46 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <200310171023.46865.joachim@ccrl-nece.de> Donald Becker: > More directly related concepts are > XDR, part of SunRPC > MPI packed data Hmm, as you note below, they both do not describe the data they handle, just transform in into a uniform representation. > Object brokers > all of which are trying to solve similar problem. But, except for a few > of the "object broker" systems, they don't have the metadata language to > translate between domains. 
For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure You might want to check HDF5, or for a simpler yet widely used approach, NetCDF. They are self-describing file formats. But as you can send everything via the net the same way you access it in a file, this should be useful. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cap at nsc.liu.se Fri Oct 17 04:40:56 2003 From: cap at nsc.liu.se (Peter Kjellstroem) Date: Fri, 17 Oct 2003 10:40:56 +0200 (CEST) Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> Message-ID: There is something called dsh (distributed shell) part of some IBM package. The guys at llnl has done further work in this direction with pdsh which I belive runs fine on AIX. pdsh can be found at: http://www.llnl.gov/linux/pdsh/ /Peter On Thu, 16 Oct 2003, Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? > > Mike -- ------------------------------------------------------------ Peter Kjellstroem | National Supercomputer Centre | Sweden | http://www.nsc.liu.se _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 17 04:35:48 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 17 Oct 2003 10:35:48 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310170835.h9H8ZmY02530@dali.crs4.it> I have not read carefully descriptions of the Opteron architecture until a few minutes ago. I was not able to find a picture of the layout in silicon at the AMD site, I found a picture at Tom's Hardware. http://www.tomshardware.com/cpu/20030422/opteron-04.html The page before shows that 50 percent of the silicon is cache. Of what is not cache, it seems that the floating point unit occupies about 1/6 or 1/7th of the area, moreover, the authors Frank Voelkel, Thomas Pabst, Bert Toepelt, and Mirko Doelle describe the Opteron as having three floating point units, FADD, FMUL and FMISC. Just counting FADD and FMUL and considering the entire area of the Opteron, using 2 GHz for the frequency, that would be about 12 FP units times 2 GHz, 24 GFLOPS. So it is doable. I do not know the depth of the pipeline, but it is likely it is deep. How do you keep the pipeline full? PCI is around 0.032 Giga floating point words per second? The entire memory subsystem needs to be changed drastically. Moreover, whereas integer units might be used to solve problems that are logically complex, floating point problems are typically ones that use a large amount of data, more than what can fit into cache. 
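The same arithmetic can be written down explicitly. The sketch below takes the 12 FP units at 2 GHz estimated above and compares them with classic 32-bit/33 MHz PCI (roughly 132 Mbyte/s, a nominal figure assumed here), counting single-precision words:

#include <stdio.h>

int main(void)
{
    const double fp_units   = 12.0;        /* estimate from the post above */
    const double clock_hz   = 2.0e9;
    const double pci_Bps    = 33.0e6 * 4;  /* assumed 32-bit bus at 33 MHz, bytes/s */
    const double word_bytes = 4.0;         /* single-precision word */

    double peak_flops = fp_units * clock_hz;   /* ~24 GFLOPS */
    double pci_words  = pci_Bps / word_bytes;  /* ~0.033 Gwords/s */

    printf("peak:  %.1f GFLOPS\n", peak_flops / 1e9);
    printf("PCI:   %.3f Gwords/s\n", pci_words / 1e9);
    printf("ratio: %.0f flops per word delivered over PCI\n",
           peak_flops / pci_words);
    return 0;
}

Several hundred floating point operations per operand delivered over the bus is the gap the memory subsystem would have to close, which is the point being made above.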
But you-all knew that already, Alan Scheinin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:43:07 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:43:07 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Thu, 16 Oct 2003, Donald Becker wrote: > translate between domains. For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure Yes indeedy. And since XML is at heart linked lists (trees) of structs as well, you still can't get around the difficulty of mapping a previously unseen data file containing XMLish into a set of efficiently accessible structs. Which is doable, but is a royal PITA and requires that you maintain DISTINCT (and probably non-portable) images/descriptions of the data structures and then write all this glue code to import and export. So yeah, I have fantasies of ways of encapsulating C header files and a data dictionary in an XMLified datafile and a toolset that at the very least made it "easy" to relink a piece of C code to read in the datafile and just put the data into the associated structs where I could subsequently use them EFFICIENTLY by local or global name. I haven't managed to make this really portable even in my own code, though -- it isn't an easy problem (so difficult that ad hoc workarounds seem the simpler route to take). This really needs a committee or something and a few zillion NSF dollars to resolve, because it is a fairly serious and widespread problem. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:29:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:29:47 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <1066188715.3200.120.camel@terra> Message-ID: On 14 Oct 2003, Dean Johnson wrote: > As someone who has done programming environment tools most of his > reasonably long professional life, I must say you have hit the nail on > the head. I have rooted through more than my share of shitty binary > formats in my day, and I can honestly say that I go home happier as a > result of dealing with an XML trace file in my current project. I was > happily working away dealing with only XML, but then it happened. The > demons of my past rose their ugly heads when I decided that it would be > a good thing to get some ELF information outta some files. Being the > industrious guy I am, I went and got ELF docs from Dave Anderson's > stash. Did that help? Nope, not really, as it was mangled 64-bit focused > ELF. Was it documented? Nope, not really. You could look at the elfdump > code to see what that does, so in a backwards way, it was documented. > The alternative was to ferret out the format by bugging enough compiler > geeks until they gave up the secret handshake. 
The alternative that I > eventually took was to go lay down until the desire to have the ELF > information went away. ;-) And yet Don's points are also very good ones, although I think that is at least partly a matter of designer style. XML isn't, after all, a markup language -- it is a markup language specification. As an interface designer, you can implement tags that are reasonably human readable and well-separated in function or not. His observation that what one would REALLY like is a self-documenting interface, or an interface with its data dictionary included as a header, is very apropos. I also >>think<< that he is correct (if I understood his final point correctly) in saying that someone could sit down and write an XML-compliant "DML" (data markup language) with straightforward and consistent rules for encapsulating data streams. Since those rules would presumably be laid down in the design phase, and since a wise implementation of them would release a link-level library with prebuilt functions for creating a new data file and its embedded data dictionary, writing data out to the file, opening the file, and reading/parsing data in from the file, it would actually reduce the amount of wheel reinventing (and very tedious coding!) that has to be done now while creating/enforcing a fairly rigorous structural organization on the data itself. One has to be very careful not to assume that XML will necessarily make a data file tremendously longer than it likely is now. For short files nobody (for the most part) cares, where by short I mean short enough that file latencies dominate the read time -- using long very descriptive tags is easy in configuration files. For longer data files (which humans cannot in general "read" anyway unless they have a year or so to spare) there is nothing to prevent XMLish of the following sort of very general structure: This is part of the production data of Joe's Orchards. Eat Fruit from Joe's! apples%-10.6fbushels | oranges%-12.5ecrates | price%-10.2fdollars 13.400000 |77.00000e+2 |450.00 589.200000 |102.00000e+8|6667.00 ... The stuff between the tags could even be binary. Note that the data itself isn't individually wrapped and tagged, so this might be a form of XML heresy, but who cares? For a configuration file or a small/short data file containing numbers that humans might want to browse/read without an intermediary software layer, I would say this is a bad thing, but for a 100 MB data file (a few million lines of data) the overhead introduced by adding the XML encapsulation and dictionary is utterly ignorable and the mindless repetition of tags in the datastream itself pointless. Note well that this encapsulation is STILL nearly perfectly human readable, STILL easily machine parseable, and will still be both in twenty years after Joe's Orchard has been cut down and turned into firewood (or would be, if Joe had bothered to tell us a bit more about the database in question in the description). The data can even be "validated", if the associated library has appropriate functions for doing so (which are more or less the data reading functions anyway, with error management). I should note that the philosophy above might be closer to that of e.g. TeX/LaTeX than XML/SGML/MML (as discussed below). 
I've already done stuff somewhat LIKE this (without the formal data dictionary, because I haven't taken the time to write a general purpose tool for my own specific applications, which is likely a mistake in the long run but in the long run, after all, I'll be dead:-) in wulfstat. The .wulfhosts xml permits a cluster to be entered "all at once" using a format like: g%02d 1 15 which is used to generate the hostname strings required to open connections to hosts e.g. g01, g02, ... g15. Obviously the same trick could be used to feed scanf, or to feed a regex parser. The biggest problem I have with XML as a data description/configuration file base isn't really details like these, as I think they are all design decisions and can be done poorly or done well. It is that on the parsing end, libxml2 DOES all of the above, more or less. It generates on the fly a linked list that mirrors the XML source, and then provides tools and a consistent framework of rules for walking the list to find your data. How else could it do it? The one parser has to read arbitrary markup, and it cannot know what the markup is until opens the file, and it opens/reads the file in one pass, so all it can do is mosey along and generate recursive structs and link them. However, that is NOT how one wants to access the data in code that wants to be efficient. Walking a llist to find a float data entry that has a tag name that matches "a" and an index attribute that matches "32912" is VERY costly compared to accessing a[32912]. At this point, the only solution I've found is to know what the data encapsulation is (easy, since I created it:-), create my own variables and structs to hold it for actual reference in code, open and read in the xml data, and then walk the list with e.g. xpath and extract the data from the list and repack it into my variables and structs. This latter step really sucks. It is very, very tedious (although perfectly straightforward to write the parsing/repacking code (so much so that the libxml guy "apologizes" for the tedium of the parsing code in the xml.org documentation:-). It is this latter step that could be really streamlined by the use of an xmlified data dictionary or even (in the extreme C case) encapsulating the actual header file with the associated variable struct definitions. It is interesting and amusing to compare two different approaches to the same problem in applications where the issue really is "markup" in a sense. I write lots of things using latex, because with latex one can write equations in a straightforward ascii encoding like $1 = \sin^2(\theta) + \cos^2(\theta)$. This input is taken out of an ascii stream by the tex parser, tokenized and translated into characters, and converted into an actual equation layout according to the prescriptions in a (the latex) style file plus any layered modifications I might impose on top of it. [Purists could argue about whether or not latex is a true markup language -- tex/latex are TYPESETTING languages and not really intended to support other functions (such as translating this equation into an internal algebraic form in a computer algebra program such as macsyma or maple). However, even though it probably isn't, because every ENTITY represented in the equation string isn't individually tagged wrt function, it certainly functions like markup at a high level with entries entered inside functional delimiters and presented in a way/style that is associated with the delimiters "independent" of the delimiters themselves.] 
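For concreteness, the "walk the tree and repack" step described above looks roughly like the following with libxml2. The file name, tag names and target struct are invented (they match the node_health sketch earlier in this thread, not any real /xmlproc); compile with something like gcc sketch.c `xml2-config --cflags --libs`:

#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

struct node_health { double cpu_temp, load_one; };

int main(void)
{
    struct node_health h = { 0.0, 0.0 };

    xmlDocPtr doc = xmlParseFile("node_health.xml");  /* hypothetical file */
    if (doc == NULL) {
        fprintf(stderr, "cannot parse node_health.xml\n");
        return 1;
    }

    /* Walk the element children of the root and repack the ones we know
     * into the struct we actually want to compute with. */
    xmlNodePtr root = xmlDocGetRootElement(doc);
    for (xmlNodePtr cur = root ? root->children : NULL; cur != NULL; cur = cur->next) {
        if (cur->type != XML_ELEMENT_NODE)
            continue;
        xmlChar *val = xmlNodeGetContent(cur);
        if (xmlStrcmp(cur->name, (const xmlChar *)"cpu_temp") == 0)
            h.cpu_temp = atof((const char *)val);
        else if (xmlStrcmp(cur->name, (const xmlChar *)"load_one") == 0)
            h.load_one = atof((const char *)val);
        xmlFree(val);
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();

    printf("cpu_temp = %.1f, load_one = %.2f\n", h.cpu_temp, h.load_one);
    return 0;
}

Every consumer of the data ends up writing some variation of this loop by hand, which is the tedium a shipped data dictionary and associated library would factor out.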
If one compares this to the same equation wrapped in MML (math markup language, which I don't know well enough to be able to reproduce here) it would likely occupy twenty or thirty lines of markup and be utterly unreadable by humans. At least "normal" humans. Machines, however, just love it, as one can write a parser that can BOTH display the equation AND can create the internal objects that permit its manipulation algebraically and/or numerically. This would be difficult to do with the latex, because who knows what all these components are? Is \theta a constant, a label, a variable? Are \sin and \cos variables, functions, or is \s the variable and in a string (do I mean s*i*n*(theta) where all the objects are variables)? The equation that is HUMAN readable and TYPESETTABLE without ambiguity with a style file and low level definition that recognizes these elements as non-functional symbols of certain size and shape to be assembled according to the following rules is far from adequately described for doing math with it. For all that, one could easily write an XML compliant LML -- "latex markup language" -- a perfectly straightforward translation of the fundamental latex structures into XML form. Some of these could be utterly simple (aside for dealing with special character issues: {\em emphasized text} -> emphasized text \begin{equation}a = b+c\end{equation} -> a = b+c linuxdoc is very nearly this translation, actually, except that it doesn't know how to handle equation content AFAIK. This sort of encapsulation is highly efficient for document creation/typesetting within a specific domain, but less general purpose. The point is .... [the following text that isn't there was omitted in the fond hope that my paypal account will swell, following which I will make a trip to a purveyor of fine beverages.] rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jac67 at georgetown.edu Fri Oct 17 09:41:19 2003 From: jac67 at georgetown.edu (Jess Cannata) Date: Fri, 17 Oct 2003 09:41:19 -0400 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8FF17F.3030008@georgetown.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge National Laboratory on all of our Linux Beowulf clusters, and I really like it. 
You might want to take a look at it: http://www.csm.ornl.gov/torc/C3/index.html -- Jess Cannata Advanced Research Computing Georgetown University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Thu Oct 16 18:49:27 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Fri, 17 Oct 2003 00:49:27 +0200 (CEST) Subject: Pentium4 vs Xeon In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Don Holmgren wrote: > Pricewise (YMMV), cheap desktop P4's can be had very roughly for half > the price of a comparable dual Xeon. This is true if you look at pricewatch, but the quotes I received shown that good P4's is less than half of the price (in my case around 36%) of a comparable dual Xeon. I am talking about comparison of the price of Asus PC-DL Dual Xeon 2.8 GHz 512K 533 FSB with 3 GB DDR333 and two 36GB SATA 10K RPM hardrives against Asus P4P800-VM P4 2.8 GHz 800 FSB with 1.5 GB DDR 400 and one 36GB SATA 10K RPM hardrive. Xeons machines are not very popular and it is hard to get a good price for them at your local shop (in my case Ithaca US, in Poland difference would be even bigger). I am benchmarking this P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+). If you are interested in some numbers I can send benchmarks of Gaussian 03, Gamess, and our own F77 code. czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Fri Oct 17 03:20:54 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Fri, 17 Oct 2003 20:20:54 +1300 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F9856.60602@lsu.edu> I have administered over 100 AIX boxes for a living for over 5 years now. The tool of choice for me is dsh, which ships as part of the PSSP LPP, a canned implementation of Kerberos 4. We simply install the ssp.clients fileset on each node and use our control workstation as the Kerberos realm master. We add the external nodes by hand. I know that dsh is open sourced now and available at: http://dsh.sourceforge.net/ There are several other cheap (as in Libris) solutions as well: 1) Use rsh (with TCPwrappers) 2) Use ssh with a password-less key 3) Write your own code around either of the above 4) Implement Kerberos, either as an LPP from IBM, or get the source and compile yourself I think you'll find dsh a good starting point though. Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? 
> > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 17 10:07:07 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 17 Oct 2003 15:07:07 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE1FD@stegosaurus.bristol.quadrics.com> Consider also pdsh: http://www.llnl.gov/linux/pdsh/ It is an open source varient of IBM's dsh builds on Linux (IA32/IA64, etc.), AIX et al. Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- > -----Original Message----- > From: Jess Cannata [mailto:jac67 at georgetown.edu] > Sent: 17 October 2003 14:41 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > Mike Eggleston wrote: > > >I now have control over many AIX servers and I know there > >are some programs that allow you (once configured) to send > >the same command to multiple nodes/servers, but do these > >commands exist within the AIX environment? > > > > > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge > National > Laboratory on all of our Linux Beowulf clusters, and I really > like it. > You might want to take a look at it: > > http://www.csm.ornl.gov/torc/C3/index.html > > -- > Jess Cannata > Advanced Research Computing > Georgetown University > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Fri Oct 17 12:42:15 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Fri, 17 Oct 2003 10:42:15 -0600 (CST) Subject: RLX? In-Reply-To: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: Have you ever try or test RLX server for HPC? What is their performance? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 14:05:49 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 11:05:49 -0700 Subject: POVray, beowulf, etc. 
Message-ID: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> I'm aware of some MPI-aware POVray stuff, but is there anything out there that can facilitate something where you want to render a sequence of frames (using, e.g., POVray), one frame to a processor, then gather the images back to a head node for display, in quasi-real time. For instance, say you had a image that takes 1 second to render, and you had 30 processors free to do the rendering. Assuming you set everything up ahead of time, it should be possible to set all the processors spinning, and feeding the rendered images back to a central point where they can be displayed as an animation at 30 fps (with a latency of 1 second) Obviously, the other approach is to have each processor render a part of the image, and assemble them all, but it seems that this might actually be slower overall, because you've got the image assembling time added. I'm looking for a way to do some real-time visualization of modeling results as opposed to a batch oriented "render farm", so it's the pipeline to gather the rendered images from the nodes to the display node that I'm interested in. I suppose one could write a little MPI program that gathers the images up as bitmaps and feeds them to a window, but, if someone has already solved this in a reasonably facile and elegant way, why not use it. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Fri Oct 17 12:01:12 2003 From: johnb at quadrics.com (John Brookes) Date: Fri, 17 Oct 2003 17:01:12 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E328@stegosaurus.bristol.quadrics.com> How are the startup times of IBM's dsh these days? I seem to remember that it was somewhat on the slow side on big machines. Many moons have passed since I was last on an AIX machine, though, so I assume the situation's improved drastically. Cheers, John Brookes Quadrics > -----Original Message----- > From: Brian D. Ropers-Huilman [mailto:bropers at lsu.edu] > Sent: 17 October 2003 08:21 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > I have administered over 100 AIX boxes for a living for over > 5 years now. The > tool of choice for me is dsh, which ships as part of the PSSP > LPP, a canned > implementation of Kerberos 4. We simply install the > ssp.clients fileset on > each node and use our control workstation as the Kerberos > realm master. We add > the external nodes by hand. > > I know that dsh is open sourced now and available at: > > http://dsh.sourceforge.net/ > > There are several other cheap (as in Libris) solutions as well: > > 1) Use rsh (with TCPwrappers) > 2) Use ssh with a password-less key > 3) Write your own code around either of the above > 4) Implement Kerberos, either as an LPP from IBM, or get the > source and > compile yourself > > I think you'll find dsh a good starting point though. 
> > Mike Eggleston wrote: > > > I now have control over many AIX servers and I know there > > are some programs that allow you (once configured) to send > > the same command to multiple nodes/servers, but do these > > commands exist within the AIX environment? > > > > Mike > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Brian D. Ropers-Huilman (225) 578-0461 (V) > Systems Administrator AIX (225) 578-6400 (F) > Office of Computing Services GNU Linux > brian at ropers-huilman.net > High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Oct 17 14:38:41 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 17 Oct 2003 14:38:41 -0400 (EDT) Subject: RLX? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? Yes, we had access to their earliest machines and I was there at the NYC announcement. > What is their performance? It depends on the generation. The first generation was great at what it was designed to do: pump out data, such as static web pages, from memory to two 100Mbps Ethernet ports per blade. It used Transmeta chips, 2.5" laptop drives and fans only on the chassis to fit 24 blades in 3U. The blades didn't do well at computational tasks or disk I/O. A third Ethernet port on each blade was connected to an internal repeater. They could only PXE boot using that port, making a flow-controlled boot server important. The second generation switched to Intel ULV (Ultra Low Voltage) processors in the 1GHz range. This approximately doubled the speed over Transmeta chips, especially with floating point. But ULV CPUs are designed for laptops, and the interconnect was no faster. Thus this still was not a computational cluster box. The current generation blades are much faster, with full speed (and heat) CPUs and chipset, fast interconnect and good I/O potential. But lets look at the big picture for HPC cluster packaging: --> Beowulf clusters have crossed the density threshold <-- This happened about two years ago. At the start of the Beowulf project a legitimate problem with clusters was the low physical density. This didn't matter in some installations, as much larger machines were retired leaving plenty of empty space, but it was a large (pun intended) issue for general use. As we evolved to 1U rack-mount servers, the situation changed. Starting with the API CS-20, Beowulf cluster hardware met and even exceeded the compute/physical density of contemporary air-cooled Crays. 
Since standard 1U dual processor machines can now exceed the air cooled thermal density supported by an average room, selecting non-standard packaging (blades, back-to-back mounting, or vertical motherboard chassis) must be motivated by some other consideration that justifies the lock-in and higher cost. At least with blade servers there are a few opportunities: Low-latency backplane communication Easier connections to shared storage Hot-swap capability to add nodes or replace failed hardware -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From john.hearns at clustervision.com Fri Oct 17 15:57:24 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 21:57:24 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: I just saw YAML announced on www.ntk.net http://www.yaml.org YAML (rhymes with camel) is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimised for serialization, configuration settings, log files, Internet messaging and filtering. There are YAML writers and parsers for Perl, Python, Java, Ruby and C. Sounds like it might be good for the purposes we are discussing! BTW, has anyone experimented with Beep for messaging system status, environment variables, logging etc? http://www.beepcore.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From srihari at mpi-softtech.com Fri Oct 17 15:34:42 2003 From: srihari at mpi-softtech.com (Srihari Angaluri) Date: Fri, 17 Oct 2003 15:34:42 -0400 Subject: POVray, beowulf, etc. References: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> Message-ID: <3F904452.4090406@mpi-softtech.com> Jim, Not sure if you came across the parallel ray tracer application written using MPI.
This does real-time rendering. http://jedi.ks.uiuc.edu/~johns/raytracer/ Jim Lux wrote: > I'm aware of some MPI-aware POVray stuff, but is there anything out > there that can facilitate something where you want to render a sequence > of frames (using, e.g., POVray), one frame to a processor, then gather > the images back to a head node for display, in quasi-real time. > > For instance, say you had a image that takes 1 second to render, and you > had 30 processors free to do the rendering. Assuming you set everything > up ahead of time, it should be possible to set all the processors > spinning, and feeding the rendered images back to a central point where > they can be displayed as an animation at 30 fps (with a latency of 1 > second) > > Obviously, the other approach is to have each processor render a part of > the image, and assemble them all, but it seems that this might actually > be slower overall, because you've got the image assembling time added. > > I'm looking for a way to do some real-time visualization of modeling > results as opposed to a batch oriented "render farm", so it's the > pipeline to gather the rendered images from the nodes to the display > node that I'm interested in. I suppose one could write a little MPI > program that gathers the images up as bitmaps and feeds them to a > window, but, if someone has already solved this in a reasonably facile > and elegant way, why not use it. > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 16:19:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 22:19:15 +0200 (CEST) Subject: Also on NTK Message-ID: Sorry if this is off topic too far. Also on NTK, an implementation of zeroconf for Linux, Windows, BSD http://www.swampwolf.com/products/howl/GettingStarted.html Anyone care to speculate on uses for zeroconf in big clusters? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Fri Oct 17 16:47:08 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Fri, 17 Oct 2003 13:47:08 -0700 Subject: When is cooling air cool enough? Message-ID: Most computer rooms shuttle the air back and forth between the computers and the A/C. I'm wondering if one could not construct a less expensive facility (less power running the A/C which is rarely on, smaller A/C units) if the computer room was a lot more like a wind tunnel: ambient air in (after filtering out any dust or rain), pass it through the computers, and then blow it out the other side of the building. Note the room wouldn't be wide open like a normal computer room. Instead essentially each rack and other largish computer unit would sit in its own separate air flow, so that hot air from one wouldn't heat the next. 
The question is, how hot can the cooling air be and still keep the computers happy? The answer will determine how big an A/C unit is needed to handle cooling the intake air for those times when it exceeds this upper limit. I'm guessing that so long as a lot of air is moving through the computers most would be ok in a sustained 30C (86F) flow. Remember, this isn't 30C in dead air, it's 30C with high pressure on the intake side of the computer and low pressure on the outlet side, so that the generated heat is rapidly moved out of the computer and away. (But not so much flow as to blow cards out of their sockets!) Somewhere between 30C and 40C one might expect poorly ventilated CPUs and disks to begin to have problems. Above 40C seems a tad too warm. At that temperature it's going to be pretty uncomfortable for the operators too. Anybody have a good estimate for what this upper limit is. For instance, from a computer room with an A/C that failed slowly? There's clearly a lower temperature limit too. However on cold days opening a feedback duct from the outlet back into the intake should do the trick. In really cold climates the intake duct might be closed entirely - when it's 20 below outside. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 19:45:24 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 16:45:24 -0700 Subject: When is cooling air cool enough? In-Reply-To: Message-ID: <5.2.0.9.2.20031017162321.031343c0@mailhost4.jpl.nasa.gov> For component life, colder is better (10 degrees is factor of 2 life/reliability), and the temperature rise inside the box is probably more than you think. You also have some more subtle tradeoffs to address. You don't need as much colder air as warmer air to remove some quantity of heat, and a significant energy cost is pushing the air around (especially since the work involved in running the fan winds up heating the air being moved). This is a fairly standard HVAC design problem. The additional cost to cool the room to, say, 15C instead of 20C is fairly low, if the room is insulated, and there's a lot of recirculation (which is typical for this kind of thing). It's not like you're cooling the room repeatedly after warming up. Once you've reached equilibrium, cooling the mass of equipment down, you're moving the same number of joules of heat either way and the refrigeration COP doesn't change much over that small a temperature range. The heat leakage through the walls is fairly small, compared to the heat dissipated in the equipment. If you were cooling something that doesn't generate heat itself (i.e. a wine cellar or freezer), then the temperature does affect the power consumed. This all said, I worked for a while on a fairly complex electronic system installed at a test facility on a ridge on the island of Kauai, and they had no airconditioning. They had big fans and thermostatically controlled louvers, and could show that statistically, the air temperature never went high enough to cause a problem. I seem to recall something like the calculations showed we'd have to shut down for environmental reasons no more than once every 5 years. Humidity is an issue also, though. 
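To put rough numbers on that standard design problem, the first-order heat balance is just power = (mass flow of air) x (heat capacity) x (temperature rise). The sketch below works one case through; the 10 kW rack and 1000 CFM of airflow are assumed round numbers, not figures from anyone's machine room.

#include <stdio.h>

int main(void)
{
    const double rho = 1.2;                 /* air density, kg/m^3, near sea level */
    const double cp  = 1005.0;              /* specific heat of air, J/(kg K) */
    const double cfm_to_m3s = 0.000471947;  /* 1 CFM in m^3/s */

    double power_w = 10000.0;      /* assumed heat load of one rack, W */
    double airflow_cfm = 1000.0;   /* assumed airflow through the rack, CFM */

    double mdot = rho * airflow_cfm * cfm_to_m3s;  /* mass flow, kg/s */
    double dT = power_w / (mdot * cp);             /* air temperature rise, K */

    printf("air temperature rise across the rack: %.1f C\n", dT);
    return 0;
}

For those numbers the exhaust runs roughly 17-18 C hotter than the intake, which is one way of seeing why both the intake temperature and the airflow matter, and why 30 C intake air leaves very little margin.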
At 01:47 PM 10/17/2003 -0700, David Mathog wrote: >Most computer rooms shuttle the air back and forth >between the computers and the A/C. I'm >wondering if one could not construct a less expensive >facility (less power running the A/C which is rarely >on, smaller A/C units) if the computer room was a >lot more like a wind tunnel: ambient air in (after >filtering out any dust or rain), >pass it through the computers, and then blow it out >the other side of the building. Note the room >wouldn't be wide open like a normal computer room. >Instead essentially each rack and other largish >computer unit would sit in its own separate air flow, >so that hot air from one wouldn't heat the next. > >The question is, how hot can the cooling air be and >still keep the computers happy? > >The answer will determine how big an A/C unit is >needed to handle cooling the intake air for those >times when it exceeds this upper limit. > >I'm guessing that so long as a lot of air is moving through >the computers most would be ok in a sustained 30C (86F) flow. >Remember, this isn't 30C in dead air, it's 30C with high >pressure on the intake side of the computer and low >pressure on the outlet side, so that the generated heat >is rapidly moved out of the computer and away. (But not >so much flow as to blow cards out of their sockets!) >Somewhere between 30C and 40C one might expect poorly >ventilated CPUs and disks to begin to have problems. Above >40C seems a tad too warm. At that temperature it's going >to be pretty uncomfortable for the operators too. > >Anybody have a good estimate for what this upper limit is. >For instance, from a computer room with an A/C that failed >slowly? > >There's clearly a lower temperature limit too. However on cold >days opening a feedback duct from the outlet back into the intake >should do the trick. In really cold climates the intake >duct might be closed entirely - when it's 20 below outside. > >Thanks, > >David Mathog >mathog at caltech.edu >Manager, Sequence Analysis Facility, Biology Division, Caltech >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:41:49 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:41:49 -0700 Subject: RLX? In-Reply-To: References: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: <20031018014149.GB3774@greglaptop.PEATEC.COM> On Fri, Oct 17, 2003 at 10:42:15AM -0600, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? > What is their performance? .. what's their price/performance? That decides against them for most of us el-cheapo HPC customers. RLX has some nice features for enterprise computing that may justify a higher cost for enterprises, but... 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:11:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:11:39 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Fri, 17 Oct 2003, John Hearns wrote: > I just saw YAML announced on www.ntk.net > > http://www.yaml.org yaml.org doesn't resolve for me in nameservice (yet), but whoa, dude, rippin' ntk site. That's one very seriously geeked news site. rgb > YAML (rhymes with camel) is a sraightforward machine parsable > data serialization format designed for human readability and > interaction with scripting languages such as Perl and Python. > YAML is optimised for serialization , configuration settings, > log files, Internet messaging ad filtering. > > There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. > > > Sounds like it might be good for the purposes we are discussing! > > > > BTW, has anyon experimented with Beep for messaging system status, > environment variables, logging etc? > http://www.beepcore.org > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:39:57 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:39:57 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: References: Message-ID: <20031018013957.GA3774@greglaptop.PEATEC.COM> On Thu, Oct 16, 2003 at 04:15:08PM -0400, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? I think it's the Return of the Array Processor. There's very little new in computing these days -- and it has the usual flaws of APs: low bandwidth communication to the host. So if you have a problem that actually fits in the limited memory, and doesn't need to communicate with anyone else very often, it may be a win for you. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:21:42 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:21:42 -0400 (EDT) Subject: When is cooling air cool enough? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, David Mathog wrote: > Most computer rooms shuttle the air back and forth > between the computers and the A/C. 
I'm > wondering if one could not construct a less expensive > facility (less power running the A/C which is rarely > on, smaller A/C units) if the computer room was a > lot more like a wind tunnel: ambient air in (after > filtering out any dust or rain), > pass it through the computers, and then blow it out > the other side of the building. Note the room > wouldn't be wide open like a normal computer room. > Instead essentially each rack and other largish > computer unit would sit in its own separate air flow, > so that hot air from one wouldn't heat the next. > > The question is, how hot can the cooling air be and > still keep the computers happy? I personally have strong feelings about this, although there probably are sites out there with hard data and statistics and engineering recommendations. 70F or cooler would be my recommendation. In fact, cooler would be my recommendation -- 60F would be better still. I think the number is every 10F costs roughly a year of component life in the 60-80F ranges and even brief periods where the temperature at the intake gets significantly above 80F makes it uncomfortably likely that some component is damaged enough to fail within a year. > The answer will determine how big an A/C unit is > needed to handle cooling the intake air for those > times when it exceeds this upper limit. It costs roughly $1/watt/year to feed AND cool a computer, order of $100-150/cpu/year, with about 1/4 of that for cooling per se. The computer itself costs anywhere from $500 lowball to a couple of thousand per CPU (more if you have an expensive network). The HUMAN cost of screwing around with broken hardware can be crushing, and high temperatures are an open invitation for hardware to break a lot more often (and it breaks all too often at LOW temperatures). It just isn't worth it. > > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. > Remember, this isn't 30C in dead air, it's 30C with high > pressure on the intake side of the computer and low > pressure on the outlet side, so that the generated heat > is rapidly moved out of the computer and away. (But not > so much flow as to blow cards out of their sockets!) > Somewhere between 30C and 40C one might expect poorly > ventilated CPUs and disks to begin to have problems. Above > 40C seems a tad too warm. At that temperature it's going > to be pretty uncomfortable for the operators too. So an 86F wind keeps YOU cool in the summer time? Only because you're damp on the outside and evaporating sweat cools you. Think 86F humid, and you're only at 98F at core. The CPU is considerably hotter, and is cooled by the temperature DIFFERENCE. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? 
In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) 
like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years since I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondants were right when they said that's probably too hot... -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years since I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondants were right when they said that's probably too hot... 
-- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Sat Oct 18 00:52:59 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Sat, 18 Oct 2003 06:52:59 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: Hi, quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some older machines. For comparison I am including benchmarks of dual P3 512 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On Opteron I have also tried PC GAMESS program which I received from Alex Granovsky. 1. Single point HF energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] itek g03 Itanium2 1400MHz efc 7.1 26.5 prototype g03 p4 512 2800MHz pgi4 41.1 dahlia g03 Opteron 1400MHz pgi4 49.5 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 83.3 Wayne g03 p3 512 1200MHz pgi4 85 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 92.5 prototype gamess p4 512 2800MHz ifc7.1 106.5 dahlia PCgamess Opteron 1400MHz 112.9 dahlia gamess Opteron 1400MHz ifc7.1 128.5 itek gamess Itanium2 1400MHz efc 7.1 150.8 2. Single point MP2 energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST rmp2/6-31G* nosym scf=(tight,incore) MaxDisk=750MW gamess: MEMORY=50000000 DIRSCF=.TRUE. itek g03 Itanium2 1400MHz efc 7.1 51.7 prototype g03 p4 512 2800MHz pgi4 111.0 dahlia g03 Opteron 1400MHz pgi4 150.7 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 154.2 prototype gamess p4 512 2800MHz ifc7.1 157.0 dahlia PCgamess Opteron 1400MHz 163.8 dahlia gamess Opteron 1400MHz ifc7.1 191.0 itek gamess Itanium2 1400MHz efc 7.1 194.8 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 251.6 Wayne g03 p3 512 1200MHz pgi4 303 3. Manfreds Gaussian Benchmark http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html 243 basis functions 399 primitive gaussians RHF/3-21G* Freq [sec] itek g03 Itanium2 1400MHz efc 7.1 2843 prototype g03 p4 512 2800MHz pgi 4 8084 dahlia g03 Opteron 1400MHz pgi 4 9332 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 10289 Wayne g03 p3 512 1200MHz pgi 4 12920 galera g03 p3xenon 700MHz pgi 3 19317 m001 g03 p3 650MHz pgi 4 22824 4. test397.com from gaussian03 882 basis functions, 1440 primitive gaussians rb3lyp/3-21g force test scf=novaracc [sec] itek g03 Itanium2 1400MHz efc 7.1 6733 prototype g03 p4 512 2800MHz pgi 4 12980 dahlia g03 Opteron 1400MHz pgi 4 17879 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 20521 Wayne g03 p3 512 1200MHz pgi 4 24521 galera g03 p3xenon 700MHz pgi 3 39353 5. Gaussian calculations of NMR chemical shifts for GlyGlyAlaAla 207 basis functions, 339 primitive gaussians %MEM=800MB B3LYP/GEN NMR [sec] itek g03 Itanium2 1400MHz efc 7.1 275 prototype g03 p4 512 2800MHz pgi 4 614 dahlia g03 Opteron 1400MHz pgi 4 849 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 948 Wayne g03 p3 512 1200MHz pgi 4 1134 some details: g03 is GAUSSIAN 03 rev. B04 with gaussian blas compiled with 32-bit pgi4.0 gamess is VERSION 6 SEP 2001 (R4) compiled with 32-bit ifc 7.1, for P4 I have used additional options -tpp7 -axKW Opteron (dahlia) had 64bit GinGin64 Linux and I had to use static 32-bit binaries. 
It should have SuSE Linux Enterprise soon and I will repeat tests using PGI 5.0 64-bit compiler when it will be ready. Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 (R3) compiled with 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 compiled with 64-bit efc 7.1 P3xenon (galera) uses gamess VERSION = 6 SEP 2001 (R4) compiled with ifc 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas compiled with pgi 3.3 czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 10:39:36 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 22:39:36 +0800 (CST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: <20031019143936.29602.qmail@web16807.mail.tpe.yahoo.com> I have 2 pts: 1. The compilers used across different platforms were not the same, why not use the Intel compiler for the P4 as well? 2. What is the working set of the benchmark? If the benchmark fit in the 6MB on-chip L3 of the Itanium2, it is very likely to perform very well. Another benchmark that shows the G5 wins the large memory case, loses small/medium cases, while the Itanium2 loses most of its advantages when the working set does not fit the L3: http://www.xlr8yourmac.com/G5/G5_fluid_dynamics_bench/G5_fluid_dynamics_bench.html Andrew. --- Cezary Czaplewski ????> > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz > against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp > 2133MHz(MP 2600+) and some > older machines. For comparison I am including > benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of > Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I > received from Alex > Granovsky. > > > 1. Single point HF energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) > gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 26.5 > prototype g03 p4 512 2800MHz pgi4 > 41.1 > dahlia g03 Opteron 1400MHz pgi4 > 49.5 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 83.3 > Wayne g03 p3 512 1200MHz pgi4 > 85 > m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 > 92.5 > prototype gamess p4 512 2800MHz ifc7.1 > 106.5 > dahlia PCgamess Opteron 1400MHz > 112.9 > dahlia gamess Opteron 1400MHz ifc7.1 > 128.5 > itek gamess Itanium2 1400MHz efc 7.1 > 150.8 > > 2. Single point MP2 energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST > rmp2/6-31G* nosym > scf=(tight,incore) > MaxDisk=750MW > gamess: MEMORY=50000000 DIRSCF=.TRUE. 
> > itek g03 Itanium2 1400MHz efc > 7.1 51.7 > prototype g03 p4 512 2800MHz pgi4 > 111.0 > dahlia g03 Opteron 1400MHz pgi4 > 150.7 > m211 gamess k7mp 2133MHz(MP 2600+) > ifc7.1 154.2 > prototype gamess p4 512 2800MHz > ifc7.1 157.0 > dahlia PCgamess Opteron 1400MHz 163.8 > dahlia gamess Opteron 1400MHz > ifc7.1 191.0 > itek gamess Itanium2 1400MHz efc > 7.1 194.8 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 251.6 > Wayne g03 p3 512 1200MHz pgi4 > 303 > > 3. Manfreds Gaussian Benchmark > http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html > > 243 basis functions 399 primitive gaussians > RHF/3-21G* Freq > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 2843 > prototype g03 p4 512 2800MHz pgi 4 > 8084 > dahlia g03 Opteron 1400MHz pgi 4 > 9332 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 10289 > Wayne g03 p3 512 1200MHz pgi 4 > 12920 > galera g03 p3xenon 700MHz pgi 3 > 19317 > m001 g03 p3 650MHz pgi 4 > 22824 > > 4. test397.com from gaussian03 > > 882 basis functions, 1440 primitive gaussians > rb3lyp/3-21g force test scf=novaracc > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 6733 > prototype g03 p4 512 2800MHz pgi 4 > 12980 > dahlia g03 Opteron 1400MHz pgi 4 > 17879 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 20521 > Wayne g03 p3 512 1200MHz pgi 4 > 24521 > galera g03 p3xenon 700MHz pgi 3 > 39353 > > 5. Gaussian calculations of NMR chemical shifts for > GlyGlyAlaAla > > 207 basis functions, 339 primitive gaussians > %MEM=800MB > B3LYP/GEN NMR > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 275 > prototype g03 p4 512 2800MHz pgi 4 > 614 > dahlia g03 Opteron 1400MHz pgi 4 > 849 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 948 > Wayne g03 p3 512 1200MHz pgi 4 > 1134 > > some details: > > g03 is GAUSSIAN 03 rev. B04 with gaussian blas > compiled with 32-bit pgi4.0 > > gamess is VERSION 6 SEP 2001 (R4) compiled with > 32-bit ifc 7.1, for P4 I > have used additional options -tpp7 -axKW > > Opteron (dahlia) had 64bit GinGin64 Linux and I had > to use static 32-bit > binaries. It should have SuSE Linux Enterprise soon > and I will repeat > tests using PGI 5.0 64-bit compiler when it will be > ready. > > Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 > (R3) compiled with > 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 > compiled with 64-bit efc > 7.1 > > P3xenon (galera) uses gamess VERSION = 6 SEP 2001 > (R4) compiled with ifc > 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas > compiled with pgi 3.3 > > > czarek > > ---------------------------------------------------------------------- > Dr. Cezary Czaplewski > Department of Chemistry Box 431 > Baker Lab of Chemistry > University of Gdansk Cornell > University > Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY > 14853 > phone: +48 58 3450-430 phone: > (607) 255-0556 > fax: +48 58 341-0357 fax: (607) > 255-4137 > e-mail: czarek at chem.univ.gda.pl e-mail: > cc178 at cornell.edu > ---------------------------------------------------------------------- > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 11:37:14 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 23:37:14 +0800 (CST) Subject: Long lived OpenPBS bug fixed! Message-ID: <20031019153714.3905.qmail@web16808.mail.tpe.yahoo.com> All versions of OpenPBS have this problem: the scheduler uses blocking sockets to contact the nodes, and if a node is dead, the scheduler hangs for several minutes, and all user commands will hang (no so good!). Scalable PBS finally fixed this problem: "... In local testing, we are able to issue a 'kill -STOP' on one node or even all nodes and the pbs_server daemon continues to be highly responsive to user commands, scheduler queries, and job submissions." http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000162.html *Also*, don't miss the Supercluster Newsletter, which talked about the next generation Maui scheduler called "Moab": http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000132.html Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Sun Oct 19 15:32:42 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Sun, 19 Oct 2003 20:32:42 +0100 (BST) Subject: RLX In-Reply-To: <200310181602.h9IG2HA27890@NewBlue.scyld.com> References: <200310181602.h9IG2HA27890@NewBlue.scyld.com> Message-ID: > > Have you ever try or test RLX server for HPC? > > What is their performance? > > .. what's their price/performance? Well, it all depends. The performance of the current generation of blade systems are on a par with 1U systems, and you can now get chassis with myrinet or SAN connectivity if you need it. The part of price/performance that tends to get overlooked is manageability. Do you factor in the time and salaries of you admin staff who have to look after the thing? We run clusters with blade servers from various manufacturers (including RLX) and traditional 1U machines. The management overhead on blade systems is significantly lower than for 1U machines, and streets ahead of "beige boxes on shelves". On blade systems the network and SAN switching infrastructure is nicely integrated with the server chassis, and their management interfaces tied in with OS deployment, remove power management etc. The difference in management overhead gets more pronounced as your cluster size increases. The time it takes to look after a 24 node cluster of 1U boxes isn't going to be that different to the time it takes to look after 24 blades, but running a 1000 blades is much less effort than running a 1000 1U servers. Whether this actually matters or not depends on your circumstances. If you have a limitless supply of PhD student slave labour, (eg Virginia Tech and their G5s), then time and cost of management isn't so much of an issue. If you have to pay money for your sys-admins and want to run big clusters, then blades may end up being cost effective. 
Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Mon Oct 20 04:35:03 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Mon, 20 Oct 2003 01:35:03 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: <5.2.0.9.2.20031020013259.03c0a4e0@216.82.101.6> Quoting from the article: An ordinary desktop PC outfitted with six PCI cards, each containing four of the chips, would perform at about 600 gigaflops (or more than half a teraflop). Assuming you were to build cluster systems with six PCI cards each, it would require 4U rack cases... Unless these floating point cards come as low-profile PCI (MD2 form factor)? 20 racks * 42U per rack = 840U / 4 = 210 nodes, not counting switching equipment. Petaflop with 210 compute nodes? At 04:15 PM 10/16/2003 -0400, you wrote: >Hi all, > > Check out this article over at wired: > >http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? >Obviously, there's memory bandwidth limitations due to PCI. Does anyone >know anything else about these guys? > >Cheers, >Bryce > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Mon Oct 20 08:16:19 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Mon, 20 Oct 2003 14:16:19 +0200 Subject: some ab initio benchmarks In-Reply-To: References: Message-ID: <20031020121619.GM8711@unthought.net> On Sat, Oct 18, 2003 at 06:52:59AM +0200, Cezary Czaplewski wrote: > > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some > older machines. For comparison I am including benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I received from Alex > Granovsky. Could you please specify which version of which operating system was used for this? If the kernel does not have NUMA scheduling, the Opterons are severely disadvantaged - it would be useful to know. Thank you, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From richardlbj at yahoo.com Sat Oct 18 23:56:37 2003 From: richardlbj at yahoo.com (Richard Brown) Date: Sat, 18 Oct 2003 20:56:37 -0700 (PDT) Subject: cluseter node freezes while running namd 2.5/2.5b1 Message-ID: <20031019035637.5382.qmail@web41211.mail.yahoo.com> I have been try to figure this out for the past two months with no luck. I have a 8-node PC cluster that consists of 16 athlon mp2200+, msi k7d master-l mb, intel i82557/i82558 10/100 on-board lan, 500mb kingston ddr266 pc2100 unbuffered, 3com superstack III baseline 24 port 10/100 switch. The cluster was built using oscar2.1/redhat7.3 w/ the kernel update 2.4.20-20. namd used includes 2.5b1 and the latest 2.5, both linux binary distributions and source code builds. the simulation tested is apoa1 benchmark example. namd/apoa1 only runs w/o problems on a single cluster node, either with one or two cpus. Every time it runs on two or more nodes, either using one or two cpus from each node, namd/apoa1 stops somewhere in the middle of run. One of the nodes freezes and does not respond to ping, ssh or the directly attached keyboard. Most of the time there were no error messages. A few times I received apic error or sorcket receive failure. I tried plugging a ps/2 mouse into the nodes as some people suggested for a bug of the motherboad but it did not help. I don't know how to proceed from here. Any suggestions would be appreciated. Thanks, Richard __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Sun Oct 19 21:00:53 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Sun, 19 Oct 2003 21:00:53 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <20031018013957.GA3774@greglaptop.PEATEC.COM> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <16275.13253.833239.996985@random.tigertiger.de> > > http://www.wired.com/news/technology/0,1282,60791,00.html Greg Lindahl writes: > I think it's the Return of the Array Processor. > > There's very little new in computing these days -- and it has the > usual flaws of APs: low bandwidth communication to the host. > > So if you have a problem that actually fits in the limited memory, and > doesn't need to communicate with anyone else very often, it may be a > win for you. They actually say in this document http://www.clearspeed.com/downloads/overview_cs301.pdf that the chip can be used as stand-alone processor and resembles a standard RISC processor. I do not see whether it would be SIMD or MIMD - the block diagram at least does not show a central control unit separate from the PEs. Given the small on-chip memory, they will have to connect external memory. The thing that would worry me is that the external machine balance is 32 Flops/Word (on 32-bit words), so it will only be useful for applications that do a lot of operations inside a few 100Kb of memory. 
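To put rough numbers on that balance figure, here is a small back-of-the-envelope sketch in C. The ~25 Gflop/s per-chip peak is inferred from the "600 gigaflops from 24 chips" quote earlier in this digest, and the 32 flops-per-word balance is the figure above; both are assumptions for illustration, not vendor specifications.

#include <stdio.h>

int main(void)
{
    /* Assumed figures from the thread: ~25 Gflop/s per chip
       (600 Gflop/s / 24 chips) and a balance of 32 flops per
       32-bit word of external memory traffic. */
    double peak_flops  = 25e9;
    double balance     = 32.0;                  /* flops per external word */
    double words_per_s = peak_flops / balance;  /* implied ~0.78 Gword/s   */
    double bytes_per_s = words_per_s * 4.0;     /* implied ~3.1 GB/s       */

    /* A kernel doing 'intensity' flops per word streamed from external
       memory sustains at most min(peak, intensity * implied bandwidth). */
    double intensity = 2.0;                     /* e.g. a streaming kernel */
    double sustained = intensity * words_per_s;
    if (sustained > peak_flops)
        sustained = peak_flops;

    printf("implied external bandwidth: %.2f GB/s\n", bytes_per_s / 1e9);
    printf("sustained at %.0f flops/word: %.2f Gflop/s (peak %.0f Gflop/s)\n",
           intensity, sustained / 1e9, peak_flops / 1e9);
    return 0;
}

In other words, unless a kernel reuses each word a few dozen times inside the on-chip memory, it runs at a small fraction of peak, which is exactly the concern raised above.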
IBM is following a slightly different approach with the QCDOC and BlueGene/L supercomputers which are based on systems-on-a-chip where they put a two PowerPC cores and all support logic on a single chip, wire it up with one or two GB of memory and connect a lot (64K) of these chips together. They expect 5.5 GFlops/s per node peak and to have 360 TFlops operational in 2004/5 (in 64 racks). You would need about 200 racks to get to a PetaFlops machine... http://sc-2002.org/paperpdfs/pap.pap207.pdf http://www.arxiv.org/abs/hep-lat/0306023 [QCDOC is a Columbia University project in collaboration with IBM - IBM is transitioning the technology from high-energy physics to biology which makes a lot of sense... :-)] To put 64 processors on a chip, I am sure ClearSpeed have to sacrifice a lot in memory and functionality/programmability, and who wins in this tradeoff remains to be seen. Depends on the application, too, of course. BTW, who or what is behind ClearSpeed? Their Bristol address is identical to Infineon's Design Centre there, and Hewlett Packard seems to have a lab there, too. If they have that kind of support, I am sure they thought hard before making these design choices, and it may just be tarketed at certain problems (vector/matrix/FFT-like stuff). -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mof at labf.org Mon Oct 20 10:13:56 2003 From: mof at labf.org (Mof) Date: Mon, 20 Oct 2003 23:43:56 +0930 Subject: Solaris Fire Engine. Message-ID: <200310202343.56524.mof@labf.org> http://www.theregister.co.uk/content/61/33440.html ... "We worked hard on efficiency, and we now measure, at a given network workload on identical x86 hardware, we use 30 percent less CPU than Linux." Mof. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Oct 20 11:17:24 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 20 Oct 2003 11:17:24 -0400 Subject: cluseter node freezes while running namd 2.5/2.5b1 In-Reply-To: <20031019035637.5382.qmail@web41211.mail.yahoo.com> References: <20031019035637.5382.qmail@web41211.mail.yahoo.com> Message-ID: <3F93FC84.7020808@scalableinformatics.com> Hi Richard: Are your Intel network drivers up to date? Check on the Intel site. If only one node repeatedly freezes (the same node), you might look at taking it out of the cluster, and seeing if that improves the situation. If it does, swap the one you took out, with one that is still in there, and see if the problem returns. This will help you determine if the problem is node based or system based. Joe Richard Brown wrote: >I have been try to figure this out for the past two >months with no luck. > >I have a 8-node PC cluster that consists of 16 athlon >mp2200+, msi k7d master-l mb, intel i82557/i82558 >10/100 on-board lan, 500mb kingston ddr266 pc2100 >unbuffered, 3com superstack III baseline 24 port >10/100 switch. > >The cluster was built using oscar2.1/redhat7.3 w/ the >kernel update 2.4.20-20. namd used includes 2.5b1 and >the latest 2.5, both linux binary distributions and >source code builds. the simulation tested is apoa1 >benchmark example. 
> >namd/apoa1 only runs w/o problems on a single cluster >node, either with one or two cpus. Every time it runs >on two or more nodes, either using one or two cpus >from each node, namd/apoa1 stops somewhere in the >middle of run. One of the nodes freezes and does not >respond to ping, ssh or the directly attached >keyboard. Most of the time there were no error >messages. A few times I received apic error or sorcket >receive failure. I tried plugging a ps/2 mouse into >the nodes as some people suggested for a bug of the >motherboad but it did not help. > >I don't know how to proceed from here. Any suggestions >would be appreciated. > >Thanks, >Richard > > >__________________________________ >Do you Yahoo!? >The New Yahoo! Shopping - with improved product search >http://shopping.yahoo.com >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netbsd.org Mon Oct 20 11:03:50 2003 From: jschauma at netbsd.org (Jan Schaumann) Date: Mon, 20 Oct 2003 11:03:50 -0400 Subject: New tech-cluster mailing list for NetBSD Message-ID: <20031020150350.GA26140@netmeister.org> Hello, A new tech-cluster at netbsd.org mailing list has been created. As the name suggests, this list is intended for technical discussions on building and using clusters of NetBSD hosts. Initially, this list is expected to be of low volume, but we hope to advocate and advance the use of NetBSD in such environments significantly. Subscription is via majordomo -- please see http://www.NetBSD.org/MailingLists/ for details. -Jan -- http://www.netbsd.org - Multiarchitecture OS, no hype required. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 14:03:23 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 14:03:23 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <200310202343.56524.mof@labf.org> Message-ID: On Mon, 20 Oct 2003, Mof wrote: > http://www.theregister.co.uk/content/61/33440.html > > ... "We worked hard on efficiency, and we now measure, at a given network > workload on identical x86 hardware, we use 30 percent less CPU than Linux." Linux uses much more CPU per packet than it used to. The structural change for IPtable/IPchains capability is very expensive, even when it is not used. And there have been substantial, CPU-costly changes to protect against denial-of-service attacks at many levels. The only protocol stack changes that might benefit cluster use are sendfile/zero-copy, and that doesn't apply to most current hardware or typical cluster message passing. I would be technially easy to revert to the interface of old Linux kernels and see much better than a 30% CPU reduction, but it's very unlikely that would happen politically: Linux development is feature-driven, not performance-driven. 
And that's easy to understand when your pet feature is at stake, or there is a news story of "Linux Kernel Vulnerable to ". -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kinghorn at pqs-chem.com Mon Oct 20 13:36:28 2003 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Mon, 20 Oct 2003 12:36:28 -0500 Subject: parllel eigen solvers Message-ID: <200310201236.28901.kinghorn@pqs-chem.com> Does anyone know of any recent progress on parallel eigensolvers suitable for beowulf clusters running over gigabit ethernet? It would be nice to have something that scaled moderately well and at least gave reasonable approximations to some subset of eigenvalues and vectors for large (10,000x10,000) symmetric systems. My interests are primarily for quantum chemistry. It's pretty obvious that you can compute eigenvectors in parallel after you get the eigenvalues but it would be nice to get eigenvalues mostly in parallel requiring maybe just a couple of serial iterates ... Best regards to all -Don _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Mon Oct 20 15:08:21 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Mon, 20 Oct 2003 21:08:21 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: <20031020121619.GM8711@unthought.net> Message-ID: On Mon, 20 Oct 2003, Jakob Oestergaard wrote: > Could you please specify which version of which operating system was > used for this? Opteron machine (dahlia) was a prototype which dr Paulette Clancy got for evaluation from local computer shop. It had RedHat GinGin 64 operating system preistalled when I did testing. > If the kernel does not have NUMA scheduling, the Opterons are severely > disadvantaged - it would be useful to know. I don't remember which kernel was installed when I did benchmarks, I suppose standard kernel which is coming with GinGin64. Machine should have SuSE installed now so I cannot check it. I will repeat benchmarks with PGI 5 64bit compiler and SuSE when I will have some time. czarek _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Oct 20 17:50:56 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 20 Oct 2003 14:50:56 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: <16275.13253.833239.996985@random.tigertiger.de> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: >BTW, who or what is behind ClearSpeed? Their Bristol address is >identical to Infineon's Design Centre there, and Hewlett Packard seems >to have a lab there, too. If they have that kind of support, I am sure >they thought hard before making these design choices, and it may just >be tarketed at certain problems (vector/matrix/FFT-like stuff). 
Off their web site...http://www.clearspeed.com/about.php?team The CEO and president are marketing oriented (CEO: "he focused on taking new technologies to market", President: "..successfully grown glabal sales mangement and field application organizations and instrumental in creating key partnership agreements". The CTO (Ray McConnell) does parallel processing with 300K processors, etc. VP Engr (Russell David) designed mixed signal baseband ICs for wireless market. I didn't turn up any papers in the IEEE on-line library, but that's not particularly signficant, in and of itself. McConnell has a paper http://www.hotchips.org/archive/hc11/hc11pres_pdf/hc99.s3.2.McConnell.pdf shows architectures from PixelFusion, Ltd... SIMD core with 32 bit embedded processor running a 256 PE "Fuzion block". Each PE has an 8 bit ALU and 2kByte PE memory... (sound familiar?) From "Hot Chips 99" James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Mon Oct 20 18:46:31 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 00:46:31 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > The only protocol stack changes that might benefit cluster use are > sendfile/zero-copy, and that doesn't apply to most current hardware or > typical cluster message passing. Has actually somebody tried to use sendfile in MPICH or LAM-MPI ? I planned to do it, but this is somewhere in the middle of my always growing TODO queue... Recipes for how to use it were posted a few times at least on netdev list, so those interested can find them easily. > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: But there are many projects that live outside the official kernel, the Scyld network drivers being one good example. What's wrong with replacing the IP stack with one maintained separately with performance in mind ? I agree though that this would mean somebody to take care of it and make sure that it works with newer kernels... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Oct 20 19:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 20 Oct 2003 19:08:12 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Oh yes, and it is a SERIOUS problem. 
I was just mulling on the right procmail recipe to consign this domain to the dark depths of hell, but if it were done at the list level instead it would only be a good thing. My .procmailrc is already getting quite long indeed. BTW, you (and of course the rest of the list) are just the man to ask; what is the status of Opterons and fortran compilers. I myself don't use fortran any more, but a number of folks at Duke do, and they are starting to ask what the choices are for Opterons. A websearch reveals that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an Opteron fortran, but rumor also suggests that a number of these are really "beta" quality with bugs that may or may not prove fatal to any given project. Then there is Gnu. Any comments on any of these from you (or anybody, really)? Is there a functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? Do the compilers permit access to large (> 3GB) memory, do they optimize the use of that memory, do they support the various SSE instructions? I'm indirectly interested in this as it looks like I'm getting Opterons for my next round of cluster purchases personally, although I'll be using C on them (hopefully 64 bit Gnu C). rgb > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Oct 20 18:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Mon, 20 Oct 2003 18:52:03 -0400 Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: References: Message-ID: <1066690323.7027.17.camel@roughneck.liniac.upenn.edu> On Mon, 2003-10-20 at 18:41, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Yes -- quite annoying :/ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Mon Oct 20 18:41:51 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Mon, 20 Oct 2003 15:41:51 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net Message-ID: I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every messages I've sent to this list has started bouncing back to me from dan at systemsfirm.com. I'm getting about ten copies of each one every other day. Is anyone else having this problem? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 20:08:31 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 20:08:31 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > On Mon, 20 Oct 2003, Donald Becker wrote: > > > The only protocol stack changes that might benefit cluster use are > > sendfile/zero-copy, and that doesn't apply to most current hardware or > > typical cluster message passing. > > Has actually somebody tried to use sendfile in MPICH or LAM-MPI ? I > planned to do it, but this is somewhere in the middle of my always growing > TODO queue... Recipes for how to use it were posted a few times at least > on netdev list, so those interested can find them easily. The trick is to memory map a file use that memory region as message buffers send the message buffers using sendfile() My belief is that the page locking involved with sendfile() would be too costly for anything smaller than about 32KB. While I'm certain that there are a few MPI applications that use messages that large, they don't seem to be typical. > But there are many projects that live outside the official kernel, the > Scyld network drivers being one good example. What's wrong with replacing > the IP stack with one maintained separately with performance in mind ? > I agree though that this would mean somebody to take care of it and make > sure that it works with newer kernels... >From my experience trying to keep the network driver interface stable, I very much doubt that it would be possible to separately maintain a network protocol stack. Especially since it would be perceived as competition with the in-kernel version, which brings out the worst behavior... As a specific example, a few years ago we had cluster performance patches for the 2.2 kernel. Even while the 2.3.99 development was going on, the 2.2 kernel changed too quickly to keep those patches up to date and tested. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Mon Oct 20 19:31:33 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Tue, 21 Oct 2003 01:31:33 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> Message-ID: <16276.28757.858683.189030@random.tigertiger.de> Jim Lux writes: > At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: > >BTW, who or what is behind ClearSpeed? Their Bristol address is > >identical to Infineon's Design Centre there, and Hewlett Packard seems > >to have a lab there, too. If they have that kind of support, I am sure > >they thought hard before making these design choices, and it may just > >be tarketed at certain problems (vector/matrix/FFT-like stuff). > > The CTO (Ray McConnell) does parallel processing with 300K processors, etc. > VP Engr (Russell David) designed mixed signal baseband ICs for wireless > market. 
I didn't turn up any papers in the IEEE on-line library, but > that's not particularly signficant, in and of itself. I actually found some more info about them: Clearspeed used to be Pixelfusion, a spin-off from Inmos, who made the original Transputer. http://www.eetimes.com/sys/news/OEG20010524S0044 Clearspeed tried to design a SIMD processor called Fuzion for graphics applications, then around 2001 turned to the networking sector, and now it seems to high-performance computing. So its a processor in search of an application. http://www.eetimes.com/semi/news/OEG20000208S0039 http://www.eetimes.com/semi/news/OEG19990512S0012 Poor guys went through at least three CEOs during the last four years... -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Oct 20 20:33:23 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 20 Oct 2003 17:33:23 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: hi ya trent just add the ip# of systemsfirm.net to your /etc/mail/access files # a polite msg i added for them/somebody to see .. systemsfirm.net REJECT - geez .. do you need help to fix your PC cd /etc/mail ; make ; restart-sendmail or your exim or ... c ya alvin and about 75% or more of the sven virus is coming from mis-managed/mis-configured clusters http://www.Linux-Sec.net/MSJunk On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Oct 20 23:34:44 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 20 Oct 2003 20:34:44 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: yes... I've tried contacting the admin contact for that domain and got no response... joelja On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? 
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Tue Oct 21 08:06:56 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Tue, 21 Oct 2003 05:06:56 -0700 Subject: Solaris Fire Engine. References: Message-ID: <3F95215F.9DE43BD9@attglobal.net> In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development at some point split in two distinct lines: one for single computer applications and one for clusters? Paul Schenker Donald Becker wrote: > On Mon, 20 Oct 2003, Mof wrote: > > > http://www.theregister.co.uk/content/61/33440.html > > > > ... "We worked hard on efficiency, and we now measure, at a given network > > workload on identical x86 hardware, we use 30 percent less CPU than Linux." > > Linux uses much more CPU per packet than it used to. The structural > change for IPtable/IPchains capability is very expensive, even when it > is not used. And there have been substantial, CPU-costly changes to protect > against denial-of-service attacks at many levels. The only protocol > stack changes that might benefit cluster use are sendfile/zero-copy, and > that doesn't apply to most current hardware or typical cluster message > passing. > > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: Linux development is > feature-driven, not performance-driven. And that's easy to understand > when your pet feature is at stake, or there is a news story of "Linux > Kernel Vulnerable to ". > > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zby at tsinghua.edu.cn Mon Oct 20 22:34:54 2003 From: zby at tsinghua.edu.cn (Baoyin Zhang) Date: Tue, 21 Oct 2003 10:34:54 +0800 Subject: Jcluster toolkit v 1.0 releases! Message-ID: <266703288.27688@mail.tsinghua.edu.cn> Apologies if you receive multiple copies of this message. Dear all, I am pleased to annouce the Jcluster Toolkit (Ver 1.0) releases, you can freely download it from the website below. http://vip.6to23.com/jcluster/ The toolkit is a high performance Java parallel environment, implemented in pure java. 
It provides you the popular PVM-like and MPI-like message-passing interface, automatic task load balance across large-scale heterogeneous cluster and high performance, reliable multithreaded communications using UDP protocol. In the version 1.0, Object passing interface is added into PVM-like and MPI-like message passing interface, and provide very convenient deployment -- the classes of user application only need to be deployed at one node in a large-scale cluster. I welcome your comments, suggestions, cooperation, and involvement in improving the toolkit. Best regards Baoyin Zhang _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 21 08:20:10 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 21 Oct 2003 08:20:10 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <3F95215F.9DE43BD9@attglobal.net> Message-ID: On Tue, 21 Oct 2003 pesch at attglobal.net wrote: > In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If > so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development > at some point split in two distinct lines: one for single computer applications and one for clusters? It's the usual problem (and a continuation of my XML rant in a way, as it is at least partly motivated by this). Sure, one can do this. However, it is very, very expensive to do so, a classic case of 90% of the work producing 10% of the benefit, if that. As Don pointed out, even Scyld, with highly talented people who are (in principle:-) even making money doing so found maintaining a separate kernel line crushingly expensive very quickly. Whenever expense is mentioned, especially in engineering, one has to consider benefit, and do a CBA. The CBA is the crux of all optimization theory; find the point of diminishing returns and stay there. I would argue that splitting the kernel is WAY far beyond that point. Folks who agree can skip the editorial below. For that matter, so can folks who disagree...;-) The expense can be expressed/paid one of several ways -- get a distinct kernel optimized and stable, get an entire associated distribution optimized and stable, and then freeze everything except for bugfixes. You then get a local optimum (after a lot of work) that doesn't take a lot of work to maintain, BUT you pay the penalty of drifting apart from the rest of linux and can never resynchronize without redoing all that work (and accepting all that new expense). New, more efficient gcc? Forget it -- the work of testing it with your old kernel costs too much. New device drivers? Hours to days of testing for each one. Eventually a key application or improvement appears in the main kernel line (e.g. 64 bit, Opteron support) that is REALLY different, REALLY worth more to nearly everybody than the benefit they might or might not gain from the custom HPC optimized kernel, and your optimized but stagnant kernel is abandoned. Alternatively, you can effectively twin the entire kernel development cycle, INCLUDING the testing and debugging. Back in my ill-spent youth I spent a considerable amount of time on the linux-smp list (I couldn't take being on the main linux kernel list even then, as its traffic dwarfs both the beowulf list and the linux-smp list combined). I also played a tiny bit with drivers on a couple of occassions. 
The amount of work, and number of human volunteers, required to drive these processes is astounding, and I would guess that it would have to be done on twinned lists as the kernelvolken would likely not welcome a near doubling of traffic on their lists or doubling of the work burden trying to figure out just who owns a given emergent bug (and inevitably they WOULD have to help figure out who owns emergent bugs, as some of them WOULD belong to them, others to the group supporting the split off sources, if they were to proceed independently but "keep up" with the development kernel so that true divergence did not occur). A better alternative exists (and is even used to some extent). The linux kernel is already highly modular. It is already possible to e.g. bypass the IP stack altogether (as is done by myrinet and other high speed networks) with custom device drivers that work below the IP and TCP layers -- just doing this saves you a lot of the associated latency hit in high speed networks, as TCP/IP is designed for WAN routing and security and tends to be overkill for a secure private LAN IPC channel in a beowulf. This route requires far less maintenance and customization -- specialized drivers for MPI and/or PVM and/or a network socket layer, plus a kernel module or three. Even this is "expensive" and tends to be done only by companies that make hefty marginal profits for their specific devices, but it is FAR cheaper than maintaining a separate kernel altogether. I would also lump into this group applying and testing on an ad hoc basis things like Josip's network optimization patches which make relatively small, relatively specific changes that might technically "break" a kernel for WAN application but can produce measureable benefits for certain classes of communication pattern. This sort of thing is NOT for everybody. It is like a small scale version of the first alternative -- the patches tend to be put together for some particular kernel revision and then frozen (or applied "blindly" to succeeding kernel revisions until they manifestly break). Again this motivates one to freeze kernel and distribution once one gets everything working and live with it until advances elsewhere make it impossible to continue doing so. This is the kind of thing where MAYBE one could get the patches introduced into the mainstream kernel sources in a form that was e.g. sysctl controllable -- "modular", as it were, but inside the non-modular part of the kernel as a "proceed at your own risk" feature. Expense alternatives in hand, one has to measure benefit. We could break up HPC applications very crudely into groups. One group is code that is CPU bound -- where the primary/only bottleneck is the number of double precision floating point (and associated integer) computations that the computer can retire per second. Another might be memory bound -- limited primarily by the speed with which the system can move values into and out of memory doing some simple operations on them in the meantime. Still another might be disk or other non-network I/O bound (people who crunch large data sets to and from large storage devices). Finally yes, one group might be bound by the network and network based IPC's in a parallel division of a program. 
This latter group is the ONLY group that would really benefit from the kernel split; the rest of the kernel is reasonably well optimized for raw computations, memory access, and even hardware device access (or can be configured and tuned to be without the need of a separate kernel line). I would argue that even the network group splits again, into latency limited and bandwidth limited. Bandwidth limited applications would again see little benefit from a hacked kernel split as TCP can deliver data throughput that is roughly 90% of wire speed (or better) for ethernet, depending on the quality of hardware as much as the kernel. Of course, the degree of the CPU's involvement in sending and receiving these messages could be improved; one would like to be able to use DMA as much as possible to send the messages without blocking the CPU, but this matters only if the CPU can do something useful while awaiting the network IPC transfers; often it cannot. The one remaining group that would significantly benefit is the latency limited group -- true network parallel applications that need to send lots of little messages that cannot be sensibly aggregated in software. The benefit there could be profound, as the TCP stack adds quite a lot of latency (and CPU load) on top of the irreducible hardware latency, IIRC, even on a switched network where the CPU doesn't have to deal with a lot of spurious network traffic. Are there enough members of this group to justify splitting the kernel? I very much doubt it. I don't even think that the existence of this group has motivated the widespread adoption of a non-IP ethernet transport layer -- nearly everybody just lives with the IP stack latency OR... ...uses one of the dedicated HPC networks. This is the real kicker. TCP latency is almost two orders of magnitude greater than either myrinet or dolphin/sci latency (which are both order of microseconds instead of order of hundreds of microseconds). They >>also<< deliver very high bandwidth. Sure, they are expensive, but you know that you are paying for precisely what YOU need for YOUR HPC computations. I don't have to pay for them (even indirectly, by helping out with a whole secondary kernel development track) when MY code is CPU bound; the big DB guys don't have to pay for it when THEIR code depends on how long it takes to read in those ginormous databases of e.g. genetic data; the linear algebra folks who need large, fast memory don't pay for it (unless they try splitting up their linear algebra across the network, of course:-) -- it is paid for only the people who need it, who send lots of little messages or who need its bleeding edge bandwidth or both. One COULD ask, very reasonably, for just about any of the kernel optimizations that can be implemented at the modular level -- that is a matter of writing the module, accepting responsibility for its integration into the kernel and sequential debugging in perpetuity (that is, becoming a slave of the lamp, in perpetuity bound to the kernel lists:-). Alas, TCP/IP is so bound up inside the main part of the kernel that I don't think it can be separated out into modules any more than it already is. ^^^^^ ^^^^^, (closing omitted in the fond hope of remuneration) rgb (C'mon now -- here I am omitting all sorts of words from my rants and my paypal account is still dry as a bone, dry as a desert, bereft of all money, parched as my throat in the noonday sun. Seriously, either I make some money or I'm gonna compose a 50 kiloword opus for my next one...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 08:46:57 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 14:46:57 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > My belief is that the page locking involved with sendfile() would be too > costly for anything smaller than about 32KB. IIRC, both MPICH and LAM-MPI make the distinction between small and large messages with the default cutoff being 64KB. So large messages could be sent this way... I don't know what you meant with "too costly", but small messsages are not too costly to copy in the stack (normal behaviour) especially with increasing cache sizes of today CPUs, while the large ones (where copying time would be significant) could be sent without the extra copy in the stack. > While I'm certain that there are a few MPI applications that use > messages that large, they don't seem to be typical. ... or might not care that much about the speedup. > >From my experience trying to keep the network driver interface stable, I > very much doubt that it would be possible to separately maintain a > network protocol stack. Well, it was late last night and probably I haven't chosen the most appropriate example... the Scyld network drivers are maintained by one person, while my suggestion was more going toward a community project. > Especially since it would be perceived as competition with the in-kernel > version, which brings out the worst behavior... Yeah, political issues - I think that making the intent clear would solve the problem: there is no competition, it serves a completely different purpose. And given what you wrote in the previous e-mail about "feature-driven", who would use it on normal computers when it misses several "high-profile features" like iptables ? Even more, if it's clear that it should only be used on local fast networks, several aspects of the stack can be optimized without fear of breaking very high latency (satellite) or very low bandwidth (phone modems) connections. But I guess that I should stop dreaming :-) > As a specific example, a few years ago we had cluster performance > patches for the 2.2 kernel. Those maintained by Josip Loncaric ? Again it was a one-man show. I think that this is exactly the problem: there are small projects maintained by one person but which depend on the free time or interest of this person. Given that the clustering had moved from research-only into a lucrative bussiness and that the software (Linux kernel, MPI libraries, etc.) evolved quite a lot and the entry barrier into let's say kernel programing is quite high, it's normal that not many people want to make the step. I already expressed about a year ago my oppinion that such projects can only be carried forward by companies that benefit from them or universities where work from students comes for free. But it seems that there are no companies thinking that they can benefit or universities where students' work is for free... 
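For what it's worth, a minimal sketch of the mmap-the-file, fill-it, push-it-with-sendfile() path discussed above follows. It is purely illustrative: the file name, the 256KB message size and the assumption of an already-connected TCP peer (e.g. a netcat listener) are made up, and this is not code from MPICH or LAM-MPI.

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_BYTES (256 * 1024)  /* a "large" message, above the 64KB cutoff */

/* Fill a file-backed buffer and push it out with sendfile().  The
   MAP_SHARED mapping and sendfile() both work on the same page-cache
   pages, so the payload is not copied through an extra user buffer. */
static int send_large_message(int sock)
{
    int fd = open("/tmp/sendbuf.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ftruncate(fd, MSG_BYTES) < 0)
        return -1;

    char *buf = mmap(NULL, MSG_BYTES, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 'x', MSG_BYTES);           /* stand-in for the real payload */

    off_t offset = 0;
    size_t left = MSG_BYTES;
    while (left > 0) {
        ssize_t n = sendfile(sock, fd, &offset, left);
        if (n <= 0) {
            perror("sendfile");
            return -1;
        }
        left -= (size_t)n;
    }

    munmap(buf, MSG_BYTES);
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <ipv4-addr> <port>\n", argv[0]);
        return 1;
    }

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)atoi(argv[2]));
    if (sock < 0 || inet_pton(AF_INET, argv[1], &sa.sin_addr) != 1 ||
        connect(sock, (struct sockaddr *)&sa, sizeof sa) < 0) {
        perror("connect");
        return 1;
    }
    return send_large_message(sock) == 0 ? 0 : 1;
}

Whether the page-locking cost makes this a win below a few tens of KB is exactly the open question raised earlier in the thread.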
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Oct 21 09:31:37 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 21 Oct 2003 09:31:37 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: Message-ID: On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? Well, this is as good a place as many to put up the benchmarks I ran using DYNA (a commercial FEM code from LSTC, first developed at LLNL, and definitely Fortran): http://www.duke.edu/~jlb17/bench-results.pdf According to their docs, the 32bit binary was compiled using ifc6.0. The slowdown in the newer point release is due to them dialing back the optimizations due to compiler bugs. The 64bit Opteron binary was compiled using PGI, but that's all I know about it. To sum it up, I bought some Opterons. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 09:41:53 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 15:41:53 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > But I guess that I should stop dreaming :-) Well, either I'm not dreaming, or somebody else is dreaming too :-) Below are some fragments of e-mails from David Miller (one of the Linux network maintainers) to netdev today: > People on clusters use their own special clustering hardware and > protocol stacks _ANYWAYS_ because ipv4 is too general to serve their > performance needs. And I think that is a good thing rather than > a bad thing. People should use specialized solutions if that is the > best way to attack their problem. ... 
> The things cluster people want is totally against what a general > purpose IPV4 implementation should do. Linux needs to provide a > general purpose IPV4 stack that works well for everybody, not just > cluster people. > > I'd rather have millions of servers using my IPV4 stack than a handful > of N-thousand system clusters. > ... > Sure, many people would like to simulate the earth and nuclear weapons > using Linux, but I'm sure as hell not going to put features into the > kernel to help them if such features hurt the majority of Linux users. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Oct 21 11:19:22 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 21 Oct 2003 11:19:22 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Robert G. Brown wrote: > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right There are many more problems that list readers do not see. I delete the address from the list only when the problem is persistent. The major problem happens when messages take a few days to bounce, and the bounce does not follow standards. In that case there are dozens of messages in the remote queue, and they all appears to be replies by a valid list subscriber. > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. A surprising amount of 64 bit software (certainly not limited to the Opteron) is still not mature enough for general purpose use. It still requires more development and testing to get to the stability level required for real deployment. And it's not the "64 bit" nature of the software, since we did have reasonable maturity on the Alpha years ago. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Tue Oct 21 11:06:37 2003 From: edwardsa at plk.af.mil (Arthur H. 
Edwards) Date: Tue, 21 Oct 2003 09:06:37 -0600 Subject: parllel eigen solvers In-Reply-To: <200310211049.OAA18031@nocserv.free.net> References: <200310201236.28901.kinghorn@pqs-chem.com> <200310211049.OAA18031@nocserv.free.net> Message-ID: <20031021150637.GA8076@plk.af.mil> I should point out that density function theorcan be compute-bound on diagonalization. QUEST, a Sandia Code, easily handles several hundred atoms, but the eigen solve dominates by ~300-400 atoms. Thus, intermediate size diagonalization is of strong interest. Art Edwards On Tue, Oct 21, 2003 at 02:49:07PM +0400, Mikhail Kuzminsky wrote: > According to Donald B. Kinghorn > > > > Does anyone know of any recent progress on parallel eigensolvers suitable for > > beowulf clusters running over gigabit ethernet? > > It would be nice to have something that scaled moderately well and at least > > gave reasonable approximations to some subset of eigenvalues and vectors for > > large (10,000x10,000) symmetric systems. > > My interests are primarily for quantum chemistry. > > > In the case you think about semiempirical fockian diagonalisation, > there is a set of alternative methods for direct construction of density > matrix avoiding preliminary finding of eigenvectors. This methods > are realized, in particular, in Gaussian-03 and MOPAC-2002 methods. > > For non-empirical quantum chemistry diagonalisation usually doesn't limit > common performance. In the case of methods like CI it's necessary to > find only some eigenvectors, and it is better to use special diagonalization > methods. > > There is special parallel solver package, but I don't have exact > reference w/me :-( > > Mikhail Kuzminsky > Zelinsky Inst. of Orgamic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Tue Oct 21 15:32:05 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Tue, 21 Oct 2003 13:32:05 -0600 (CST) Subject: shift bit & performance? In-Reply-To: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: Hi, sometime ago, somebody sent an info about performance working with "<<" & ">>" doing shift bits instead of using "*" or "/" Could anybody help me about it? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Tue Oct 21 15:21:24 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Tue, 21 Oct 2003 12:21:24 -0700 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F958734.2030300@cert.ucr.edu> >>On Mon, 20 Oct 2003, Trent Piepho wrote: >> >> >>BTW, you (and of course the rest of the list) are just the man to ask; >>what is the status of Opterons and fortran compilers. I myself don't >>use fortran any more, but a number of folks at Duke do, and they are >>starting to ask what the choices are for Opterons. 
A websearch reveals >>that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >>Opteron fortran, but rumor also suggests that a number of these are >>really "beta" quality with bugs that may or may not prove fatal to any >>given project. Then there is Gnu. >> >>Any comments on any of these from you (or anybody, really)? Is there a >>functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >>Do the compilers permit access to large (> 3GB) memory, do they optimize >>the use of that memory, do they support the various SSE instructions? >> I can tell you about PGI's compilers. They are kinda beta quality as you say. As of now they only want to install on Suse enterprise edition. Although a little fiddling around with the install scripts and you can get them to install on other distributions. But even though you can get the compilers installed, they only seem to run on the Suse beta for opterons. PGI says this should all change in the near future though. As far as the code that the compilers produce, we haven't had any problems at all as far as I know of. The great thing about PGI compilers though is that you can download them and try them out for free for 15 days or so and see for yourself. As far as the Gnu Fortran compiler goes, it seems to work great on Opterons too. But then as you're probably aware, it's only a Fortran 77 compiler. Cheers, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 21 15:33:33 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 21 Oct 2003 14:33:33 -0500 Subject: shift bit & performance? In-Reply-To: References: Message-ID: <1066764813.27603.4.camel@terra> On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? > There is certainly performance to be had from using a logical shift instead of a multiply or divide, but its of declining value. I am fairly sure that with modern compilers you do a integer divide by a constant power of 2, that it will generate a logical shift. That aint rocket science. -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Oct 21 16:32:07 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 21 Oct 2003 15:32:07 -0500 Subject: shift bit & performance? In-Reply-To: ; from eccf@super.unam.mx on Tue, Oct 21, 2003 at 01:32:05PM -0600 References: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: <20031021153207.N31870@mikee.ath.cx> On Tue, 21 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > > > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? The operations << and >> are closer to assembler operations for integer values than * and /. If using * or / there are many assembler instructions to compute the new values. When using power of 2s for * or / then << and >> are much faster. 
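(A minimal C illustration of the point Dean and Mike make above -- not taken from either post, and the values are arbitrary. For unsigned integers a divide by a constant power of two and a right shift are the same operation, and the compiler already emits the shift for you; for signed integers the two differ on negative values, which is one more reason to leave it to the compiler:)

#include <stdio.h>

int main(void)
{
    unsigned int x = 1000u;

    /* For unsigned operands these are the same operation; gcc and most
       other compilers emit the shift for the divide automatically. */
    unsigned int by_div   = x / 8u;
    unsigned int by_shift = x >> 3;

    /* For signed operands they are NOT the same: C division truncates
       toward zero, while >> on a negative value is an arithmetic shift
       on x86 compilers (and implementation-defined in general). */
    int y = -7;

    printf("unsigned: %u %u (identical)\n", by_div, by_shift);
    printf("signed:   %d %d (-7/2 vs -7>>1 differ)\n", y / 2, y >> 1);
    return 0;
}

Compiling with gcc -O2 -S shows the unsigned divide turned into a shift; the signed divide becomes a shift plus a small fix-up.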
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bhalevy at panasas.com Tue Oct 21 17:35:06 2003 From: bhalevy at panasas.com (Halevy, Benny) Date: Tue, 21 Oct 2003 17:35:06 -0400 Subject: shift bit & performance? Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF8BC@PIKES.panasas.com> Could be meaningful on a 32 bit platform doing 64-bit math emulation. Emulating shift is much cheaper than multiply/divide. Benny >-----Original Message----- >From: Dean Johnson [mailto:dtj at uberh4x0r.org] >Sent: Tuesday, October 21, 2003 3:34 PM >To: Eduardo Cesar Cabrera Flores >Cc: beowulf at beowulf.org >Subject: Re: shift bit & performance? > > >On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: >> Hi, >> >> sometime ago, somebody sent an info about performance >working with "<<" & >> ">>" doing shift bits instead of using "*" or "/" >> Could anybody help me about it? >> > >There is certainly performance to be had from using a logical >shift instead of a >multiply or divide, but its of declining value. I am fairly >sure that with modern >compilers you do a integer divide by a constant power of 2, >that it will generate >a logical shift. That aint rocket science. > > -Dean > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Wed Oct 22 03:32:31 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Wed, 22 Oct 2003 00:32:31 -0700 Subject: flood of bounces from postmaster@systemsfirm.net References: Message-ID: <3F96328F.DF327416@attglobal.net> Perhaps it's not related to the topic but any mail I post to this list results automatically in a "incident report" to my mail provider (attglobal.net) which then automatically replies with the mail below. Any inquiry to attglobal.net with the reference number below results always in exactly 0 (zero) replies from attglobal. Paul Schenker "Received: from e4.ny.us.ibm.com ([32.97.182.104]) by prserv.net (in5) with ESMTP id <20031021031824105041p20me>; Tue, 21 Oct 2003 03:18:27 +0000 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e4.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id h9L3IN0N801416 for ; Mon, 20 Oct 2003 23:18:23 -0400 Received: from BLDVMB.POK.IBM.COM (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id h9L3IMqW036946 for <@vm-av.pok.relay.ibm.com:pesch at attglobal.net>; Mon, 20 Oct 2003 23:18:22 -0400 Message-ID: <200310210318.h9L3IMqW036946 at northrelay01.pok.ibm.com> Received: by BLDVMB.POK.IBM.COM (IBM VM SMTP Level 320) via spool with SMTP id 7133 ; Mon, 20 Oct 2003 21:09:30 MDT Date: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) From: notify at attglobal.net To: CC: Subject: Re: Solaris Fire Engine. (REF:#_CSSEMAIL_0870689) X-Mozilla-Status: 8011 X-Mozilla-Status2: 00000000 X-UIDL: 200310210327271050a5ammfe0013d2 An incident reported by you has been created. Sev: 4 The incident # is listed below. No need to respond to this e-mail. 
For Account: CSSEMAIL Incident Number: 0870689 Status: INITIAL Last Updated: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) PROBLEM CREATED ************************************************************************* Summary: Re: Solaris Fire Engine. ************************************************************************* If replying via email, do not alter the reference id in the subject line and send only new information, do not send entire note again. Do not send attachments, graphics or images." Donald Becker wrote: > On Mon, 20 Oct 2003, Robert G. Brown wrote: > > > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > > messages I've sent to this list has started bouncing back to me from > > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > > day. Is anyone else having this problem? > > > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right > > There are many more problems that list readers do not see. I delete the > address from the list only when the problem is persistent. > The major problem happens when messages take a few days to bounce, and > the bounce does not follow standards. In that case there are dozens of > messages in the remote queue, and they all appears to be replies by a > valid list subscriber. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From douglas at shore.net Wed Oct 22 01:33:02 2003 From: douglas at shore.net (Douglas O'Flaherty) Date: Wed, 22 Oct 2003 01:33:02 -0400 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <200310211601.h9LG1QA22261@NewBlue.scyld.com> References: <200310211601.h9LG1QA22261@NewBlue.scyld.com> Message-ID: <3F96168E.7050908@shore.net> Here's the short summary of Opteron compilers. When someone offers an AMD64 compiler, it typically may be used to create 32-bit or 64-bit executables as long as you are specific about which libraries you use. Any IA-32 compiler can create code and run on Opterons. Of course, 32-bit executables don't get the extra memory either, even when running on a 64-bit OS, but sometimes a 32-bit executable might be what you want. With SC2003 coming up, I expect we'll see a flurry of activity relating to compilers and tools. This information will likely be stale soon. Also, most of these have a free trial period, so you can kick the tires. Intel compilers work great in 32-bit and can be run on a 32 or 64-bit OS natively. Performance and compatability is not an issue. For obvious reasons many of the benchmarks have been run using IFC. PGI's first AMD64 production release was around July 5. There is a limitation on objects greater than 2GB in Linux as a result of the GNU assembly linker, but the application can address as much memory as you can give it. Only a small fraction of the world has objects that large. I've only run into it with synthetic benchmarks. The gal coding is done and PGI is working on the next release. As for performance, since this was the first AMD64 fortran compiler to market, it was used in AMD presentations. You can see performance comparisons in Rich Brunner's presentation from ClusterWorld. 
It's on-line at http://www.amd.com/us-en/assets/content_type/DownloadableAssets/RichBrunnerClusterWorldpresFINAL.pdf (about slide 39 IIRC) There was a minor patch release near the begining of August. I suspect there is always someone finding flaws, but generally it's doing well. NB: Saw Glenn's post re: PGI on SuSE v. RedHat. We've got it running on both. There were definately some fiddley bits to make it happy on RedHat, but I think they are documented on PGI's site. Absoft had a long beta of their AMD64 compiler and went GA in September. I have no personal experience on it, nor do I know of any public benchmarks. NAG worked closely with AMD on the AMD Core Math Libraries. They should know the processor well. No experience with the Gnu Fortran or Lahey. I believe GFC to be AMD64 functional. Lahey would only generate 32-bit code. Your other question was about SSE2. Yes Opteron has complete SSE2 support. I *know* PGI & IFC support it, I expect the others do as well. doug douglas_at_shore.net Disclaimer: Among my several hats I am also in AMD Marketing. This is an unofficial response. No AMD bits were utlized in the creation of this email, etc.. If you want to talk about Opterons 'officially' you need to email me at doug.oflaherty(at)amd.com On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote >> On Mon, 20 Oct 2003, Trent Piepho wrote: >> > > >>> > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every >>> > messages I've sent to this list has started bouncing back to me from >>> > dan at systemsfirm.com. I'm getting about ten copies of each one every other >>> > day. Is anyone else having this problem? >> >> >> >> BTW, you (and of course the rest of the list) are just the man to ask; >> what is the status of Opterons and fortran compilers. I myself don't >> use fortran any more, but a number of folks at Duke do, and they are >> starting to ask what the choices are for Opterons. A websearch reveals >> that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >> Opteron fortran, but rumor also suggests that a number of these are >> really "beta" quality with bugs that may or may not prove fatal to any >> given project. Then there is Gnu. >> >> Any comments on any of these from you (or anybody, really)? Is there a >> functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >> Do the compilers permit access to large (> 3GB) memory, do they optimize >> the use of that memory, do they support the various SSE instructions? > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Wed Oct 22 04:45:08 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Wed, 22 Oct 2003 10:45:08 +0200 Subject: shift bit & performance? In-Reply-To: <1066764813.27603.4.camel@terra> References: <1066764813.27603.4.camel@terra> Message-ID: <20031022084508.GA7048@unthought.net> On Tue, Oct 21, 2003 at 02:33:33PM -0500, Dean Johnson wrote: > On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > > Hi, > > > > sometime ago, somebody sent an info about performance working with "<<" & > > ">>" doing shift bits instead of using "*" or "/" > > Could anybody help me about it? > > > > There is certainly performance to be had from using a logical shift instead of a > multiply or divide, but its of declining value. 
I am fairly sure that with modern > compilers you do a integer divide by a constant power of 2, that it will generate > a logical shift. That aint rocket science. > It used to be true that shifts were 'better' on Intel x86 processors, but it is not that simple anymore. On the P4 for example, a sequence of 'add's is cheaper than a left shift, for three adds or less (because the latency of the shift opcode has increased compared to earlier generations). -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From serguei.patchkovskii at sympatico.ca Wed Oct 22 10:05:38 2003 From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii at sympatico.ca) Date: Wed, 22 Oct 2003 10:05:38 -0400 Subject: (no subject) Message-ID: <20031022140538.QSP8001.tomts7-srv.bellnexxia.net@[209.226.175.20]> > Any IA-32 compiler can create code and run on Opterons. Of course, > 32-bit executables don't get the extra memory either, even when running > on a 64-bit OS Not true. A 32-bit binary running on x86-64 Linux has access to the full 32-bit address space. When I run a very simple 32-bit Fortran program, I see the program itself mapped at very low addresses; the shared libraries get mapped at the 1Gbyte mark, while the stack grows down from the 4Gbyte mark. On an x86 Linux, the upper 1Gbyte (but this depends on the kernel options) is taken by the kernel address space. What this means in practice is that on an x86 Linux, I can allocate at most 2.5Gbytes of memory for my data without resorting to ugly tricks; in 32-bit mode of x86-64 Linux, this goes up to about 3.5Gbytes - enough to make a difference in some cases. Serguei _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian.dobbins at yale.edu Wed Oct 22 09:38:00 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Wed, 22 Oct 2003 09:38:00 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <3F96168E.7050908@shore.net> Message-ID: > PGI's first AMD64 production release was around July 5. There is a > limitation on objects greater than 2GB in Linux as a result of the GNU > assembly linker, but the application can address as much memory as you One simple way to get around the 2GB limit (*) is to simply use FORTRAN 90 dynamic allocation calls - we've done this, and have run codes up to (so far) about 7.7GB in size. If you're used to static allocations in F77, it's only about two lines to alter things to use dynamic mem. (*) - I don't think this limitation is in the GNU assembly linker, since g77 has no problems here. I think if you compile to assembly, you'll see that PGI has issues with 32-bit wraparound, whereas g77 does not. Their tech people are aware of this, and it's something I expect will be fixed fairly soon. Also, if you do happen to run jobs > 4GB, make sure you update the 'top' version you're using (procps.sourceforge.net).
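(To put rough numbers on the address-space figures Serguei quotes, and on large jobs like Brian's, a simple malloc probe is often the quickest check. A minimal C sketch -- the 256 MB step size is an arbitrary choice, and the memset is there because malloc alone only reserves address space:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK      (256UL * 1024 * 1024)  /* 256 MB per step (arbitrary) */
#define MAX_CHUNKS 1024                   /* enough pointers for 256 GB  */

int main(void)
{
    void *blocks[MAX_CHUNKS];
    unsigned long n = 0;

    /* Grab and touch CHUNK-sized blocks until allocation fails, then
       report how much memory this binary/OS combination really gave us. */
    while (n < MAX_CHUNKS && (blocks[n] = malloc(CHUNK)) != NULL) {
        memset(blocks[n], 0, CHUNK);      /* force the pages into existence */
        n++;
    }

    printf("allocated and touched %lu MB before giving up\n", n * 256);

    while (n > 0)
        free(blocks[--n]);
    return 0;
}

Under Linux's default overcommit settings the failure can arrive as the OOM killer rather than a NULL return, so run it on an otherwise idle node; a 32-bit build should stop somewhere around the 2.5-3.5 GB figures above, while a 64-bit build is limited mainly by RAM plus swap.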
Previous versions had wraparound at the 4GB mark, and it's cool seeing a listing say something to the effect of "7.7G" next to the size. :) Cheers, - Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shewa at inel.gov Wed Oct 22 12:22:20 2003 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed, 22 Oct 2003 10:22:20 -0600 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F96AEBC.2020107@inel.gov> Robert G. Brown wrote: > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. I have used the PGI compiler 5.0-2 on SuSE SLES 8 with Radion Technologies' (www.radiative.com) Attila Fortran 90 code. One of our scientists has run models in which a single Attila process allocates up to about 7GB of RAM. The performance of the Opteron was quite impressive too. I'm still testing the g77 3.3 prerelease that SuSE includes. By default it creates 64 bit binaries. The gfortran (G95) snapshot doesn't work, but I'm planning on building it myself later on and trying to compile the above Attila code with it. Radiative looked at this earlier (months ago) and it wasn't ready at that time. Andrew > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? > > I'm indirectly interested in this as it looks like I'm getting Opterons > for my next round of cluster purchases personally, although I'll be > using C on them (hopefully 64 bit Gnu C). > > rgb > > >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > -- Andrew Shewmaker, Associate Engineer Phone: 1-208-526-1415 Idaho National Eng. and Environmental Lab. P.0. Box 1625, M.S. 3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Wed Oct 22 13:21:39 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Wed, 22 Oct 2003 11:21:39 -0600 Subject: Cooling Message-ID: <20031022172139.GA12958@plk.af.mil> I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will be on metal racks. Does anyone have a simple way to calculate cooling requirements? We will have fair flexibility with air flow. 
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JAI_RANGI at SDSTATE.EDU Wed Oct 22 15:43:36 2003 From: JAI_RANGI at SDSTATE.EDU (RANGI, JAI) Date: Wed, 22 Oct 2003 14:43:36 -0500 Subject: How to calculate operations on the cluster Message-ID: Hi, Can some tell me how to find out that how many operations can be performed on your cluster. If some say 3 million operation can be performed on this cluster, how to verify that and how to find out the actual performance. -Thanks -Jai Rangi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Oct 22 16:16:09 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed, 22 Oct 2003 13:16:09 -0700 Subject: Cooling References: <20031022172139.GA12958@plk.af.mil> Message-ID: <001d01c398d9$5b09cd00$32a8a8c0@laptop152422> To a first order, figure you've got to reject 150-200W per node.. that's roughly 10kW of heat you need to get rid of. That's 10kJ/second. That will tell you right away how many "tons" of A/C you'll need (1 ton = 12000 BTU/hr or, more usefully, here, 3.517 kW)... Looks like you'll need 3-4 tons (3 tons and 5tons are standard sizes...) Next, figure out how much temperature rise in the air you can tolerate (say, 10 degrees C) Use the specific heat of air to calculate how many kilos (or, more practically, cubic feet) of air you need to move (use 1000 J/kg deg as an approximation... you need to move 1 kg of air every second or about a cubic meter... roughly approximating, a cubic meter is about 35 cubic feet, so you need around 2100 cubic feet per minute) As a practical matter, you'll want a lot more flow (using idealized numbers when it's cheap to put margin in is foolish). Also, a 10 degree rise is pretty substantial... If you kept the room at 15C, the air coming out of the racks would be 25C, and I'll bet the processors would be a good 20C above that. Calculating for a 5 degree rise might be a better plan. Just double the flow. Unless you're investing in specialized ducting that pushes the AC only through the racks and not the room, a lot of the flow will be going around the racks, whether you like it or not. In general, one likes to keep the duct flow speed below 1000 linear feet per minute (for noise reasons!), so your ducting will be around 3-4 square feet. This is not a window airconditioner!... This is the curse of rackmounted equipment in general. Getting the heat out of the room is easy. The tricky part is getting the heat out of the rack. Think about it, you've got to pump all those thousands of CFM *through the rack*, which is aerodynamically not well suited to this, especially in 1U boxes. How much cross sectional area is there in that rack chassis aperture for the air? How fast does that imply that the air is moving? What sort of pressure drop is there going through the rack? Take a look at RGB's Brahma web site. There's some photos there of their chiller unit, so you can get an idea of what's involved. 
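(Jim's arithmetic fits in a throwaway C program, which makes it easy to redo when the node count or wattage changes. Every input below is an assumption to be replaced with measured values, and it deliberately uses the same rough constants as the post above:)

#include <stdio.h>

int main(void)
{
    /* All of these inputs are assumptions: measure your own nodes and
       substitute real numbers. */
    double nodes      = 48.0;
    double watts_node = 200.0;   /* highball per-node draw, W          */
    double misc_watts = 400.0;   /* head node, switch, lights, ...     */
    double delta_t    = 10.0;    /* tolerable air temperature rise, C  */

    double total_w  = nodes * watts_node + misc_watts;
    double tons     = total_w / 3517.0;              /* 1 ton of A/C ~ 3.517 kW     */
    double kg_per_s = total_w / (1000.0 * delta_t);  /* cp of air ~ 1000 J/(kg C)   */
    double cfm      = kg_per_s * 35.0 * 60.0;        /* ~1 m^3 per kg, ~35 ft^3/m^3 */

    printf("heat load: %.1f kW\n", total_w / 1000.0);
    printf("cooling:   %.1f tons of A/C before margin (round up)\n", tons);
    printf("airflow:   %.0f CFM for a %.0f C rise, before margin\n", cfm, delta_t);
    return 0;
}

For 48 nodes at 200 W plus 400 W of overhead this prints roughly 10 kW, 2.8 tons and 2100 CFM, which is where the 3-4 ton and several-thousand-CFM figures come from once margin is added.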
Your HVAC engineer will do a much fancier and useful version of this, allowing for things such as pressure drop, the amount of recirculation, the amount of heat leaking in from other sources (lighting, bodies in the room, etc.), heating from the fans, and so forth; But, at least you've got a ball park figure for what you're going to need. Jim Lux ----- Original Message ----- From: "Arthur H. Edwards" To: Sent: Wednesday, October 22, 2003 10:21 AM Subject: Cooling > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. > > Art Edwards > > -- > Art Edwards > Senior Research Physicist > Air Force Research Laboratory > Electronics Foundations Branch > KAFB, New Mexico > > (505) 853-6042 (v) > (505) 846-2290 (f) > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Wed Oct 22 17:43:16 2003 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed, 22 Oct 2003 23:43:16 +0200 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) References: <3F96AEBC.2020107@inel.gov> Message-ID: <3F96F9F4.8050505@moene.indiv.nluug.nl> Andrew Shewmaker wrote: > I'm still testing the g77 3.3 prerelease that SuSE includes. By default > it creates 64 bit binaries. Is there any interest in having g77 deal correctly with > 2Gb *direct access* records ? I have a patch in progress (due to http://gcc.gnu.org/PR10885) that I can't test myself ... > The gfortran (G95) snapshot doesn't work, but I'm planning on building > it myself later on and trying to compile the above Attila code with it. > Radiative looked at this earlier (months ago) and it wasn't ready at > that time. Please do not forget to enter bug reports in our Bugzilla database (see http://gcc.gnu.org/bugs.html). Thanks ! -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Wed Oct 22 09:53:51 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed, 22 Oct 2003 14:53:51 +0100 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@syst emsfirm.net) Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE210@stegosaurus.bristol.quadrics.com> >From: Brian Dobbins [mailto:brian.dobbins at yale.edu] (cut) > Also, if you do happen to run jobs > 4GB, make sure you > update the 'top' > version you're using (procps.sourceforge.net). Previous versions had > wraparound at the 4GB mark, and it's cool seeing a listing > say something to the effect of "7.7G" next to the size. 
:) On the subject of top another caveat is that top is hard-coded at compile time about what it thinks the pagesize is. If you compile kernels with bigger pagesizes (generally a 'good thing' for large memory nodes) then 'top' gets the memory used by your programs wrong by a factor of x2,x4 etc. ! Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Thu Oct 23 05:33:35 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 11:33:35 +0200 Subject: OpenMosix, opinions? Message-ID: <200310231133.35912.leopold.palomo@upc.es> Hi, I'm a newbie in all of this questions of paralelism and clusters. I'm reading all of I can. I have found some point that I need some opinions. Hipotesis, having a typical beowulf, with some nodes, a switch, etc. All of the nodes running GNU/Linux, and the applications that are running are using MPI or PVM. All works, etc .... Imaging that we have an aplication. A pararell aplication that doesn't use a lot I/O operation, but intensive cpu, and some messages. Something like a pure parallel app. We implement it using PVM or MPI ... MPI. And we make a test, and we have some result. Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ benchmark.htm. We have our program, and we change it that use threads for the paralel behaviour and not MPI. And we run the same test. So, what will be better? Any one have tested it? Thank's in advance. Best regards, Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 23 07:52:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 23 Oct 2003 07:52:14 -0400 (EDT) Subject: Cooling In-Reply-To: <20031022172139.GA12958@plk.af.mil> Message-ID: On Wed, 22 Oct 2003, Arthur H. Edwards wrote: > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. My kill-a-watt shows 1900+ AMD Athlon duals drawing roughly ~230W/node (or 115 per processor) (under steady, full load). I don't have a single CPU system in this class to test, but because of hardware replication I would guess that one draws MORE than half of this, probably ballpark of 150-160W where YMMV depending on memory and disk and etc configuration. Your clock is also a bit higher than what I measure and there is a clockspeed dependence on the CPU side, so you should likely guesstimate highball, say 175W OR buy a <$50 kill-a-watt (numerous sources online) and measure your prototyping node yourself and get a precise number. Then it is a matter of arithmetic. 
To be really safe and make the arithmetic easy enough to do on my fingers, I'll assume 200 W/node. Times 48 is 9600 watts. Plus 400 watts for electric lights, a head node with disk, a monitor, a switch (this is likely lowball, but we highballed the nodes). Call it 10 KW in a roughly 1000 cubic foot space. One ton of AC removes approximately 3500 watts continuously. You therefore need at LEAST 3 tons of AC. However, you'd really like to be able to keep the room COLD, not just on a part with its external environment, and so need to be able to remove heat infiltrating through the walls, so providing overcapacity is desireable -- 4-5 tons wouldn't be out of the question. This also gives you at least limited capacity for future growth and upgrade without another remodelling job (maybe you'll replace those singles with duals that draw 250-300W apiece in the same rack density one day). You also have to engineer airflow so that cold air enters on the air intake side of the nodes (the front) and is picked up by a warm air return after being exhausted, heated after cooling the nodes, from their rear. I don't mean that you need air delivery and returns per rack necessarily, but the steady state airflow needs to retard mixing and above all prevent air exhausted by one rack being picked up as intake to the next. There are lots of ways to achieve this. You can set up the racks so that the node fronts face in one aisle and node exhausts face in the rear and arrange for cold air delivery into the lower part of the node front aisle (and warm air return on the ceiling). You can put all the racks in a single row and deliver cold air as low as possible on the front side and remove it on the ceiling of the rear side. If you have a raised floor and four post racks with sidepanels you can deliver it from underneath each rack and remove it from the top. This is all FYI, but it is a good idea to hire an actual architect or engineer with experience in server room design to design your power/cooling system, as there are lots of things (thermal power kill switch, for example) that you might miss but they should not. However, I think that the list wisdom is that you should deal with them armored with a pretty good idea of what they should be doing, as the unfortunate experience of many who have done so is that even the pros make costly mistakes when it comes to server rooms (maybe they just don't do enough of them, or aren't used to working with 1000 cubic foot spaces). If you google over the list archives, there are longranging, extended discussions on server room design that embrace power delivery, cooling, node issues, costs, and more. rgb > > Art Edwards > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fmahr at gmx.de Thu Oct 23 09:06:09 2003 From: fmahr at gmx.de (Ferdinand Mahr) Date: Thu, 23 Oct 2003 15:06:09 +0200 Subject: OpenMosix, opinions? References: <200310231133.35912.leopold.palomo@upc.es> Message-ID: <3F97D241.2180EEF8@gmx.de> Hi Leo, > Imaging that we have an aplication. A pararell aplication that doesn't use a > lot I/O operation, but intensive cpu, and some messages. Something like a > pure parallel app. We implement it using PVM or MPI ... MPI. 
And we make a > test, and we have some result. > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that > can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that > com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > benchmark.htm. > > We have our program, and we change it that use threads for the paralel > behaviour and not MPI. And we run the same test. So, what will be better? Any > one have tested it? I haven't tested your special situation, but here are my thoughts about it: - Why changing an application that you already have? It costs you an unnecessary amount of time and money. - Migshm seems to enable OpenMosix to migrate System V shared memory processes, not threads. But, "Threads created using the clone() system call can also be migrated using Migshm", that's what you want, right? I don't know how well that works, but it limits you to clone(), and I don't know if thats sufficient for reasonable thread programming. Still (as you mentioned before), you really can only write code that uses minimum I/O and interprocess/thread communication because of network limitations. - Programs using PThreads don't run in parallel with OpenMosix/Migshm, they can only be migrated in whole. - If your MPI/PVM programs are well designed, they are usually really fast and can scale very well when CPU-bound. - Currently (Open)Mosix is better for load-balancing than HPC, especially in clusters with different hardware configurations. In HPC clusters, you usually have identical compute nodes. Hope that helps, Ferdinand _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RobertsGP at ncsc.navy.mil Thu Oct 23 15:07:19 2003 From: RobertsGP at ncsc.navy.mil (Roberts Gregory P DLPC) Date: Thu, 23 Oct 2003 14:07:19 -0500 Subject: UnitedLinux? Message-ID: Has anyone used UnitedLinux 1.0? I am using it on a 2 node dual CPU Opteron system. Greg -----Original Message----- From: Bill Broadley [mailto:bill at math.ucdavis.edu] Sent: Thursday, September 25, 2003 7:46 PM To: Brian Dobbins Cc: Bill Broadley; beowulf at beowulf.org Subject: Re: A question of OS!! > Yikes.. what kernels are used on these systems by default, and how large > is the code? I've been running SuSE 8.2 Pro on my nodes, and have gotten Factory default in both cases AFAIK. I don't have access to the SLES system at the moment, but the redhat box is: Linux foo.math.ucdavis.edu 2.4.21-1.1931.2.349.2.2.entsmp #1 SMP Fri Jul 18 00:06:19 EDT 2003 x86_64 x86_64 x86_64 GNU/Linux What relationship that has to the original 2.4.21 I know not. > varying performance due to motherboard, BIOS level and kernel. (SuSE 8.2 > Pro comes a modified 2.4.19, but I've also run 2.6.0-test5) > Also, are the BIOS settings the same? And how are the RAM slots I don't have access to the SLES bios. > populated? That made a difference, too! I'm well aware of the RAM slot issues, and I've experimentally verified that the full bandwidth is available. Basically each cpu will see 2GB/sec or so to main memory, and both see a total of 3GB/sec if both use memory simultaneously. > (Oh, and I imagine they're both writing to a local disk, or minimal > amounts over NFS? That could play a big part, too.. ) Yeah, both local disk, and not much. 
I didn't notice any difference when I commented out all output. > I should have some numbers at some point for how much things vary, but > at the moment we've been pretty busy on our systems. Any more info on > this would be great, though, since I've been looking at the faster chips, > too! ACK, I never considered that the opterons might be slower in some ways at faster clock speeds. My main suspicious is that MPICH was messaging passing for local nodes in some strange way and triggering some corner case under SLES. I.e. writing an int at a time between CPUs who are fighting over the same page. None of my other MPI benchmarks for latency of bandwidth (at various message sizes) have found any sign of problems. Numerous recompiles of MPICH haven't had any effect either. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at wol.es Thu Oct 23 10:17:17 2003 From: lepalom at wol.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 16:17:17 +0200 Subject: OpenMosix, opinions? In-Reply-To: <3F97D241.2180EEF8@gmx.de> References: <200310231133.35912.leopold.palomo@upc.es> <3F97D241.2180EEF8@gmx.de> Message-ID: <200310231617.17014.lepalom@wol.es> A Dijous 23 Octubre 2003 15:06, Ferdinand Mahr va escriure: > Hi Leo, > > > Imaging that we have an aplication. A pararell aplication that doesn't > > use a > > > lot I/O operation, but intensive cpu, and some messages. Something like a > > pure parallel app. We implement it using PVM or MPI ... MPI. And we make > > a > > > test, and we have some result. > > > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch > > that > > > can migrate threads (light weith process, Mighsm, > > http://mcaserta.com/maask/) > > > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, > > that > > > com from here: > > http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > > > benchmark.htm. > > > > We have our program, and we change it that use threads for the paralel > > behaviour and not MPI. And we run the same test. So, what will be better? > > Any > > > one have tested it? Hi, > I haven't tested your special situation, but here are my thoughts about > it: > > - Why changing an application that you already have? It costs you an > unnecessary amount of time and money. Ok, I just explaining an example. If I have to begin from 0, which approach will be better? > - Migshm seems to enable OpenMosix to migrate System V shared memory > processes, not threads. But, "Threads created using the clone() system > call can also be migrated using Migshm", that's what you want, right? I > don't know how well that works, but it limits you to clone(), and I > don't know if thats sufficient for reasonable thread programming. Still > (as you mentioned before), you really can only write code that uses > minimum I/O and interprocess/thread communication because of network > limitations. Yes, you are right. However, I hope than soon it will run pure threads. I have heart that 2.6 have a lot of improvements in the thread part, but I'm not sure. 
> > - Programs using PThreads don't run in parallel with OpenMosix/Migshm, > they can only be migrated in whole. Well, Pthreads can migrate with openMosix (not Linux Threads!), without the patch. I have understood that. > - If your MPI/PVM programs are well designed, they are usually really > fast and can scale very well when CPU-bound. The question that I comment is to make the programation of a parallel program as a threads programation, and the rest is a job of the kernel in a cluster. If this is avalaible, the management of the parallelism will be a job of the SO, in a distributed machine. > - Currently (Open)Mosix is better for load-balancing than HPC, > especially in clusters with different hardware configurations. In HPC > clusters, you usually have identical compute nodes. > > Hope that helps, Yes, of course. Thank's, regards. Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gilberto at ula.ve Thu Oct 23 17:43:14 2003 From: gilberto at ula.ve (Gilberto Diaz) Date: 23 Oct 2003 17:43:14 -0400 Subject: Oscar 2.3 Message-ID: <1066945394.1200.132.camel@odie> Hello everybody I'm trying to install a small cluster using RH8.0 and oscar 2.3. The machines has a sis900 NIC (PXE capable) in the motherboard. When I try to boot the client nodes they not boot because the sis900.o module is not present. Does anybody have any idea how to load the module in the init image in order to boot the nodes without change the kernel using the kernel picker? Thanks in advance Regards Gilberto _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erwan at mandrakesoft.com Fri Oct 24 04:48:03 2003 From: erwan at mandrakesoft.com (Erwan Velu) Date: Fri, 24 Oct 2003 10:48:03 +0200 Subject: CLIC 2, the newest version is out ! Message-ID: <1066985283.32232.57.camel@revolution.mandrakesoft.com> CLIC is a GPL Linux based distribution made for meeting the HPC needs. CLIC 2 now allow people to install a full Linux cluster from scratch in a few hours. This product contains the Linux core system + the clustering autoconfiguration tools + the deployement tools + MPI stacks (mpich, lam/mpi). CLIC 2 is based on the results of MandrakeClustering, and includes several major features: - New backend engine (fully written in perl) - A new configure step during the server's graphical installation - An automated Dual Ethernet configuration (One NIC for computing, One Nic for administrating) - A new kernel (2.4.22) - A new version of urpmi parallel (a parallel rpm installer) - A graphical tool for managing users (add/remove) : userdrake - A new node management |- You just need to power on a fresh node to install and integrate it in your cluster ! |- Fully automated add/remove procedure And of course the lastest version of the clustering software: - Maui 3.2.5-p5 - ScalablePBS 1.0-p4 - Ganglia 2.5.4 - Mpich 1.2.5-2 - LAM/MPI 6.5.9 (will be updated when 7.1 will be available) - PXELinux 2.06 CLIC 2 will no more being compatible with CLIC 1 due to a fully rewritten backend. This will no more happen in the future but it was needed as CLIC 1 was a test release. We hope this product will meet the CLIC community needs. CLIC 2 is now available on your favorite mirrors in the mandrake-iso directory. 
For example you can found it at Europe: ftp://ftp.lip6.fr:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso ftp://ftp.mirror.ac.uk:/sites/sunsite.uio.no/pub/unix/Linux/Mandrake/Mandrake-iso/i586/CLIC-2.0-i586.iso ftp://ftp.tu-chemnitz.de:/pub/linux/mandrake-iso/i586/CLIC-2.0-i586.iso USA: ftp://ftp.rpmfind.net:/linux/Mandrake-iso/i586/CLIC-2.0-i586.iso ftp://mirrors.usc.edu:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso The documentation is included inside the cdrom (/doc/) under pdf and html format. This is the MandrakeClustering documentation based on the same core, everything is the same except the configuration GUI which is only available in MandrakeClustering. All the configuration scripts that DrakCluster (our GUI) uses are beginning whith the "setup_" prefix. So for auto configurating your server, you use the setup_auto_server.pl script. adding new nodes to your cluster, you use setup_auto_add_nodes.pl removing a node, you can use the setup_auto_remove_nodes.pl All this scripts have a really easy to learn syntax :) I hope this release will please every CLIC user, this new generation of CLIC is really easier to use than the previous releases. PS: I've been heard that the 2.4.22 kernel brand may seriously damage LG cdrom drives. So be carefull with CLIC2 if you own LG cdrom drives, remove your cdrom drive before installing it. - CLIC Website: http://clic.mandrakesoft.com/index-en.html -- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 24 11:06:55 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 24 Oct 2003 17:06:55 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310241506.h9OF6tP02285@dali.crs4.it> I asked ClearSpeed what is the width of the floating point units and today I received a reply. The floating point units in the CS301 are 32 bits wide. A previous email on the subject noted a earlier design Each PE has an 8 bit ALU for the 256 PE "Fuzion block". Evidently, this design is different. My opinion: 32 bits is more than adequate for many signal processing applications, not so long ago 24 bits was considered enough for signal processing. But for simulations of physical events the "eigenvalues" have a range that makes 32 bit floating point too small. regards, Alan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 24 12:09:31 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 24 Oct 2003 17:09:31 +0100 Subject: A Petaflop machine in 20 racks? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE233@stegosaurus.bristol.quadrics.com> > I asked ClearSpeed what is the width of the floating point units > and today I received a reply. > The floating point units in the CS301 are 32 bits wide. Dont forget that www.clearspeed.com used to be www.pixelfusion.com Their target market at the time was massively parallel SIMD PCI based graphics engines. So that is most likely why they use only 32bit floats. 
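(A minimal C illustration of why 32-bit floats worry people doing long simulations: single precision carries only about 7 significant digits, so small contributions to a large accumulator are silently lost. The magnitudes here are arbitrary:)

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from carrying extra precision in
       registers, so the test shows true 32-bit float behaviour. */
    volatile float  fsum = 1.0e8f;
    volatile double dsum = 1.0e8;
    int i;

    /* Add a million small increments onto a large value.  Each 0.1 is far
       below one unit in the last place of a float at 1e8, so the float
       never moves; the double accumulates them as expected. */
    for (i = 0; i < 1000000; i++) {
        fsum = fsum + 0.1f;
        dsum = dsum + 0.1;
    }

    printf("float:  %.1f\n", (double) fsum);   /* stays at 100000000.0  */
    printf("double: %.1f\n", dsum);            /* close to 100100000.0  */
    return 0;
}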
Yours, Daniel. (and yes Clearspeed are based in Bristol,UK but are nothing to do with us.) -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Oct 28 11:09:54 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 28 Oct 2003 11:09:54 -0500 Subject: SFF boxes for a cluster? Message-ID: <3F9E94D2.3020307@lmco.com> Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Tue Oct 28 13:19:11 2003 From: Peter.Lindgren at experian.com (Lindgren, Peter) Date: Tue, 28 Oct 2003 10:19:11 -0800 Subject: SFF boxes for a cluster? Message-ID: We have had 48 Dell GX260 SFF boxes in production since March without a single hardware failure. Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Tue Oct 28 15:12:38 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Tue, 28 Oct 2003 12:12:38 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv .doe.gov> Message-ID: <5.2.0.9.2.20031028120820.04272e60@216.82.101.6> One serious problem with the Shuttle and most competing "small form factor" PCs is the air intake, which is located on the sides. You can't put them flush with each other side-by-side on shelves... Most minitower or midtower ATX cases (and proper 1U or 2U cases) have air intake entirely on the front panel. air intake on the left side: http://www.sfftech.com/showdocs.cfm?aid=447 At 11:45 AM 10/28/2003 -0800, you wrote: >I've got one of those SS51G's at home and I love it. My only complaint is >that it does get a bit warm with a video card, but for a cluster you wont >need one. > >-----Original Message----- >From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] >Sent: Tuesday, October 28, 2003 10:07 AM >To: beowulf at beowulf.org >Subject: Beowulf digest, Vol 1 #1515 - 1 msg > > >Send Beowulf mailing list submissions to > beowulf at beowulf.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf >or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > >You can reach the person managing the list at > beowulf-admin at beowulf.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Beowulf digest..." > > >Today's Topics: > > 1. SFF boxes for a cluster? 
(Jeff Layton) > >--__--__-- > >Message: 1 >Date: Tue, 28 Oct 2003 11:09:54 -0500 >From: Jeff Layton >Subject: SFF boxes for a cluster? >To: beowulf at beowulf.org >Reply-to: jeffrey.b.layton at lmco.com >Organization: Lockheed-Martin Aeronautics Company > >Good morning, > > I've seen a few cluster made from the Small Form Factor >(SFF) boxes including "Space Simulator". Has anyone else >made a decent size cluster (n > 16) from these boxes? If so, >how has the reliability been? > >Thanks! > >Jeff > >-- >Dr. Jeff Layton >Aerodynamics and CFD >Lockheed-Martin Aeronautical Company - Marietta > > > > >--__--__-- > >_______________________________________________ >Beowulf mailing list >Beowulf at beowulf.org >http://www.beowulf.org/mailman/listinfo/beowulf > > >End of Beowulf Digest > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ZukaitAJ at nv.doe.gov Tue Oct 28 14:45:02 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Tue, 28 Oct 2003 11:45:02 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv.doe.gov> I've got one of those SS51G's at home and I love it. My only complaint is that it does get a bit warm with a video card, but for a cluster you wont need one. -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Tuesday, October 28, 2003 10:07 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1515 - 1 msg Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. SFF boxes for a cluster? (Jeff Layton) --__--__-- Message: 1 Date: Tue, 28 Oct 2003 11:09:54 -0500 From: Jeff Layton Subject: SFF boxes for a cluster? To: beowulf at beowulf.org Reply-to: jeffrey.b.layton at lmco.com Organization: Lockheed-Martin Aeronautics Company Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From periea at bellsouth.net Tue Oct 28 16:08:49 2003 From: periea at bellsouth.net (periea at bellsouth.net) Date: Tue, 28 Oct 2003 16:08:49 -0500 Subject: SAS running on compute nodes Message-ID: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Hello All, Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Phil... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Tue Oct 28 17:30:35 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue, 28 Oct 2003 14:30:35 -0800 Subject: SAS running on compute nodes In-Reply-To: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> (periea@bellsouth.net's message of "Tue, 28 Oct 2003 16:08:49 -0500") References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Message-ID: <858yn5m1v8.fsf@blindglobe.net> writes: > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Sure, as a bunch of singleton processes. I don't think you can do much more than that (but would be interested if I'm wrong). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gabriele.butti at unimib.it Tue Oct 28 04:58:04 2003 From: gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) Date: 28 Oct 2003 10:58:04 +0100 Subject: opteron VS Itanium 2 Message-ID: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Dear all, we are planning to build up a new cluster (16 nodes) before this year's end; we are evaluating different proposals from machine sellers, but the main doubt we have at this moment is whether choosing an Itanium 2 architecture or an AMD Opteron one. I know that ther's had already been on this list a debate on such a topic, but maybe some of you has some new experience to tell about. There is a wild bunch of benchmarks on these machines, but we fear that these are somewhat misleading and are not designed to test CPU's for intense scientific computing. The code we want to run on these machines is basically a home-made code, not fully optimized, which allocates around 500 Mb of RAM per node. Communication between nodes is a quite rare event and does not affect much computation time. 
In the past we had a very nice experience using Alpha CPU's which performed very well. To sum up, the question is: is the Itanium2 worth the price difference or is the Opteron the best choice? Thank you all Gabriele Butti -- \\|// -(o o)- /------------oOOOo--(_)--oOOOo-------------\ | | | Gabriele Butti | | ----------------------- | | Department of Material Science | | University of Milano-Bicocca | | Via Cozzi 53, 20125 Milano, ITALY | | Tel (+39)02 64485214 | | .oooO Oooo. | \--------------( )---( )---------------/ \ ( ) / \_) (_/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Tue Oct 28 21:23:48 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Tue, 28 Oct 2003 20:23:48 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: Hi, The Athlon design has some Alpha blood in it, and in my experience they both excel on branchy, unoptimized, float-intensive code. The Opteron is similar to the Athlon, but I wouldn't bother with 64-bit unless you're actually going to use more than 2 GB of memory per node. Athlon vs Pentium 4 or Xeon is a closer match, and you really need to run some benchmarks to decide between them. If you have access to an Opteron you should benchmark it as well, since I've heard they fly on some problems. Itanium 2 (Madison) is the current NAMD speed champ (although it's tied with a hyperthreaded P4 running multithreaded code), but it took some serious work to get the inner loops to the point that the Intel compiler could software pipeline them to get decent performance. I've heard that some Fortran codes had an easier time of it. Big branches really hurt. -Jim On 28 Oct 2003, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. 
| > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Wed Oct 29 04:30:28 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Wed, 29 Oct 2003 10:30:28 +0100 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <20031029103028.5b7a89a7.smuelas@mecanica.upm.es> Why don't you try a more humble Athlon, (2800 will be enough and you can use DRAM at 400). You will economize a lot of money and for intensive operation it is very, very quick. I have a small cluster with 8 nodes and Athlon 2400 and the results are astonishing. The important point is the motherboard, and nforce is great. On 28 Oct 2003 10:58:04 +0100 gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. | > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at platform.com Wed Oct 29 10:01:58 2003 From: csmith at platform.com (Chris Smith) Date: Wed, 29 Oct 2003 07:01:58 -0800 Subject: SAS running on compute nodes In-Reply-To: <858yn5m1v8.fsf@blindglobe.net> References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> <858yn5m1v8.fsf@blindglobe.net> Message-ID: <1067439718.3742.53.camel@plato.dreadnought.org> On Tue, 2003-10-28 at 14:30, A.J. Rossini wrote: > writes: > > > > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... > > Sure, as a bunch of singleton processes. I don't think you can do > much more than that (but would be interested if I'm wrong). > Actually ... you can after a fashion. SAS has something called MP CONNECT as part of the SAS/CONNECT product which allows you to call out to other SAS processes to have them run code for you, so you can do parallel SAS programs. http://support.sas.com/rnd/scalability/connect/index.html -- Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Wed Oct 29 10:11:19 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed, 29 Oct 2003 09:11:19 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> On Tue Oct 28 19:26:25 2003, Gabriele Butti wrote: >To sum up, the question is: is the Itanium2 worth the price difference >or is the Opteron the best choice? The SpecFP2000 performance difference between the best I2 and best Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). The 1.5 GHz I2 with the 6MB cache is very expensive with a recent estimate here for dual processor nodes with the >>smaller<< cache at over $12,000 per node when Myrinet interconnect costs and other incidentals are included. A dual Opteron 246 at 2.0 GHz with the same interconnect and incidentals included was about $4,250. Top of the line Pentium 4 duals again with same interconnect and incidentals about $750 less at $3,500. For bandwidth/memory intensive codes, I think the Opteron is a clear winner in a dual processor configuration because of its dual channel to memory design. Stream triad bandwidth during SMP operation is ~50% more than a one processor test. Both the dual Pentium 4 and Itanium 2 share their memory bus and split (with some loss) the bandwidth in dual mode. In a single processor configuration the conclusion is less clear. Itanium's spec numbers are very impressive, but still not high enough to win on price performance. The new Pentium 4 3.2 GHz Extremem Edition with its 4x200 FSB has very good SpecFP2000 numbers out performing the Opteron by about 100 spec points and may be the best price performance choice in a single processor configuration. But of course the above logic means nothing with a benchmark of >>your<< application and specific vendor quotes in >>your<< hands. rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. 
# Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 29 11:13:27 2003 From: ctierney at hpti.com (Craig Tierney) Date: 29 Oct 2003 09:13:27 -0700 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <1067444007.6209.16.camel@hpti10.fsl.noaa.gov> On Tue, 2003-10-28 at 02:58, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > Why don't you run your codes on the two platforms and figure it out for yourself? Better yet, get the vendors to do it. I have seen cases where Itanium 2 performs much better than Opteron, justifying the price difference. Other codes did not show the same difference, but both were faster than a Xeon. Craig > Thank you all > > Gabriele Butti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Thomas.Alrutz at dlr.de Wed Oct 29 11:15:48 2003 From: Thomas.Alrutz at dlr.de (Thomas Alrutz) Date: Wed, 29 Oct 2003 17:15:48 +0100 Subject: opteron VS Itanium 2 References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3F9FE7B4.1000607@dlr.de> Hi Gabriele, we have bought a similar Linux Cluster (16 nodes) you are lokking for with the smallest dual Opteron 240 (1.4 GHz) and two Gigabit networks (one for communications (MPI) and one for nfs). > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > The nodes have all 2 GB RAM (4*512 MB DDR333 REG), 2 Gigabit NICs (Broadcom onboard) and a Harddisk. The board we had choosen was the Rioworks HDAMA. 
I know it is not cheap, but it is stable and performances well with the SUSE/United Linux Enterprise Edition. > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. We have done some benchmarking with our TAU-Code (unstructured finite volume CFD-code, in multigrid), which hangs extremly on the memory bandwith and latency. Therefore we tested 4 different architectures: 1. AMD Athlon MP 1.8 GHz FSB 133 MHZ - with gcc3.2 in 32 Bit 2. Intel Xeon 2.66 GHz FSB 133 MHZ - with icc7 in 32 bit 3. Intel Itanium2 1.0 GHz FSB 100 MHZ - with ecc6 in 64 Bit 4. AMD Opteron 240 1.4 GHz FSB 155 MHZ - with gcc3.2 in 64 Bit For the benchmark we used a "real life" example (aircraft configuration with wing, body and engine - approx. 2 million grid points) which desires 1.3 GB to 1.7 GB for the job (1 process) We have performed 30 iterations (Navier Stokes calculation - Spalart Allmares - central scheme - multigrid cycle) and taken the total (Wallclock) time. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? To answer your question take a look on the following chart : All times in seconds for 1 cpu on the node in use 1. AMD Athlon MP 1.8 GHz - 30 iter. = 3642.4 sec. 2. Intel Xeon 2.66 GHz - 30 iter. = 2151.4 sec. <- fastest 3. Intel Itanium2 1.0 GHz - 30 iter. = 3571.8 sec. 4. AMD Opteron 240 1.4 GHz - 30 iter. = 2256.5 sec. and 2 cpu on the node in use (2 process via MPI) 1. AMD Athlon MP 1.8 GHz - 30 iter. = 2076.1 sec. 2. Intel Xeon 2.66 GHz - 30 iter. = 1447.8 sec 3. Intel Itanium2 1.0 GHz - 30 iter. = 1842.8 sec. 4. AMD Opteron 240 1.4 GHz - 30 iter. = 1159.5 sec. <-- fastest So here you can see why we had to choose an Opteron based node to build up the cluster. The price/performance ratio for the Opteron machine is verry good compared to the itanium2 machines. And the Xeons are not so much cheaper.... Thomas -- __/|__ | Dipl.-Math. Thomas Alrutz /_/_/_/ | DLR Institut fuer Aerodynamik und Stroemungstechnik |/ | Numerische Verfahren DLR | Bunsenstr. 10 | D-37073 Goettingen/Germany _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 29 14:16:43 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 29 Oct 2003 20:16:43 +0100 Subject: Video-less nodes Message-ID: <1067455003.21980.11.camel@qeldroma.cttc.org> Hi all, I would like to get some opinions about video-less nodes in a cluster, we know that there is no problem about monitoring nodes remotely and reading logs but I suppose that in a kernel panic situation there's some valuable on-screen information... ? any thoughts ? 
Of course there's the possibility about putting really cheap video cards just that we'll able to see the text screen , nothing more ;) -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 29 15:45:25 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 29 Oct 2003 12:45:25 -0800 (PST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? console on serial... let your terminal server collect oopses... > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Wed Oct 29 16:41:21 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed, 29 Oct 2003 16:41:21 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003 at 8:16pm, Daniel Fernandez wrote > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? > > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) As always, the answer is it depends. A serial console should handle all your needs. But sometimes the BIOS sucks or the console doesn't work right or... IMHO, unless it messes other stuff up (e.g. drags your only PCI bus down to 32/33), there's not much reason *not* to stuff cheap video boards into nodes. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 29 17:00:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 29 Oct 2003 17:00:47 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? 
> > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) To my direct experience, the extra time you waste debugging problems on videoless nodes by hauling them out of the rack, sticking video in them, resolving the problem, removing the video, and reinserting the nodes is far more costly than cheap video, or better yet onboard video (many/most good motherboards have onboard video these days) and being able to resolve many of these problems without deracking the nodes. Just my opinion of course. When things go well, of course, it doesn't matter. Just think about the labor involved in a single BIOS reflash, for example. rgb > > -- > Daniel Fernandez > Laboratori de Termot?cnia i Energia - CTTC > UPC Campus Terrassa > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 29 22:00:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 29 Oct 2003 22:00:09 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> Message-ID: > >To sum up, the question is: is the Itanium2 worth the price difference > >or is the Opteron the best choice? > > The SpecFP2000 performance difference between the best I2 and best > Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). which to me indicates that the working set of SPEC codes is a good match to the cache of high-end It2's. this says nothing about It2's, but rather points out that SPEC components are nearly obsolete (required to run well in just 64MB core, if I recall correctly!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jmdavis at mail2.vcu.edu Wed Oct 29 15:12:20 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Wed, 29 Oct 2003 15:12:20 -0500 Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> References: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: <3FA01F24.2090405@mail2.vcu.edu> The onscreen info should also be logged. And then there's always the crash files. We now have a couple of clusters with videoless nodes (although they are on serial switches, Cyclades). Mike Daniel Fernandez wrote: >Hi all, > >I would like to get some opinions about video-less nodes in a cluster, >we know that there is no problem about monitoring nodes remotely and >reading logs but I suppose that in a kernel panic situation there's some >valuable on-screen information... ? any thoughts ? 
> >Of course there's the possibility about putting really cheap video cards >just that we'll able to see the text screen , nothing more ;) > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andreas.boklund at htu.se Thu Oct 30 01:57:35 2003 From: andreas.boklund at htu.se (andreas boklund) Date: Thu, 30 Oct 2003 07:57:35 +0100 Subject: opteron VS Itanium 2 Message-ID: Just a note, > For bandwidth/memory intensive codes, I think the Opteron is a clear > winner in a dual processor configuration because of its dual channel > to memory design. Stream triad bandwidth during SMP operation is > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium > 2 share their memory bus and split (with some loss) the bandwidth in > dual mode. This is true as long as you are using an applicaiton where one process has its own memory area. If you would have 2 processes and shared memory the Opt, would behave like a small NUMA machine and a process will get a penalty for accessing another process (processors) memory segment. To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Best //Andreas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 10:51:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 10:51:41 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: Message-ID: > > For bandwidth/memory intensive codes, I think the Opteron is a clear > > winner in a dual processor configuration because of its dual channel > > to memory design. Stream triad bandwidth during SMP operation is > > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium > > 2 share their memory bus and split (with some loss) the bandwidth in > > dual mode. this is particularly bad on "high-end" machines. for instance, several machines have 4 it2's on a single FSB. there's a reason that specfprate scales so much better on 1/2/4-way opterons than on 1/2/4-way it2's. don't even get me started about those old profusion-chipset 8-way PIII machines that Intel pushed for a while... > This is true as long as you are using an applicaiton where one process has its own > memory area. If you would have 2 processes and shared memory the Opt, would > behave like a small NUMA machine and a process will get a penalty for accessing > another process (processors) memory segment. huh? sharing data behaves pretty much the same on opteron systems (broadcast-based coherency) as on shared-FSB (snoopy) systems. it's not at all clear yet whether opterons are higher latency in the case where you have *often*written* shared data. it is perfectly clear that shared/snoopy buses don't scale, and neither does pure broadcast coherency. I figure that both Intel and AMD will be adding some sort of directory support in future machines. if they bother, that is - the market for many-way SMP is definitely not huge, at least not in the mass-market sense. regards, mark hahn. 
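For anyone who wants to put rough numbers on the shared-bus versus on-chip-controller argument above, here is a minimal triad-style sketch in C. It is not the official STREAM benchmark, and the array size, the OpenMP usage, and the compile line are just illustrative assumptions; run it with OMP_NUM_THREADS=1 and then 2 on a dual node and compare the rates.

/* Minimal STREAM-triad-style sketch (not the official STREAM code).
 * Compile e.g.: gcc -O2 -fopenmp triad.c -o triad
 * Run with OMP_NUM_THREADS=1 and OMP_NUM_THREADS=2 to compare. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (8 * 1024 * 1024)   /* 8M doubles per array, 64 MB each: well out of cache */
#define NTRIES 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

    /* First-touch initialization in parallel, so on a NUMA board (e.g. dual
     * Opteron) each thread's pages land behind its own memory controller. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int t = 0; t < NTRIES; t++) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];                 /* the triad kernel */
        double dt = omp_get_wtime() - t0;
        if (dt < best) best = dt;
    }

    /* Triad touches three 8-byte doubles per iteration. */
    double mbytes = 3.0 * N * sizeof(double) / 1.0e6;
    printf("threads=%d  best triad rate: %.1f MB/s\n",
           omp_get_max_threads(), mbytes / best);
    free(a); free(b); free(c);
    return 0;
}

The first-touch detail matters: if one thread initializes everything, both threads end up pulling from a single controller on an Opteron-style board and the two-processor number looks much worse than it should.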
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 10:57:00 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 09:57:00 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> On Wed Oct 29 21:38:48 2003, Mark Hahn wrote: >> >To sum up, the question is: is the Itanium2 worth the price difference >> >or is the Opteron the best choice? >> >> The SpecFP2000 performance difference between the best I2 and best >> Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). > >which to me indicates that the working set of SPEC codes is a good >match to the cache of high-end It2's. this says nothing about It2's, >but rather points out that SPEC components are nearly obsolete >(required to run well in just 64MB core, if I recall correctly!) Of course, there is some truth to what you say, but "this says nothing about It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is the memory table for most of the benchmarks. A few fit in the 6MB cache (although some surely should, as some codes do or can be made too fit into cache). Many are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard to question with the capability of performing 4 64-bit flops per clock (that's a 6.0 GFlops peak at 1.5 GHz; 12.0 at 32-bits). Moreover, even an I2 with 1/2 the Opteron's clock and only 50% more cache (L3 vs L2) performs more or less equal to the Opteron 246 on SpecFP2000. And after all a huge cache does raise the average memory bandwidth felt by the average code ... ;-) (even as average codes sizes grow) ... and a large node count divides the total memory required per node. Large clusters should love large caches ... you know the quest for super-linear speed ups. The I2's weakness is in price-performance and in memory bandwidth in SMP configurations in my view. My last line in the prior note was a reminder to the original poster that SpecFP numbers are not a final answer. I repeated the "benchmark you code" mantra ... partly to relieve Bob Brown of his responsibility to do so ;-). Got any snow up in the Great White North yet? Regards, rbw max max num num rsz vsz obs unchanged stable? 
----- ----- --- --------- ------- gzip 180.0 199.0 181 68 vpr 50.0 53.6 151 6 gcc 154.0 156.0 134 0 mcf 190.0 190.0 232 230 stable crafty 2.0 2.6 107 106 stable parser 37.0 66.8 263 254 stable eon 0.6 1.5 130 0 perlbmk 146.0 158.0 186 0 gap 192.0 194.0 149 148 stable vortex 72.0 79.4 162 0 bzip2 185.0 199.0 153 6 twolf 3.4 4.0 273 0 wupwise 176.0 177.0 185 181 stable swim 191.0 192.0 322 320 stable mgrid 56.0 56.7 281 279 stable applu 181.0 191.0 371 369 stable mesa 9.4 23.1 132 131 stable galgel 63.0 155.0 287 59 art 3.7 4.3 157 37 equake 49.0 49.4 218 216 stable facerec 16.0 18.5 182 173 stable ammp 26.0 28.4 277 269 stable lucas 142.0 143.0 181 179 stable fma3d 103.0 105.0 268 249 stable sixtrack 26.0 59.8 148 141 stable apsi 191.0 192.0 271 270 stable _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 11:07:25 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 10:07:25 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301607.h9UG7PX06372@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 07:57, Andreas Boklund wrote: >Just a note, > >> For bandwidth/memory intensive codes, I think the Opteron is a clear >> winner in a dual processor configuration because of its dual channel >> to memory design. Stream triad bandwidth during SMP operation is >> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium >> 2 share their memory bus and split (with some loss) the bandwidth in >> dual mode. > >This is true as long as you are using an applicaiton where one process has its own >memory area. If you would have 2 processes and shared memory the Opt, would >behave like a small NUMA machine and a process will get a penalty for accessing >another process (processors) memory segment. > >To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never >yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Agreed. Of course, in the case of dual Pentium and Itaniums, even non- overlapping memory locations buy you nothing bandwidth-wise. Small or large scale perfect cross-bars to memory are tough and expensive. The Cray X1, with all its customer design effort and great total bandwidth on the node board, targeted only 1/4 of peak-data-required iin its design and delivers less under the full load of its 16-way SMP vector engines. And it's node board is probably the best bandwidth engine in the world at the moment. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 12:28:45 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 11:28:45 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 12:00:54, Mark Hahn wrote: >> Of course, there is some truth to what you say, but "this says nothing about >> It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is > >all the world's a stage ;) Life without drama is life without the pursuit of happiness ... ;-). >> the memory table for most of the benchmarks. A few fit in the 6MB cache (although >> some surely should, as some codes do or can be made too fit into cache). 
Many > >seriously, the memory access patterns of very few apps are uniform >across their rss. I probably should have said "working set fits in 6M". Good point, most memory accesses are not globally stride-one. But of course this fact leads us back to the idea that cache >>is<< important for a suite of "representative codes". >and you're right; I just reread the spec blurb, and their aim was 100-200MB. > >> are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard > >that's max rss; it's certainly an upper bound on working set size, >but definitely not a good estimator. Yes, an upper bound. We would need more data on the Spec codes to know if the working sets are mostly sitting in the I2 cache. There is an inevitable dynamism here with larger caches swallowing up larger and larger chunks of the "average code's" working set and while the average working set grows over time. >in other words, it tells you something about the peak number of pages that >the app ever touches. it doesn't tell you whether 95% of those pages are >never touched again, or whether the app only touches 1 cacheline per page. > >in yet other words, max rss is relevant to swapping, not cache behavior. You might also say it this way ... cache-exceeding, max-RSS magnitude by itself does guarantee the elimination of unwanted cache effects. > >> And after all a huge cache does raise the average memory bandwidth felt by the >> average code ... ;-) (even as average codes sizes grow) ... and a large node count > >even though Spec uses geo-mean, it can strongly be influenced by outliers, >as we've all seen with Sun's dramatic "performance improvements" ;) > >in particular, 179.art is a good example. I actually picked it out by >comparing the specFP barchart for mckinley vs madison - it shows a fairly >dramatic improvement. this *could* be due to compiler improvements, >but given that 179.art has a peak RSS of 3.7MB, I think there's a real >cache effect here. I agree again, but would say that such a suite as SpecFP should include some codes that yield to cache-effects because some real world codes do. Always learn or am reminded of something from your posts Mark ... keep on keeping us honest and true ;-) like a Canadian Mountie. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:45:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:45:20 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> Message-ID: > this fact leads us back to the idea that cache >>is<< important for a suite > of "representative codes". yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly stodgy attitude towards this has been that caches do not help machine balance. that is, it2 has a peak/theoretical 4flops/cycle, but since that would require, worstcase, 3 doubles per flop, the highest-ranked CPU is actually imbalanced by a factor of 22.5! (*) the best benchmark is your own code let's step back a bit. suppose we were designing a new version of SPEC, and wanted to avoid every problem that the current benchmarks have. here are some partially unworkable ideas: keep geometric mean, but also quote a few other metrics that don't hide as much interesting detail. for instance, show the variance of scores. 
or perhaps show base/peak/trimmed (where the lowest and highest component are simply dropped). cache is a problem unless your code is actually a spec component, or unless all machines have the same basic cache-to-working-set relation for each component. alternative: run each component on a sweep of problem sizes, and derive two scores: in-cache and out-cache. use both scores as part of the overall summary statistic. I'd love to see good data-mining tools for spec results. for instance, I'd like to have an easy way to compare consecutive results for the same machine as the vendor changed the compiler, or as clock increases. there's a characteristic "shape" to spec results - which scores are high and low relative to the other scores for a single machine. not only does this include outliers (drastic cache or compiler effects), but points at strengths/weaknesses of particular architectures. how to do this, perhaps some kind of factor analysis? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:00:54 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:00:54 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: > Of course, there is some truth to what you say, but "this says nothing about > It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is all the world's a stage ;) > the memory table for most of the benchmarks. A few fit in the 6MB cache (although > some surely should, as some codes do or can be made too fit into cache). Many seriously, the memory access patterns of very few apps are uniform across their rss. I probably should have said "working set fits in 6M". and you're right; I just reread the spec blurb, and their aim was 100-200MB. > are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard that's max rss; it's certainly an upper bound on working set size, but definitely not a good estimator. in other words, it tells you something about the peak number of pages that the app ever touches. it doesn't tell you whether 95% of those pages are never touched again, or whether the app only touches 1 cacheline per page. in yet other words, max rss is relevant to swapping, not cache behavior. > And after all a huge cache does raise the average memory bandwidth felt by the > average code ... ;-) (even as average codes sizes grow) ... and a large node count even though Spec uses geo-mean, it can strongly be influenced by outliers, as we've all seen with Sun's dramatic "performance improvements" ;) in particular, 179.art is a good example. I actually picked it out by comparing the specFP barchart for mckinley vs madison - it shows a fairly dramatic improvement. this *could* be due to compiler improvements, but given that 179.art has a peak RSS of 3.7MB, I think there's a real cache effect here. > Got any snow up in the Great White North yet? no, but I notice that the permanent temporary DX units are not working as hard to keep the machineroom from melting down ;) oh, yeah, and there's something wrong with the color of the leaves. 
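The in-cache/out-of-cache sweep is cheap to prototype. The sketch below is a rough cut at the idea, not anything taken from SPEC; the kernel, the sizes (32 KB up to 128 MB), and the pass counts are arbitrary choices. The knee where the working set falls out of cache shows up directly in the ns-per-element column. Compile with something like gcc -O2 (older glibc may also want -lrt for clock_gettime).

/* Time one trivial kernel over a sweep of working-set sizes, so the
 * in-cache and out-of-cache regimes can be scored separately. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    for (long bytes = 32 * 1024; bytes <= 128L * 1024 * 1024; bytes *= 2) {
        long n = bytes / (long)sizeof(double);
        double *x = malloc(n * sizeof(double));
        if (!x) break;
        for (long i = 0; i < n; i++) x[i] = 1.0;

        /* Repeat small sizes more often so every size runs a comparable time. */
        long passes = (256L * 1024 * 1024) / bytes;
        if (passes < 1) passes = 1;

        volatile double sum = 0.0;   /* volatile keeps the loop from being optimized away */
        double t0 = now();
        for (long p = 0; p < passes; p++)
            for (long i = 0; i < n; i++)
                sum += x[i] * 1.000001;
        double dt = now() - t0;

        printf("%10ld bytes: %6.2f ns/element\n",
               bytes, 1e9 * dt / ((double)passes * n));
        free(x);
    }
    return 0;
}

Plotted against size, the second column stays flat inside each cache level and steps up at every capacity boundary, which is exactly the in-cache versus out-of-cache split being argued for.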
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 16:32:38 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 15:32:38 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Mark Hahn wrote: >> this fact leads us back to the idea that cache >>is<< important for a suite >> of "representative codes". > >yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly >stodgy attitude towards this has been that caches do not help machine >balance. that is, it2 has a peak/theoretical 4flops/cycle, but since >that would require, worstcase, 3 doubles per flop, the highest-ranked >CPU is actually imbalanced by a factor of 22.5! > >(*) the best benchmark is your own code Agreed, but since the scope of the discussion seemed to be microprocessors which are all relatively bad on balance compared to vector ISA/designs, I did not elaborate on balance. This is design area that favors the Opteron (and Power 4) because the memory controller is on-chip (unlike the Pentium 4 and I2) and as such, its performance improves with clock. I think it is interesting to look at other processor's theoretical balance numbers in relationship to the I2's that you compute (I hope I have them all correct): Pentium 4 EE 3.2 GHz: (3.2 GHz * 2 flops * 24 bytes) / 6.4 bytes/sec = Balance of 24 (max on chip cache 2MB) Itanium 2 1.5 GHz: (1.5 GHz * 4 flops * 24 bytes) / 6.4 bytes/sec = Balance of 22.5 (max on chip cache 6MB) Opteron 246 2.0 GHz: (2.0 GHz * 2 flops * 24 bytes) / 6.4 bytes/sec = Balance of 15 (max on chip cache 1MB) Power 4 1.7 GHz: (1.7 GHz * 4 flops * 24 bytes) / 6.4 bytes/sec = Balance of 25.5* (max on chip cache 1.44MB) Cray X1 .8 GHz: (0.8 GHz * 4 flops * 24 bytes) / 19.2 bytes/sec = Balance of 4 (512 byte off-chip L2) * IBM memory performance is with 1 core disabled and may now be higher than this. When viewed in context, yes, the I2 is poorly balanced, but it is typical of microprocessors, and it is not the worst among them. It also offers the largest compensating cache. Where it loses alot of ground is in the dual processor configuration. Opteron yields a better number, but this is because it can't do as many flops. The Cray X1 is has the most agressive design specs and yields a large enough percentage of peak to beat the fast clocked micros on vector code (leaving the ugly question of price aside). This is in part due to the more balanced design, but also due to its vector ISA which is just better at moving data from memory. >let's step back a bit. suppose we were designing a new version of SPEC, >and wanted to avoid every problem that the current benchmarks have. >here are some partially unworkable ideas: > >keep geometric mean, but also quote a few other metrics that don't >hide as much interesting detail. for instance, show the variance of >scores. or perhaps show base/peak/trimmed (where the lowest and highest >component are simply dropped). Definitely. I am constantly trimming the reported numbers myself and looking at the bar graphs for an eye-ball variance. It takes will power to avoid being seduced by a single summarizing number. The Ultra III's SpecFP number was a good reminder. >cache is a problem unless your code is actually a spec component, >or unless all machines have the same basic cache-to-working-set relation >for each component. 
alternative: run each component on a sweep of problem >sizes, and derive two scores: in-cache and out-cache. use both scores >as part of the overall summary statistic. Very good as well. This is the "cpu-rate-comes-to-spec" approach that I am sure Bob Brown would endorse. >I'd love to see good data-mining tools for spec results. for instance, >I'd like to have an easy way to compare consecutive results for the same >machine as the vendor changed the compiler, or as clock increases. ... or increased cache size. Another winning suggestion. >there's a characteristic "shape" to spec results - which scores are >high and low relative to the other scores for a single machine. not only >does this include outliers (drastic cache or compiler effects), but >points at strengths/weaknesses of particular architectures. how to do this, >perhaps some kind of factor analysis? This is what I refer to as the Spec finger print or Roshacht(sp?) test. We need a neural net derived analysis and classification here. Another presentation that I like is the "star graph" in which major characteristics (floating point perf., integer perf., cache, memory bandwidth, etc.) are layed out in equal degrees as vectors around a circle. Each processor is measured on each axis to give a star print and the total area is a measure of "total goodness". I hope someone from Spec is reading this ... and they remember who made these suggestions ... ;-). Regards, rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org # #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Thu Oct 30 23:31:01 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 31 Oct 2003 12:31:01 +0800 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Other problems with the Itanium 2 are power hungry and heat problem. Also, reported on another mailing list: Earth Simulator 35.8 TFlop/s ASCI Q Alpha EV-68 13.8 TFlop/s Apple G5 dual (Big Mac) 9.5 TFlop/s HP RX2600 Itanium 2 8.6 TFlop/s This would place the Big Mac in the 3rd place on the top500 list -- assuming they have reported all submitted results in the report: http://www.netlib.org/benchmark/performance.pdf (p53) Andrew. > The I2's weakness is in price-performance and in > memory bandwidth in SMP configurations > in my view. My last line in the prior note was a > reminder to the original poster > that SpecFP numbers are not a final answer. I > repeated the "benchmark you code" > mantra ... partly to relieve Bob Brown of his > responsibility to do so ;-). > > Got any snow up in the Great White North yet? > > Regards, > > rbw > > > max max num num > rsz vsz obs unchanged stable? 
> ----- ----- --- --------- ------- > gzip 180.0 199.0 181 68 > vpr 50.0 53.6 151 6 > gcc 154.0 156.0 134 0 > mcf 190.0 190.0 232 230 stable > crafty 2.0 2.6 107 106 stable > parser 37.0 66.8 263 254 stable > eon 0.6 1.5 130 0 > perlbmk 146.0 158.0 186 0 > gap 192.0 194.0 149 148 stable > vortex 72.0 79.4 162 0 > bzip2 185.0 199.0 153 6 > twolf 3.4 4.0 273 0 > > wupwise 176.0 177.0 185 181 stable > swim 191.0 192.0 322 320 stable > mgrid 56.0 56.7 281 279 stable > applu 181.0 191.0 371 369 stable > mesa 9.4 23.1 132 131 stable > galgel 63.0 155.0 287 59 > art 3.7 4.3 157 37 > equake 49.0 49.4 218 216 stable > facerec 16.0 18.5 182 173 stable > ammp 26.0 28.4 277 269 stable > lucas 142.0 143.0 181 179 stable > fma3d 103.0 105.0 268 249 stable > sixtrack 26.0 59.8 148 141 stable > apsi 191.0 192.0 271 270 stable > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:02:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:02:29 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Message-ID: On Thu, 30 Oct 2003, Richard Walsh wrote: > >cache is a problem unless your code is actually a spec component, > >or unless all machines have the same basic cache-to-working-set relation > >for each component. alternative: run each component on a sweep of problem > >sizes, and derive two scores: in-cache and out-cache. use both scores > >as part of the overall summary statistic. > > Very good as well. This is the "cpu-rate-comes-to-spec" approach > that I am sure Bob Brown would endorse. Oh, sure. "I endorse this." ;-) As you guys are working out fine on your own, I like it combined with Mark's suggestion of showing the entire constellation for spec (which of course you CAN access and SHOULD access in any case instead of relying on geometric or any other mean measure of performance:-). I really think that many HPC performance benchmarks primary weakness is that they DON'T sweep problem size and present results as a graph, and that they DON'T present a full suite of different results that measure many identifiably different components of overall performance. From way back with early linpack, this has left many benchmarks susceptible to vendor manipulation -- there are cases on record of vendors (DEC, IIRC, but likely others) actually altering CPU/memory architecture to optimize linpack performance because linpack was what sold their systems. This isn't just my feeling, BTW -- Larry McVoy has similar concerns (more stridently expressed) in his lmbench suite -- he actually had (and likely still has) as a condition of their application to a system that they can NEVER be applied singly with just one (favorable:-) number or numbers quoted in a publication or advertisement --- the results of the complete suite have to be presented all together, with your abysmal failures side by side with your successes. 
I personally am less religious about NEVER doing anything and dislike semi-closed sources and "rules" even for benchmarks (it makes far more sense to caveat emptor and pretty much ignore vendor-based performance claims in general:-), but do think that you get a hell of a lot more information from a graph of e.g. stream results as a function of vector size than you get from just "running stream". Since running stream as a function of vector size more or less requires using malloc to allocate the memory and hence adds one additional step of indirection to memory address resolution, it also very slightly worsens the results, but very likely in the proper direction -- towards the real world, where people do NOT generally recompile an application in order to change problem size. I also really like Mark's idea of having a benchmark database site where comparative results from a wide range of benchmarks can be easily searched and collated and crossreferenced. Like the spec site, actually. However, that's something that takes a volunteer or organization with spare resources, much energy, and an attitude to make happen, and since one would like to e.g. display spec results on a non-spec site and since spec is (or was, I don't keep up with its "rules") fairly tightly constrained on who can run it and how/where its results can be posted, it might not be possible to create your own spec db, your own lmbench db, your own linpack db, all on a public site. cpu_rate you can do whatever you want with -- it is full GPL code so a vendor could even rewrite it as long as they clearly note that they have done so and post the rewritten sources. Obviously you should either get results from somebody you trust or run it yourself, but that is true for any benchmark, with the latter being vastly preferrable.:-) If I ever have a vague bit of life in me again and can return to cpu_rate, I'm in the middle of yet another full rewrite that should make it much easier to create and encapsulate a new code fragment to benchmark AND should permit running an "antistream" version of all the tests involving long vectors (one where all the memory addresses are accessed in a random/shuffled order, to deliberately defeat the cache). However, I'm stretched pretty thin at the moment -- a talk to give Tuesday on xmlsysd/wulfstat, a CW column due on Wednesday, and I've agreed to write an article on yum due on Sunday of next week I think (and need to finish the yum HOWTO somewhere in there as well). So it won't be anytime soon...:-) > >I'd love to see good data-mining tools for spec results. for instance, > >I'd like to have an easy way to compare consecutive results for the same > >machine as the vendor changed the compiler, or as clock increases. > > ... or increased cache size. Another winning suggestion. > > >there's a characteristic "shape" to spec results - which scores are > >high and low relative to the other scores for a single machine. not only > >does this include outliers (drastic cache or compiler effects), but > >points at strengths/weaknesses of particular architectures. how to do this, > >perhaps some kind of factor analysis? > > This is what I refer to as the Spec finger print or Roshacht(sp?) > test. We need a neural net derived analysis and classification here. . The only one I'd trust is the one already implemented in wetware. After all, classification according to what? 
> Another presentation that I like is the "star graph" in which major > characteristics (floating point perf., integer perf., cache, memory > bandwidth, etc.) are layed out in equal degrees as vectors around > a circle. Each processor is measured on each axis to give a star > print and the total area is a measure of "total goodness". > > I hope someone from Spec is reading this ... and they remember who > made these suggestions ... ;-). But things are more complicated than this. The real problem with SPEC is that your application may well resemble one of the components of the suite, in which case that component is a decent predictor of performance for your application almost by definition. However, the mean performance on the suite may or may not be well correlated with that component, or your application may not resemble ANY of the components on the suite. Then there are variations with compiler, operating system, memory configuration, scaling (or lack thereof!) with CPU clock. As Mark says, TBBIYOC is the only safe rule if you seek to compare systems on the basis of "benchmarks". I personally tend to view large application benchmarks like linpack and spec with a jaded eye and prefer lmbench and my own microbenchmarks to learn something about the DETAILED performance of my architecture on very specific tasks that might be components of a large application, supplemented with YOC. Or rather MOC. Zen question: Which one reflects the performance of an architecture, a BLAS-based benchmark or an ATLAS-tuned BLAS-based benchmark? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Oct 31 09:11:49 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 31 Oct 2003 09:11:49 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: For those interested, the latest poll at www.cluster-rant.com was on cluster size. We had a record 102 responses! Take a look at http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 for links to results and to the new poll on interconnects. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:55:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:55:43 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Douglas Eadline, Cluster World Magazine wrote: > > For those interested, the latest poll at www.cluster-rant.com was on > cluster size. We had a record 102 responses! Take a look at > http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 > for links to results and to the new poll on interconnects. You need to let people vote more than once in something like this. I have three distinct clusters and there are two more I'd vote for the owners here at Duke. (They pretty much reflect the numbers you're getting, which show well over half the clusters at 32 nodes or less). 
It is interesting that this indicates that the small cluster is a lot more common than big clusters, although the way numbers work there are a lot more nodes in big clusters than in small clusters. At least in your biased and horribly unscientific (but FUN!) poll:-) So from a human point of view, providing support for small clusters is more important, but from an institutional/hardware point of view, big clusters dominate. It is also very interesting to me that RH (for example) thinks that there is something that they are going to provide that is worth e.g. several hundred thousand dollars in the case of a 1000+ node cluster running their "workstation" product. Fifty dollars certainly. Five hundred dollars maybe. A thousand dollars possibly, but only if they come up with a cluster-specific installation with some actual added value. Sigh. rgb > > Doug > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Fri Oct 31 12:10:31 2003 From: jcownie at etnus.com (James Cownie) Date: Fri, 31 Oct 2003 17:10:31 +0000 Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: Message from "Robert G. Brown" of "Fri, 31 Oct 2003 11:02:29 EST." Message-ID: <1AFcn9-5Y0-00@etnus.com> > From way back with early linpack, this has left many benchmarks > susceptible to vendor manipulation -- there are cases on record of > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > architecture to optimize linpack performance because linpack was > what sold their systems. This certainly applied to some compilers which "optimized" sdot and ddot by recognizing the source (down to the precise comments) and plugged in a hand coded assembler routine. Changing a comment (for instance mis-spelling Jack's name :-) or replacing a loop variable called "i" with one called "k" could halve the linpack result. When $$$ are involved people are prepared to sail close to the wind... -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 14:36:09 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 11:36:09 -0800 (PST) Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: <1AFcn9-5Y0-00@etnus.com> Message-ID: On Fri, 31 Oct 2003, James Cownie wrote: > > From way back with early linpack, this has left many benchmarks > > susceptible to vendor manipulation -- there are cases on record of > > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > > architecture to optimize linpack performance because linpack was > > what sold their systems. > > This certainly applied to some compilers which "optimized" sdot and > ddot by recognizing the source (down to the precise comments) and > plugged in a hand coded assembler routine. 
Nvidia and ATI have recently done similar things, where their drivers would attempt to detect benchmarks being run and then use optimized routines or cheat on following specifications. Renaming quake2.exe to something else would cause a large decrease in framerate for example. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Oct 31 14:45:04 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 31 Oct 2003 11:45:04 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: <20031031194504.30508.qmail@web11404.mail.yahoo.com> But still, at least the results showed that the G5s provided similar performance, and less expensive than IA64... Rayson --- Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 31 12:38:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 31 Oct 2003 18:38:20 +0100 (CET) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Robert G. Brown wrote: > > It is also very interesting to me that RH (for example) thinks that > there is something that they are going to provide that is worth e.g. > several hundred thousand dollars in the case of a 1000+ node cluster > running their "workstation" product. Fifty dollars certainly. Five > hundred dollars maybe. A thousand dollars possibly, but only if they > come up with a cluster-specific installation with some actual added > value. > I'll second that. There has been a debate running on this topic on the Fedora list over the last few days. Sorry to be so boring, but its something we should debate too. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 13:19:12 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 10:19:12 -0800 Subject: opteron VS Itanium 2 In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > This would place the Big Mac in the 3rd place on the > top500 list Except that there are several other new large clusters that will likely place higher -- LANL announced a 2,048 cpu Opteron cluster a while back, and LLNL has something new, too, I think. Comparing yourself to the obsolete list in multiple press releases isn't very clever. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Fri Oct 31 14:44:59 2003 From: walkev at presearch.com (Vann H. Walke) Date: Fri, 31 Oct 2003 14:44:59 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <1067629499.21719.73.camel@localhost.localdomain> On Fri, 2003-10-31 at 12:38, John Hearns wrote: > On Fri, 31 Oct 2003, Robert G. Brown wrote: > > > > > It is also very interesting to me that RH (for example) thinks that > > there is something that they are going to provide that is worth e.g. > > several hundred thousand dollars in the case of a 1000+ node cluster > > running their "workstation" product. Fifty dollars certainly. Five > > hundred dollars maybe. A thousand dollars possibly, but only if they > > come up with a cluster-specific installation with some actual added > > value. > > > I'll second that. > > There has been a debate running on this topic on the Fedora list > over the last few days. > > Sorry to be so boring, but its something we should debate too. > Hmm... Let's take the case of a 1000 node system. If we assume a $3000/node cost (probably low once rack, UPS, hardware support, and interconnect are added in), we arrive at an approximate hardware cost of $3,000,000. If we were to use the RHEL WS list price of $179/node, we get $179,000 or about 6% of the hardware cost. That is assuming RedHat will not provide any discount on large volume purchases (unlikely). Is 6% unreasonable? What are the alternatives? - Keep using an existing RH distro: Only if you're willing to move into do it yourself mode when RH stop support (December?). I expect very few would be happy with this option. However, if you have a working RH7.3 cluster, it works, and you don't have to worry too much about security, why change? For new clusters though.... - Fedora - Planned releases 2-3 times a year. So, if I build a system on the Fedora release scheduled this Monday, who will be providing security patches for it 2 years from now (after 4-6 new releases have been dropped). My guess is no-one. Again, we're in the do it yourself maintenance or frequent OS upgrade mode. - SUSE - Not sure about this one. Their commercial pricing model is pretty close to RedHat's. Are they going to keep developing consumer releases? 
What will the support be for those releases? Can we really expect more than we get from a purely community developed system? Perhaps someone with more SUSE knowledge could comment? - Debian - Could be a good option, but to some extent you end up in the same position as Fedora. How often do the releases come out. Who supports the old releases? What hardware / software will work on the platform? - Gentoo - Not reliable, stable enough to meet my needs for clustering - Mandrake - Mandrake has their clustering distribution, which could be a good possibility, but the cost is as high or higher than RedHat. - Scyld - Superior design, supported, but again very high cost and may have to fight some compatibility issues since the it's market share in the Linux world is less than tiny. - OSCAR / Rocks / etc... - generally installed on top of another distribution. We still have to pick a base distribution. My conclusions - If you're in a research facility / university type setting where limited amounts of down time are acceptable, a free or nearly free system is perfect. A new Fedora/Debian/SuSE release comes out, shut the system down over Christmas break and rebuild it. (As long as you're happy spending a fair amount of time doing rebuilds and fixing upgrade problems). If however you really need the thing to work - Corporate research sites, satellite data processing, etc... the cost of the operating system may be minuscule relative to the cost of having the system down. If you _really_ want a particular application to work having it certified and supported on the OS may be important. The project on which I'm working - building sonar training simulators for the US Navy Submarine force requires stable systems which should operate without major maintenance / operational changes for many years. Knowing the RedHat will support the enterprise line for 5 years is a big selling point. The cluster management portion of the software stack would be great to have integrated in to the product, but if third party vendors (Linux Networx, OSCAR, Rocks, etc...) can provide the cluster management portion on top of the distribution, a solution can be found. In some ways this is even better since your cluster management decision is independent of the OS vendor. I basically just want to make the point that the cluster space is filled with people of many different needs. Will everyone want RHEL? My guess is a resounding NO. (In the days of RH7.3 you could almost say Yes.) But, there are situations in which a stable, supported product is needed. This is the market RedHat is trying to target and states so pretty clearly ("Enterprise"). Small users and research systems get somewhat left out in the cold, but we probably shouldn't complain after having a free ride for the last 5+ years. So, is 6% unreasonable? 
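For what it is worth, the arithmetic behind that 6% figure, extended to the $792 AMD64 WS list price quoted later in this thread; the $3000/node hardware cost and the flat 1000-node count are the assumptions made above, and real volume discounts would change both columns:

/* License cost as a fraction of hardware cost, using the numbers quoted
 * in this thread ($3000/node hardware assumed; $179 and $792 list prices). */
#include <stdio.h>

int main(void)
{
    const int nodes = 1000;
    const double hw_per_node = 3000.0;
    const double os_prices[] = { 179.0, 792.0 };

    for (int i = 0; i < 2; i++) {
        double hw = nodes * hw_per_node;
        double os = nodes * os_prices[i];
        printf("$%.0f/node OS on %d nodes: $%.0f, i.e. %.1f%% of $%.0f hardware\n",
               os_prices[i], nodes, os, 100.0 * os / hw, hw);
    }
    return 0;
}

At the quoted AMD64 price the OS line item grows to roughly a quarter of the hardware cost, which is essentially the comparison made further down in the thread.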
Vann > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eemoore at fyndo.com Fri Oct 31 14:52:55 2003 From: eemoore at fyndo.com (Dr Eric Edward Moore) Date: Fri, 31 Oct 2003 19:52:55 +0000 Subject: opteron VS Itanium 2 In-Reply-To: (Mark Hahn's message of "Thu, 30 Oct 2003 12:45:20 -0500 (EST)") References: Message-ID: <87he1pfalk.fsf@azathoth.fyndo.com> Mark Hahn writes: > there's a characteristic "shape" to spec results - which scores are > high and low relative to the other scores for a single machine. not only > does this include outliers (drastic cache or compiler effects), but > points at strengths/weaknesses of particular architectures. how to do this, > perhaps some kind of factor analysis? Well, being bored, I tried factor analysis on the average results for the submitted specfp benchmarks at http://www.specbench.org/ The 5 factors with the largest eigenvalues are:

Eigenvalue:    0.314116   0.353034   0.799331   1.432038  10.614996
                2.22%      2.25%      5.70%     10.22%     75.82%
168.wupwise   -0.4134913  0.0241240 -0.1437086 -0.2757206  0.2715672
171.swim       0.0245451  0.0965325  0.3495143  0.1209393  0.2783842
172.mgrid      0.1122617  0.1365769  0.3273285  0.1332301  0.2839204
173.applu      0.0299056  0.0439954  0.4163242  0.1913496  0.2725619
177.mesa       0.4791260  0.4190313 -0.0949648 -0.3785996  0.2448368
178.galgel    -0.0489231 -0.5404192 -0.2464610  0.2391370  0.2648068
179.art        0.0646181  0.5095081 -0.4736362  0.6508958  0.1054875
183.equake    -0.5560255  0.0841426  0.0214064  0.1615493  0.2794066
187.facerec   -0.0402649  0.0446221 -0.2628912 -0.0557252  0.2897607
188.ammp       0.3993861 -0.3404615 -0.1456043  0.0359475  0.2832809
189.lucas     -0.2380202  0.0908976  0.0801927 -0.2140971  0.2842518
191.fma3d     -0.0326577  0.1661895 -0.1149762 -0.3148501  0.2774768
200.sixtrack   0.1950678 -0.1574121  0.2852895  0.2008475  0.2741305
301.apsi       0.1128198 -0.2379642 -0.3013536 -0.1224494  0.2782804

Pretty much all the specfp tests correlate with each other pretty well, except for 179.art, which correlates... poorly with the others (its correlation with 177.mesa is just 0.03). So most of the variation in the results is some sort of "raw speed" number, which has near-equal weightings of all the tests besides 179.art. Next most important is whatever makes art so different from all the others (maybe it's a persistent cache-misser, or maybe it's just the easiest for vendors to tweak). Not entirely sure what to make of the others. There does seem to be some commonality between 171.swim 172.mgrid 173.applu and 200.sixtrack in the third biggest factor (plus a lot of whatever art isn't) that could be important. The next two seem to mostly have something to do with whatever makes 177.mesa special. This is presumably all useless, but someone might be entertained :) > regards, mark hahn. -- Eric E.
Moore _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Fri Oct 31 16:38:52 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Fri, 31 Oct 2003 18:38:52 -0300 (ART) Subject: sum of matrices Message-ID: <20031031213852.87539.qmail@web12206.mail.yahoo.com> Hi, Last days I write a code(in c) that make the sum of 2 matrices. Let me say a little about how it works. I send 1 row of the 1st matrice and 1 row of 2nd matrice for each process, when a process finish its job, if have more lines i send more to it and it make the sum of these new 2 lines. The problem is, the program works fine with 100x100(or less) matrice, but when I increase this range, something like 10000x10000 i receive the fallowing message: p0_8467: p4_error: Child process exited while making connection to remote process on node2: 0 This is a MPI problem or it`s my code? What can I do to fix this problem. ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ci?ncias Exatas e Tecnol?gicas Estudante do Curso de Ci?ncia da Computa??o Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 15:52:12 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 12:52:12 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: On Fri, 31 Oct 2003, Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. I thought that the 3rd place was in the new preliminary top500 list that included all the big machines that will be there when the official list comes out. But there's been so much poor and conflicting information about Big Mac who knows? I'd like to know how much they payed for the infiniband hardware. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Fri Oct 31 16:14:35 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Fri, 31 Oct 2003 15:14:35 -0600 Subject: opteron VS Itanium 2 In-Reply-To: References: Message-ID: On Fri, 31 Oct 2003, Trent Piepho wrote: > I thought that the 3rd place was in the new preliminary top500 list that > included all the big machines that will be there when the official list > comes out. But there's been so much poor and conflicting information > about Big Mac who knows? I'd like to know how much they payed for the > infiniband hardware. Yeah, me too. As someone who just ponied up for a rather large IB installation, I'm not sure that most people realize what a substantial percentage of the cost of the cluster the IB might be. 
_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From weideng at uiuc.edu Fri Oct 31 15:37:45 2003 From: weideng at uiuc.edu (Wei Deng) Date: Fri, 31 Oct 2003 14:37:45 -0600 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031203745.GU1408@aminor.cs.uiuc.edu> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > - OSCAR / Rocks / etc... - generally installed on top of another > distribution. We still have to pick a base distribution. >From what I heard from Rocks mailing list, they will release 3.1.0 the next Month, which will be based on RHEL 3.0, compiled from source code that is publicly available, and free of charge. Even though Rocks is based on RedHat distribution, it is complete, which means you only need to download Rocks ISOs to accomplish your installation. -- Wei Deng Pablo Research Group Department of Computer Science University of Illinois 217-333-9052 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Fri Oct 31 16:17:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Fri, 31 Oct 2003 14:17:35 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <3FA2D16F.4030807@lanl.gov> Vann H. Walke wrote: > On Fri, 2003-10-31 at 12:38, John Hearns wrote: >>On Fri, 31 Oct 2003, Robert G. Brown wrote: >> >>>It is also very interesting to me that RH (for example) thinks that >>>there is something that they are going to provide that is worth e.g. >>>several hundred thousand dollars in the case of a 1000+ node cluster >>>running their "workstation" product. Fifty dollars certainly. Five >>>hundred dollars maybe. A thousand dollars possibly, but only if they >>>come up with a cluster-specific installation with some actual added >>>value. >> >>I'll second that. > > Hmm... Let's take the case of a 1000 node system. If we assume a > $3000/node cost (probably low once rack, UPS, hardware support, and > interconnect are added in), we arrive at an approximate hardware cost of > $3,000,000. If we were to use the RHEL WS list price of $179/node, we > get $179,000 or about 6% of the hardware cost. That is assuming RedHat > will not provide any discount on large volume purchases (unlikely). Is > 6% unreasonable? These days, one seldom builds 1000 node systems out of basic x86 boxes. Consider a 1024 node AMD64 system instead: The list price on RHEL WS Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. This is unlikely to create any sales. RH should be paid for the valuable service they provide (patch streams etc.) but this is not worth $811K to builders of large clusters. 
There are other good alternatives, most of them *MUCH* cheaper. I fully agree with RGB that RH needs to announce a sensible pricing structure for clusters in order to participate in this market. Would a single system image (BProc) cluster constructed by recompiling the kernel w/BProc patches fit RH's legal definition of a single "installed system" and a single "platform"? If so, $792 for a 1024-node cluster would be quite acceptable... Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Oct 31 17:38:50 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 31 Oct 2003 17:38:50 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <1067639930.26872.1.camel@squash.scalableinformatics.com> On Fri, 2003-10-31 at 16:17, Josip Loncaric wrote: > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. SUSE AMD64 version of 9.0 is something like $120. It was somewhat more stable for my tests than the RH beta (GinGin64). I hope that RH will arrange for similar pricing. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 31 19:00:30 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 31 Oct 2003 16:00:30 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2E6FD.6050107@scali.com> Message-ID: On Fri, 31 Oct 2003, Steffen Persvold wrote: > Josip Loncaric wrote: > > > > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > > This is unlikely to create any sales. so download the source build and call you distro something other than redhat enterprise linux... or use debian... or cope. > > RH should be paid for the valuable service they provide (patch streams > > etc.) but this is not worth $811K to builders of large clusters. There > > are other good alternatives, most of them *MUCH* cheaper. I fully agree > > with RGB that RH needs to announce a sensible pricing structure for > > clusters in order to participate in this market. so don't use redhat. > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > claim support from RH for more than one of the systems. read the liscsense agreement for you redhat enterprise disks... 
> Regards, > Steffen > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 18:43:42 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 15:43:42 -0800 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031234342.GC3744@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > So, is 6% unreasonable? For just the base OS? Yes. The market-place has spoken very loudly about that, especially people building large machines. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Oct 31 17:49:33 2003 From: sp at scali.com (Steffen Persvold) Date: Fri, 31 Oct 2003 23:49:33 +0100 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <3FA2E6FD.6050107@scali.com> Josip Loncaric wrote: > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. > > RH should be paid for the valuable service they provide (patch streams > etc.) but this is not worth $811K to builders of large clusters. There > are other good alternatives, most of them *MUCH* cheaper. I fully agree > with RGB that RH needs to announce a sensible pricing structure for > clusters in order to participate in this market. Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't claim support from RH for more than one of the systems. Regards, Steffen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tod at gust.sr.unh.edu Fri Oct 31 18:59:16 2003 From: tod at gust.sr.unh.edu (Tod Hagan) Date: 31 Oct 2003 18:59:16 -0500 Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <1067644757.5702.219.camel@haze.sr.unh.edu> On Fri, 2003-10-31 at 14:44, Vann H. Walke wrote: > What are the alternatives? > [snip] > - Fedora - Planned releases 2-3 times a year. 
So, if I build a system > on the Fedora release scheduled this Monday, who will be providing > security patches for it 2 years from now (after 4-6 new releases have > been dropped). My guess is no-one. Again, we're in the do it yourself > maintenance or frequent OS upgrade mode. > [snip] > - Debian - Could be a good option, but to some extent you end up in the > same position as Fedora. How often do the releases come out. Who > supports the old releases? What hardware / software will work on the > platform? If Fedora achieves 2-3 upgrades per year then it will be fairly different from Debian, which seems to be at 2-3 years per upgrade these days, (well almost). After a new release comes out Debian supports the old one for a period of time (12 months?) with security updates before pulling the plug. Debian can be upgraded in place as opposed to requiring a full resinstall; while this is great for desktops and servers, I'm not sure if this is important for a cluster. As a result of the extended release cycle Debian stable tends to lack support for the newest hardware (Opteron 64-bit, for example). This is why Knoppix, which is based on Debian, isn't derived from Debian stable, but rather from packages in the newer releases (testing, unstable and experimental). But the flip side is that the stable release, while dated, tends to work well as it's had a lot of testing. Debian could probably use more recognition as a target platform by commercial software vendors but it incorporates a huge number of packages including many open source applications pertinent to science. Breadth in packaged applications is probably more important for workstations since clusters tend to use small numbers of apps very intensely. As a distribution Debian is more oriented towards servers than the desktop (to the point that frustrated users have spawned the "Debian Desktop" subproject). It seems to me that clusters have more in common with servers than with desktops so that Debian's deliberate release rate is a better match for the cluster environment than distros which release often in order to incorporate the latest GUI improvements. P.S. While looking into the number of packages in Debian vs. Fedora I stumbled across this frightening bit (gotta throw a Halloween reference in somewhere) on the Fedora site: http://fedora.redhat.com/participate/terminology.html > Packages in Fedora Extras should avoid conflicts with other packages > in Fedora Extras to the fullest extent possible. Packages in Fedora > Extras must not conflict with packages in Fedora Core. It seems that Fedora intends to achieve applications breadth through "Fedora Extras" package sets in other repositories, but the prohibition of conflicts between Extras packages isn't as strong as the absolute prohibition of conflicts between Extras and Core packages. Could this result in a new era of DLL hell a few years down the road? Wow, I guess I just slung some FUD at Fedora, but maintaining a 2-3 releases per year rate probably requires a small core, putting the bulk of applications into the Extras category and thus increasing the chance of conflict. (Wasn't that the original recipe for DLL hell?) Debian has avoided this through a much larger core, which of course slows the release cycle. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Fri Oct 31 17:37:36 2003 From: ctierney at hpti.com (Craig Tierney) Date: 31 Oct 2003 15:37:36 -0700 Subject: sum of matrices In-Reply-To: <20031031213852.87539.qmail@web12206.mail.yahoo.com> References: <20031031213852.87539.qmail@web12206.mail.yahoo.com> Message-ID: <1067639856.6209.211.camel@hpti10.fsl.noaa.gov> On Fri, 2003-10-31 at 14:38, Mathias Brito wrote: > Hi, > > Last days I write a code(in c) that make the sum of 2 > matrices. Let me say a little about how it works. I > send 1 row of the 1st matrice and 1 row of 2nd matrice > for each process, when a process finish its job, if > have more lines i send more to it and it make the sum > of these new 2 lines. The problem is, the program > works fine with 100x100(or less) matrice, but when I > increase this range, something like 10000x10000 i > receive the fallowing message: > > p0_8467: p4_error: Child process exited while making > connection to remote process on node2: 0 > > This is a MPI problem or it`s my code? What can I do > to fix this problem. It is probably your code. Are you allocating the matrix statically or dynamically? Try increasing the stack size on your node(s). Craig > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci?ncias Exatas e Tecnol?gicas > Estudante do Curso de Ci?ncia da Computa??o > > Yahoo! Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Fri Oct 31 19:52:23 2003 From: sp at scali.com (Steffen Persvold) Date: Sat, 01 Nov 2003 01:52:23 +0100 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <3FA303C7.8050600@scali.com> Joel Jaeggli wrote: > On Fri, 31 Oct 2003, Steffen Persvold wrote: [] > >>Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't >>claim support from RH for more than one of the systems. > > > read the liscsense agreement for you redhat enterprise disks... > Well the EULA doesn't say anything about having to pay $792 for each node in a cluster (actually it doesn't mention paying license fee's at all). The only relevant stuff I can find is item 2, "Intellectual Property Rights" : "If Customer makes a commercial redistribution of the Software, unless a separate agreement with Red Hat is executed or other permission granted, then Customer must modify the files identified as REDHAT-LOGOS and anaconda-image to remove all images containing the Red Hat trademark or the Shadowman logo. Merely deleting these files may corrupt the Software." And I wouldn't say that installing on your cluster nodes is "making a commercial redistribution" would you ? Or have I missed something fundamental ? 
Regards, Steffen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
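An aside on the "sum of matrices" thread above: at 10000x10000, a single matrix of doubles is 1e8 * 8 bytes, roughly 800 MB, so three of them declared as static or stack arrays can exhaust a 2003-era node long before MPI itself misbehaves, which is consistent with Craig's advice to check static vs. dynamic allocation and the stack limit. Below is a minimal C sketch of heap allocation with the rows split across ranks; the use of MPI_Scatter and the even division of rows are illustrative assumptions, not Mathias's actual row-farming protocol.

/* Sketch only: why a 10000x10000 case can kill a job that worked at 100x100.
 * The data has to live on the heap (and realistically be split across
 * ranks); N dividing evenly by the number of ranks is assumed here. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                  /* assume N divides evenly */
    double *a = malloc((size_t)rows * N * sizeof *a);
    double *b = malloc((size_t)rows * N * sizeof *b);
    double *c = malloc((size_t)rows * N * sizeof *c);
    double *full_a = NULL, *full_b = NULL;

    if (!a || !b || !c) {                 /* this, not MPI, is the usual culprit */
        fprintf(stderr, "rank %d: out of memory\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {                      /* only the root holds full matrices */
        full_a = malloc((size_t)N * N * sizeof *full_a);
        full_b = malloc((size_t)N * N * sizeof *full_b);
        if (!full_a || !full_b) {
            fprintf(stderr, "rank 0: out of memory for full matrices\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        for (size_t i = 0; i < (size_t)N * N; i++) {
            full_a[i] = 1.0;
            full_b[i] = 2.0;
        }
    }

    MPI_Scatter(full_a, rows * N, MPI_DOUBLE, a, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(full_b, rows * N, MPI_DOUBLE, b, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (size_t i = 0; i < (size_t)rows * N; i++)
        c[i] = a[i] + b[i];

    if (rank == 0)
        printf("each rank summed %d rows (%zu MB per local matrix)\n",
               rows, (size_t)rows * N * sizeof(double) >> 20);

    MPI_Finalize();
    return 0;
}

The same arithmetic says a single process holding all three full matrices needs about 2.4 GB, which by itself could explain why the 100x100 case ran and the 10000x10000 case died.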
From thornton at yoyoweb.com Wed Oct 1 08:34:40 2003 From: thornton at yoyoweb.com (Thornton Prime) Date: Wed, 01 Oct 2003 05:34:40 -0700 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <1065011679.1923.16.camel@localhost.localdomain> > We have a small test cluster running RH8 which seems to work well. We are > going to expand this cluster and I was wondering what, if any, are the > advantages of installing the cluster using RH9 instead of RH8? Are there any > disadvantages? You should check out the release notes. On the whole, I'd say there isn't much advantage unless you can take advantage of NTPL. Most of the other enhancements were primarily for desktop users. The next release should be 2.6-kernel ready, so rather than 9 you may consider experimenting with Severn or Taroon. Taroon has much better support for 64-bit platforms, if you are headed there. thornton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 08:37:44 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 08:37:44 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > > Why? On a system equipped with an internal sensor, lm_sensors can often read e.g. core CPU temperature on the system itself. A polling cron script can then read this and take action, e.g. initiate a shutdown if it exceeds some threshold. There are good and bad things about this. A good thing is it addreses the real problem -- overheating in the system itself -- and not room temperature. CPU's can overheat because of a fan failure when the room remains cold, and a sensors-driven poweroff can then save your hardware on a node by node basis.
The bad thing is that it does NOT give you any sort of measure of room temperature per se, although if you have the poweroff script send you mail first, getting deluged with N messages as the entire cluster shuts down would be a good clue that your room cooling failed:-). Also, lm_sensors has the API from hell. In fact, I would hardly call it an API. One has to pretty much craft a polling script on the basis of each supported sensor independently, which requires you to know WAY more than you ever wanted to about the particular sensor your system may or may not have. Alas, if only somebody would give the lm_sensors folks a copy of a good book on XML for christmas, and they decided to take the monumental step of converting /proc/sensors into a single xml-based file with the RELEVANT information presented in toplevel tags like 50.4 and the irrelevant information presented in tags like lm781.22a then we could ALL reap the fruits of their labor without needing a copy of the lm78 version 1.22a API manual and having to write an application that supports each of the sensors THROUGH THEIR INTERFACE one at a time...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Wed Oct 1 09:46:25 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Wed, 1 Oct 2003 08:46:25 -0500 (CDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Robert G. Brown wrote: > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) We have that. lm_sensors+cron+gmond. Nice little XML stream on every node with every other nodes temps. One can keep a range of tolerance for cpu0, cpu1, motherboard, and disk temps and shutdown whenever you need to. a netbotz would be cooler though. i'd still use the lm_sensors+cron+gmond and still have the netbotz as a toy..:) -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at upc.es Wed Oct 1 10:13:46 2003 From: lepalom at upc.es (Leopold Palomo) Date: Wed, 1 Oct 2003 16:13:46 +0200 Subject: Environment monitoring In-Reply-To: References: Message-ID: <200310011613.46297.lepalom@upc.es> A Dimecres 01 Octubre 2003 14:37, Robert G. Brown va escriure: > On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote: > > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > > Dont overlook lm_sensors+cron > > > > Why? 
> > On a system equipped with an internal sensor, lm_sensors can often read > e.g. core CPU temperature on the system itself. A polling cron script > can then read this and take action, e.g. initiate a shutdown if it > exceeds some threshold. > > There are good and bad things about this. A good thing is it addreses > the real problem -- overheating in the system itself -- and not room > temperature. CPU's can overheat because of a fan failure when the room > remains cold, and a sensors-driven poweroff can then save your hardware > on a node by node basis. > > The bad thing is that it does NOT give you any sort of measure of room > temperature per se, although if you have the poweroff script send you > mail first, getting deluged with N messages as the entire cluster shuts > down would be a good clue that your room cooling failed:-). Also, > lm_sensors has the API from hell. In fact, I would hardly call it an > API. One has to pretty much craft a polling script on the basis of each > supported sensor independently, which requires you to know WAY more than > you ever wanted to about the particular sensor your system may or may > not have. > > Alas, if only somebody would give the lm_sensors folks a copy of a good > book on XML for christmas, and they decided to take the monumental step > of converting /proc/sensors into a single xml-based file with the > RELEVANT information presented in toplevel tags like > > 50.4 > > and the irrelevant information presented in tags like > > lm781.22a > > then we could ALL reap the fruits of their labor without needing a copy > of the lm78 version 1.22a API manual and having to write an application > that supports each of the sensors THROUGH THEIR INTERFACE one at a > time...;-) Ok. I was a bit surprise about your sentence. I know that lmsensors is not perfect, but it does their job. Ok, I don't think that use lm_sensors to try to calculate the T of the room is a bit excesive. About the xml,... well, ok, it would be a nice feature, but as plain text, knowing your hardware it's so good, too. Best Regards. Pd How about the pdf, ps, etc? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 1 10:33:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 1 Oct 2003 10:33:29 -0400 (EDT) Subject: Environment monitoring In-Reply-To: <200310011613.46297.lepalom@upc.es> Message-ID: On Wed, 1 Oct 2003, Leopold Palomo wrote: > Ok. I was a bit surprise about your sentence. I know that lmsensors is not > perfect, but it does their job. Ok, I don't think that use lm_sensors to try > to calculate the T of the room is a bit excesive. > > About the xml,... well, ok, it would be a nice feature, but as plain text, > knowing your hardware it's so good, too. > Sorry, I tend to get distracted and rant from time to time (even though as Greg noted, sometimes the rants are of lesser quality:-). In this particular case the rant is really directed to all of /proc, but the sensors interface is the worst example of the lot. I'm "entitled" to rant because I've written two tools (procstatd and xmlsysd) that parse all sorts of data, including sensors data in procstatd, out and provide it to clients for monitoring purposes. Even my daemon wasn't the first to do this, but I think it was one of the first two that functioned as a binary without running a shell script or the like on each node. 
procstatd actually predated ganglia by a fair bit, FWIW. On the basis of this fairly extensive experience I can say that lmsensors output is very poorly organized from the perspective of somebody trying to write a general purpose parser to extract the data it provides. In particular, it uses a directory tree structure where the PARTICULAR sensors interface that you have appears as part of the path, and where what you find underneath that path depends on the particular sensor that you've got as well. Hopefully it is obvious how Evil this makes it from the point of view of somebody trying to write a general purpose tool to parse it. Basically, to write such a tool one has to go through the lmsensors sources and reverse engineer each interface it supports to determine what is produced and where, one at a time. This is more than slightly nuts. What do "most" sensors provide? Fields like cpu temperature (for cpu's 0-N), fan speed (for fans 0-N), core voltage (for lines 0-N). Sure, some provide more, some provide less, but what are we discussing? The monitoring of cpu temperature, under the reasonable assumption that either we have a sensor that provides it or we don't, and that we really don't give a rodent's furry touchis WHICH sensor we have as long as it gives us "CPU Temperature", preferrably for every CPU. So a good API is one that has a single file entitled /proc/sensors, and in that file one finds things like: 54.2 51.7 lm78 ... ... I can write code to parse this in a few minutes of work, literally, and the same code will work for all interfaces that lm_sensors might support, and I don't need to know the interface the system has in it beforehand (although with the knowledge I might add some advanced features if it supports them). Presenting the knowledge is also trivial -- a web interface might be as sparse as a reader/parser and/or a DTD. Compare to parsing something like (IIRC) /proc/sensors/device-with-a-bunch-of-numbers/subunit/field where the path that you find under specific devices-with-numbers depends on the toplevel value on a device by device basis and the contents of field can as well. Yech. And Rocky, hiding the problem with gmond is fine, but then it puts the burden for writing an API for the API on the poor people that have to support the gmond interface. Yes they can (and I could) do this. I personally refuse. They obviously have gritted their teeth and done so. The correct solution is clearly to redo the lm_sensors interface itself so that it is organized as the above indicates. Which criticism, by the way, applies to a LOT of /proc, which currently looks like it was organized by a bunch of wild individualists who have handled every emergent subfield by overloading its data in a single "field" line, usually with documentation only in the form of reading procps or kernel source. Just because this is actually true doesn't excuse it. Parsing the contents of /proc is maddening for just this reason, and the cost is a lot of needless complexity, pointless bugs and upgrade incompatibilities for many people. Putting the data into xml-wrapped form would be a valuable exercise in the discipline of structuring data, for the most part. rgb > Best Regards. > > Pd How about the pdf, ps, etc? I'll try to work on this as soon as I can. 
My task list for the day looks something like a) debug/fix some dead nodes; b) add a requested feature/view to wulfstat (that has been on hold for a week or more:-(, c) work on a bunch of documents associated with teaching and curriculum at Duke (sigh); d) about eight more tasks, none of which I will likely get to, including work on my research. However, this is about the third or fourth time people have requested a "fix" for the ps/pdf/font issue (with acroread it can even fail altogether to read the document -- presumably some gs/acrobat incompatibility where I use gs-derived tools) so I'll try very hard to craft some sort of fix by the weekend. -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 1 12:36:26 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 1 Oct 2003 12:36:26 -0400 (EDT) Subject: Environment monitoring In-Reply-To: Message-ID: On Wed, 1 Oct 2003, Rocky McGaugh wrote: > On Wed, 1 Oct 2003, Robert G. Brown wrote: > > Alas, if only somebody would give the lm_sensors folks a copy of a good > > book on XML for christmas, and they decided to take the monumental step ... > > then we could ALL reap the fruits of their labor without needing a copy > > of the lm78 version 1.22a API manual and having to write an application > > that supports each of the sensors THROUGH THEIR INTERFACE one at a > > time...;-) > > We have that. lm_sensors+cron+gmond. I think you missed RGB's point. The lm_sensors implementation sucks. Sure, any one specific implementation can be justified. But having each implementation use a different output and calibration shows that this is not an architecture, just a collection of hacks. The usual reply at this point is "just update the user-level script for the new motherboard type". Yup... and you should probably update the constants in your programs' delay loops at the same time. With lm_sensors you can get a one-off hack working, but cannot implement a general case. Compare this to IPMI, which presents the same information. IPMI has a crufty design and ugly implementations, but it is an architected system. With care you can implement and deploy code that works on a broad range of current and future machine. While I'm on the soapbox, gmond deserves its own mini-butane-torch flame. I implemented the translator from Beostat (our status/statistics subsystem) to gmond (per-machine information for Ganglia), so I have a pretty good side-by-side comparison. First, how did they choose what statistics to present? Apparently just because the numbers were there. What is the point of using a XML DTD if it is just used to package undefined data types? A wrapper around a wrapper... Example metric lines: Not only are these metric types not enumerated, they are made more confusing by abbreviations and no definition. To tie both together: What is "proc_total"? Number of processors? Number of processes? Does it count system daemons? It seems to be the useless number "ps x | wc", rather than the number of end user, application processes. Many statistics are only usable when used/presented as a set. Why split the numbers into multiple elements? It just multiplies the size and parsing load. 
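Pulling the earlier part of this thread together: below is a minimal C sketch of the cron-driven overheating check rgb describes (read a CPU temperature, shut the node down past a threshold), written against the flat single-file sensors layout he wishes existed rather than the real chip-specific lm_sensors /proc tree he complains about. The file path, the cputemp tag name, and the 65C threshold are all assumptions for illustration; no such flat file exists in stock lm_sensors.

/* Hypothetical cron-driven overheat check, in the spirit of this thread.
 * It assumes a flat, XML-ish /proc/sensors file with one reading per line,
 * e.g.:
 *
 *   <cputemp cpu="0" units="celsius">54.2</cputemp>
 *
 * Path, tag and threshold are illustrative only; a real lm_sensors setup
 * exposes one chip-specific file per value instead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SENSORS_FILE "/proc/sensors"   /* assumed flat file, does not exist as such */
#define MAX_TEMP_C   65.0

int main(void)
{
    char line[256];
    FILE *fp = fopen(SENSORS_FILE, "r");
    if (!fp) {
        perror(SENSORS_FILE);
        return 1;
    }

    while (fgets(line, sizeof line, fp)) {
        char *tag = strstr(line, "<cputemp");
        if (!tag)
            continue;
        char *val = strchr(tag, '>');      /* text content follows '>' */
        if (!val)
            continue;
        double temp = atof(val + 1);
        if (temp > MAX_TEMP_C) {
            fprintf(stderr, "cputemp %.1fC over %.1fC limit, shutting down\n",
                    temp, MAX_TEMP_C);
            fclose(fp);
            /* mail/syslog first in real life; then power the node off */
            return system("/sbin/shutdown -h now");
        }
    }
    fclose(fp);
    return 0;
}

Run from cron every few minutes (with a mail or syslog call before the shutdown), this is essentially the lm_sensors+cron arrangement Rocky is defending, minus the per-chip parsing that rgb objects to.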
____ Background: Beostat is our status/statistics interface that we published 3+ years ago. It exports interfaces at multiple levels: network protocol, shared memory table only for very performance sensitive programs, such as schedulers dynamic library the preferred interface for programs command output Thus Beostat is a infrastructure subsystem, rather than a single-purpose stack of programs. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Wed Oct 1 09:59:16 2003 From: johnb at quadrics.com (John Brookes) Date: Wed, 1 Oct 2003 14:59:16 +0100 Subject: Upper bound on no. of sockets Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> I think there is a 1k per-process limit on open sockets. It's tuneable in 2.4 kernels, IIRC, but I don't remember how (off the top of my head). 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll take it past a/the kernel limit. Maybe recompile kernel? Maybe poke /proc/sys/.../...? Maybe adjust in userland? Maybe use fewer sockets ;-) Does anybody know the score? Cheers, John Brookes Quadrics > -----Original Message----- > From: Balaji Rangasamy [mailto:br66 at HPCL.CSE.MsState.Edu] > Sent: 30 September 2003 05:44 > To: beowulf at beowulf.org > Subject: Upper bound on no. of sockets > > > Hi, > Is there an upper bound on the number of sockets that can be > created by a > process? If there is one, is the limitation enforced by OS? > And what other > factors does it depend on? Can you please be specific on the > numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Wed Oct 1 13:06:40 2003 From: rokrau at yahoo.com (Roland Krause) Date: Wed, 1 Oct 2003 10:06:40 -0700 (PDT) Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <200310011504.h91F4DY02889@NewBlue.Scyld.com> Message-ID: <20031001170640.91750.qmail@web40002.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 6. Re:RH8 vs RH9 (Robert G. Brown) > From: "Robert G. Brown" > Many humans wonder about that, given the very short time that RH8 was > around before RH9 came out. The usual rule is that major number > upgrades are associated with changes in core libraries that break > binary > compatibility, so that binaries built for RH 8 are not guaranteed to > work for RH 9. Indeed some of them wont, I have first hand experience that binaries produced with the Intel Fortran compiler on RH-8, even when statically linked, will not run on a RH-9 system. Further, if you need the Intel Fortan compiler, RH-9 is not really an option for you because it is not officially supported and it will not be either. Inofficially I can confirm that it works fine if you are not using the OpenMP capabilities of the compiler. > achieve it. 
Fedora will likely be strongly derived from 9 and the > current rawhide in any event. How the "community based" RH release > will > end up being maintained is the interesting question. One possibility > is > "as rapidly as RHEL plus a few days", the difference being the time > required to download the GPL-required logo-free source rpm(s) after > an > update and rebuild them and insert them into the community version. Having used fedora in the past on a desktop client, I am hopeful that it will be possible to get all necessary packages for a cluster into an 'aptable' repository, be it hosted by fedora or somewhere else (think e.g. sourceforge). If people work together, as they have in the past, I don't see why RH would succeed in pushing their ridiculous price policies upon cluster users. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Wed Oct 1 18:10:14 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed, 1 Oct 2003 15:10:14 -0700 Subject: Environment monitoring In-Reply-To: References: Message-ID: <20031001221014.GA28394@sphere.math.ucdavis.edu> I'd recommend: http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2820 For $25.00 you get a temperature probe that is trivial to interface to and smart enough to collect samples even if a machine is down (complete with time stamps). It will even build a histogram of temperature samples for you. It's kinda cool that you can leave one in your luggage or send it up in a space probe and then get periodic samples when you arrive at your destination. Anyway, people use them for all kinds of things, even in space: http://www.voiceofidaho.org/tvnsp/01atchrn.htm More info: http://www.ibutton.com/ibuttons/thermochron.html They can also be connected via USB, parallel, and serial. The other cool feature is that they are chainable, so we have one behind the machine (i.e. rack temp), one on top of the rack (room temp), and one at the air conditioner output, all on one wire. Each button has a guaranteed unique 64-bit ID. Once you get a feel for the dynamics of the system it becomes really easy to spot anomalies. Recommended; the thermo buttons are cheaper, but IMO for most things the Thermochron premium is worth it so you can have continuous sampling even if a machine crashes. The logs are very handy when fighting with facilities to combat the "well, it's not really getting that hot that often" kind of thing. Oh, I guess I should mention I have no financial ties to any of the mentioned companies. So no, I won't sell you one. 
-- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:36:35 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:36:35 -0700 Subject: more on structural models for clusters Message-ID: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> In regards to my recent post looking for cluster implementations for structural dynamic models, I would like to add that I'm interested in "highly distributed" solutions where the computational load for each processor is very, very low, as opposed to fairly conventional (and widely available) schemes for replacing the Cray with a N-node cluster. The number of processors would be comparable to the number of structural nodes (to a first order of magnitude) Imagine you had something like a geodesic dome with a microprocessor at each vertex that wanted to compute the loads for that vertex, communicating only with the adjacent vertices... Trivial, egregiously simplified, and demo cases are just fine, and, in fact, probably preferable.... James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 1 19:19:26 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 1 Oct 2003 16:19:26 -0700 Subject: RH8 vs RH9 (Robert G. Brown) In-Reply-To: <20031001170640.91750.qmail@web40002.mail.yahoo.com> References: <200310011504.h91F4DY02889@NewBlue.Scyld.com> <20031001170640.91750.qmail@web40002.mail.yahoo.com> Message-ID: <20031001231926.GA2900@greglaptop.internal.keyresearch.com> On Wed, Oct 01, 2003 at 10:06:40AM -0700, Roland Krause wrote: > Inofficially I can confirm that it works fine if you are not using > the OpenMP capabilities of the compiler. Which is no surprise, as the thread library stuff changed fairly radically in RedHat 9. I have some sympathy for Intel's compiler guys on that issue. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 19:01:55 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 09:01:55 +1000 Subject: RH8 vs RH9 In-Reply-To: References: Message-ID: <200310020901.57000.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 10:24 pm, Robert G. Brown wrote: > a) 8 will, probably fairly soon, be no longer maintained. 9 will be, > at least for a while (possibly for one more year). Updates for 7.3 ends on December 31st 2003. Updates for 8.0 ends on December 31st 2003. Updates for 9 ends on April 30th 2004. So going to 9 will only get you an extra 4 months of updates. 
http://www.redhat.com/apps/support/errata/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1zjO2KABBYQAh8RArjhAJoDUAq9xSKjz6pJ58nIvSk1GEqG2QCeJ7f3 5XYQ/rJIzUPP744CNvAOLXA= =UNIB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 1 18:58:21 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 2 Oct 2003 08:58:21 +1000 Subject: Environment monitoring In-Reply-To: <200310011001.31106.lepalom@vilma.upc.es> References: <200310011001.31106.lepalom@vilma.upc.es> Message-ID: <200310020858.30401.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 1 Oct 2003 06:21 pm, Leopold Palomo Avellaneda wrote: > A Dimarts 30 Setembre 2003 22:23, Rocky McGaugh va escriure: > > Dont overlook lm_sensors+cron > > Why? Presumably because you can use it to monitor the temp and fan sensors and stuff and raise alarms if they go out of bounds. http://secure.netroedge.com/~lm78/ And from the info page: Project Mission / Background / Ethics: The primary mission for our project is to provide the best and most complete hardware health monitoring drivers for Linux. We strive to produce well organized, efficient, safe, flexible, and tested code free of charge to all Linux users using the Intel x86 hardware platform. The project attempts to support as many related devices as possible (when testing and documentation is available), especially those which are commonly included on mainboards. Our drivers provide the base software layer for utilities to acquire data on the environmental conditions of the hardware. We also provide a sample text-oriented utility to display sensor data. While this simple utility is sufficient for many users, others desire more elaborate user interfaces. We leave the development of these GUI-oriented utilities to others. See our useful addresses page for references. http://secure.netroedge.com/~lm78/info.html NB: I've used these at home from time to time, but we don't use them on our IBM cluster as we can grab the same info out of CSM. - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/e1wQO2KABBYQAh8RApUxAJ0V9QuvuGOLCnS7qXCkWD+9/OrOlgCfezuT QQ5wnTot9uoJCy3tRjuDKAQ= =fDWX -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Oct 1 18:27:58 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 01 Oct 2003 15:27:58 -0700 Subject: cluster computing for mechanical structural FEM models Message-ID: <5.2.0.9.2.20031001152545.03110070@mailhost4.jpl.nasa.gov> I'm looking for references to work on distributed computing for structural models like trusses and spaceframes. They are typically sparse/diagonalish matrices that represent the masses and springs, so distributing the work in a cluster seems a natural fit. 
Anybody done anything like this (as a demonstration, e.g.) say, using NASTRAN inputs? James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From vanw at tticluster.com Thu Oct 2 08:37:50 2003 From: vanw at tticluster.com (Kevin Van Workum) Date: Thu, 2 Oct 2003 08:37:50 -0400 (EDT) Subject: lm_sensors output Message-ID: The recent discussion on environment sensors motivated me to take the subject more seriously. I therefore installed lm_senors on one of my nodes for testing. I simply used the lm_sensors RPM from RH8.0, ran sensors-detect and did what it told me to do. It apparently worked. The problem is, I don't really know what the output means or what I should be looking for. I guess I'm a novice. Anyways, the output from sensors is shown below. What is VCore and why is mine out of range? What are all the other voltages describing? V5SB is out of range also, is that a bad thing? I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? $ sensors w83697hf-isa-0290 Adapter: ISA adapter Algorithm: ISA algorithm VCore: +1.50 V (min = +0.00 V, max = +0.00 V) +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) +5V: +5.02 V (min = +4.50 V, max = +5.48 V) +12V: +12.20 V (min = +10.79 V, max = +13.11 V) -12V: -12.85 V (min = -13.21 V, max = -10.90 V) -5V: -5.42 V (min = -5.51 V, max = -4.51 V) V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) VBat: +3.29 V (min = +2.70 V, max = +3.29 V) fan1: 4687 RPM (min = 187 RPM, div = 32) fan2: 0 RPM (min = 187 RPM, div = 32) temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor alarms: beep_enable: Sound alarm disabled Kevin Van Workum, Ph.D. www.tsunamictechnologies.com ONLINE COMPUTER CLUSTERS __/__ __/__ * / / / / / / _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From AlberT at SuperAlberT.it Thu Oct 2 03:35:57 2003 From: AlberT at SuperAlberT.it (AlberT) Date: Thu, 2 Oct 2003 09:35:57 +0200 Subject: Upper bound on no. of sockets In-Reply-To: References: Message-ID: <200310020935.58006.AlberT@SuperAlberT.it> On Tuesday 30 September 2003 06:44, Balaji Rangasamy wrote: > Hi, > Is there an upper bound on the number of sockets that can be created by a > process? If there is one, is the limitation enforced by OS? And what other > factors does it depend on? Can you please be specific on the numbers for > different OS (RH Linux 7.2) ? > Thank you very much, > Balaji. > from man setrlimit: [quote] getrlimit and setrlimit get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()): struct rlimit { rlim_t rlim_cur; /* Soft limit */ rlim_t rlim_max; /* Hard limit (ceiling for rlim_cur) */ }; The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. 
A privileged process may make arbitrary changes to either limit value. The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()). [snip] RLIMIT_NOFILE Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE. [/QUOTE] -- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Thu Oct 2 11:33:21 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Thu, 02 Oct 2003 08:33:21 -0700 Subject: Environment monitoring Message-ID: Robert G. Brown rgb at phy.duke.edu wrote: >The bad thing is that it does NOT give you any sort of measure of room >temperature per se, Well, no, but to be fair that's hardly lm_sensors fault. The problem is that few (any?) motherboards have a sensor positioned away from hot devices on the upstream end of the wind flow. One can sometimes acquire a fair approximation of this info using SMART from a hard drive if the airflow across the drive is good and the drive itself does not run very hot. We have not yet filled the second processor slot on the mobos of our beowulf and that temperature sensor gives a pretty good indication of the air temperature in the case (32C) vs. under a live Athlon MP 2200+ processor (no load, 40.5C). We use lm_sensors with mondo http://mondo-daemon.sourceforge.net/ to watch the systems and shut them down if they overheat. Generally this works well. Mondo can compensate for the shortcomings of the lm_sensors/motherboard combos which sometimes arise. For instance, on our ASUS A7V266 mobos (workstations, not in a beowulf!) some of the sensors tend to go whacky for one or two measurements. Fan speeds go to 0 or temps to 255C. Mondo is set to require an out of range condition for 3 seconds before triggering a shutdown, and so far we have not seen a glitch last that long. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcmoore at atipa.com Thu Oct 2 13:56:04 2003 From: jcmoore at atipa.com (Curt Moore) Date: 02 Oct 2003 12:56:04 -0500 Subject: lm_sensors output In-Reply-To: References: Message-ID: <1065117364.12473.27.camel@picard.lab.atipa.com> This is really the bad thing about lm_sensors which some have touched on previously; too much guesswork. Many times even if the drivers are present and up to date for your specific hardware, the values may be meaningless as different board manufacturers may choose to physically connect the monitoring chip(s) to different onboard devices, such as in the case with fans. You have to have a knowledge of which onboard piece of hardware is connected to which input of the monitoring chip in order to make sense of the sensors output. Don't get me wrong, when lm_sensors works, it works great but sometimes it takes a little work to get to that point. 
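As a side note on the sensor glitches David Mathog describes above, the filtering idea is simple enough to sketch in a few lines of C. This is not mondo's actual code, and read_cpu_temp() below is a made-up stand-in for whatever really polls the sensor:

/* Toy glitch filter: act on an over-limit temperature only if it
 * persists for HOLD_SECONDS consecutive polls, so a single bogus
 * reading (0 RPM, 255C, ...) does not trigger a shutdown.
 */
#include <stdio.h>
#include <unistd.h>

#define LIMIT_C      60.0
#define HOLD_SECONDS 3

/* Placeholder only -- real code would read the lm_sensors output. */
static double read_cpu_temp(void) { return 50.0; }

int main(void)
{
    int over = 0;                 /* consecutive seconds over the limit */

    for (;;) {
        double t = read_cpu_temp();
        over = (t > LIMIT_C) ? over + 1 : 0;
        if (over >= HOLD_SECONDS) {
            fprintf(stderr, "temp %.1fC over limit for %d s, shutting down\n",
                    t, over);
            /* system("/sbin/shutdown -h now");  -- site policy goes here */
            return 1;
        }
        sleep(1);
    }
}

A one-sample spike to 255C then just resets the counter instead of powering the node off, which is all the protection the scheme needs.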
Even if the values are sane for your hardware, you still have to go into the sensors.conf and set max, min, and hysteresis values, if you so choose, in order to have this information make sense for your specific hardware. In recent months, vendors such as Tyan have begun to distribute customized sensors.conf files for their boards which take into account the differences between boards and how sensor chips are connected to the onboard devices for each of their boards. As Don mentioned earlier, IPMI is more generalized and is much easier to ask for "CPU 1 Temperature" and actually get "CPU 1 Temperature" instead of data from some other onboard thermistor. A mistake in this area could end up costing time and money if something overheats and it's not detected because of polling the wrong data. >From my experience, it would be very difficult to come up with a generalized set of sensors values to work across differing motherboard types. A "standard" such as IPMI makes things much easier to accurately collect and act upon as all of the "hard" work has already been done by those implementing IPMI on the hardware. One would hope that these individuals would have the in-depth knowledge of exactly which values to map to which sensor inputs and any computations needed for these values so that clean and accurate values are returned when the hardware is polled. -Curt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Curt Moore Systems Integration Engineer At?pa Technologies jcmoore at atipa.com (O) 785-813-0312 (Fax) 785-841-1809 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brbarret at osl.iu.edu Thu Oct 2 13:05:54 2003 From: brbarret at osl.iu.edu (Brian Barrett) Date: Thu, 2 Oct 2003 10:05:54 -0700 Subject: Upper bound on no. of sockets In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> References: <010C86D15E4D1247B9A5DD312B7F5AA7E5E303@stegosaurus.bristol.quadrics.com> Message-ID: On Oct 1, 2003, at 6:59 AM, John Brookes wrote: > I think there is a 1k per-process limit on open sockets. It's tuneable > in > 2.4 kernels, IIRC, but I don't remember how (off the top of my head). > 'ulimit -n' adjusts the max number of fd's, but I'm not sure that'll > take it > past a/the kernel limit. Maybe recompile kernel? Maybe poke > /proc/sys/.../...? Maybe adjust in userland? > > Maybe use fewer sockets ;-) > > Does anybody know the score? On linux, there is a default per-process limit of 1024 (hard and soft limits) file descriptors. You can see the per-process limit by running limit (csh/tcsh) or ulimit -n (sh). There is also a limit on the total number of file descriptors that the system can have open, which you can find by looking at /proc/sys/fs/file-max. On my home machine, the max file descriptor count is around 104K (the default), so that probably isn't a worry for you. There is the concept of a soft and hard limit for file descriptors. The soft limit is the "default limit", which is generally set to somewhere above the needs of most applications. The soft limit can be increased by a normal user application up to the hard limit. As I said before, the defaults for the soft and hard limits on modern linux machines are the same, at 1024. 
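The soft/hard distinction is easy to see from a program. Here is a short C fragment that raises RLIMIT_NOFILE (which is what bounds the number of open sockets) as far as the hard limit allows -- essentially what 'ulimit -n' does from the shell; a minimal sketch, not a complete application:

/* Show and raise the per-process file descriptor limit (RLIMIT_NOFILE),
 * which is what caps the number of sockets a process can open.
 * Roughly the in-program equivalent of "ulimit -n <hard limit>".
 */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;   /* unprivileged: soft may be raised up to hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}

Run as an ordinary user on a stock box it will report 1024/1024, as noted above; only after the hard limit has been raised (e.g. in /etc/security/limits.conf, or by root) does the setrlimit() call buy you anything.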
You can adjust either limit by adding the appropriate lines in /etc/security/limits.conf (at least, that seems to be the file on both Red Hat and Debian). In theory, you could set the limit up to file-max, but that probably isn't a good idea. You really don't want to run your system out of file descriptors. There is one other concern you might want to think about. If you ever use any of the created file descriptors in a call to select(), you have to ensure all the select()ed file descriptors fit in an FD_SET. On Linux, the size of an FD_SET is hard-coded at 1024 (on most of the BSDs, Solaris, and Mac OS X, it can be altered at application compile time). So you may not want to ever set the soft limit above 1024. Some applications may expect that any file descriptor that was successfully created can be put into an FD_SET. If this isn't the case, well, life could get interesting. Hope this helps, Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Thu Oct 2 11:45:40 2003 From: csmith at lnxi.com (Curtis Smith) Date: Thu, 2 Oct 2003 09:45:40 -0600 Subject: lm_sensors output References: Message-ID: <006001c388fc$3b67cb60$a423a8c0@blueberry> VCore is the voltage of the CPU #1. You can get the full definition of all values at http://www2.lm-sensors.nu/~lm78/. Curtis Smith Principle Software Engineer Linux Networx Inc. (www.lnxi.com) ----- Original Message ----- From: "Kevin Van Workum" To: Sent: Thursday, October 02, 2003 6:37 AM Subject: lm_sensors output > The recent discussion on environment sensors motivated me to take the > subject more seriously. I therefore installed lm_senors on one of my nodes > for testing. I simply used the lm_sensors RPM from RH8.0, ran > sensors-detect and did what it told me to do. It apparently worked. The > problem is, I don't really know what the output means or what I should be > looking for. I guess I'm a novice. Anyways, the output from sensors is > shown below. > > What is VCore and why is mine out of range? > What are all the other voltages describing? > V5SB is out of range also, is that a bad thing? > I have only 1 CPU, so I guess temp2 and fan2 are meaningless, right? > > $ sensors > w83697hf-isa-0290 > Adapter: ISA adapter > Algorithm: ISA algorithm > VCore: +1.50 V (min = +0.00 V, max = +0.00 V) > +3.3V: +3.29 V (min = +2.97 V, max = +3.63 V) > +5V: +5.02 V (min = +4.50 V, max = +5.48 V) > +12V: +12.20 V (min = +10.79 V, max = +13.11 V) > -12V: -12.85 V (min = -13.21 V, max = -10.90 V) > -5V: -5.42 V (min = -5.51 V, max = -4.51 V) > V5SB: +5.51 V (min = +4.50 V, max = +5.48 V) > VBat: +3.29 V (min = +2.70 V, max = +3.29 V) > fan1: 4687 RPM (min = 187 RPM, div = 32) > fan2: 0 RPM (min = 187 RPM, div = 32) > temp1: +53?C (limit = +60?C, hysteresis = +127?C) sensor = thermistor > temp2: +208.0?C (limit = +60?C, hysteresis = +50?C) sensor = thermistor > alarms: > beep_enable: > Sound alarm disabled > > Kevin Van Workum, Ph.D. 
> www.tsunamictechnologies.com > ONLINE COMPUTER CLUSTERS > > __/__ __/__ * > / / / > / / / > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 2 17:25:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 2 Oct 2003 17:25:20 -0400 (EDT) Subject: Power Supply: Supermicro P4DL6 Board? In-Reply-To: Message-ID: > > disks are much, much cooler than they used to be, probably dropping > > below power consumed by ram on most clusters. > > Note that most performance-oriented RAM types now have metal cases and > heat sinks. They didn't add the metal because it _looks_ cool. I'm not so sure. I looked at the spec for a current samsung pc333 ddr 512Mb chip, and it works out to about 16W per GB. I think most people still have 512MB dimms, and probably pc266 (13.6W/GB). I don't really see why a dimm would have trouble dissipating ~20W, considering its size. I suspect dimm heatsinks are actually a fashion statement inspired by the heat-spreaders found on some rambus rimms (which were *spreaders*, a consequence of how rambus does power management...) personally, I'm waiting till I can invest in peltier-cooled dimms ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Oct 2 19:08:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 3 Oct 2003 09:08:39 +1000 Subject: more on structural models for clusters In-Reply-To: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <200310030908.41322.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 2 Oct 2003 08:36 am, Jim Lux wrote: > Imagine you had something like a geodesic dome with a microprocessor at > each vertex that wanted to compute the loads for that vertex, communicating > only with the adjacent vertices... The nearest I can remember to something like that (which sounds like an excellent idea) was for a fault tolerant model built around processors connected in a grid where each monitored the neighbours and if one was seen to go bad it could be sent a kill signal and the grid would logically reform without that processor. I think I read it in New Scientist between 1-4 years ago, but this abstract from the IEEE Transactions on Computers sounds similar (you've got to pay for the full article apparently): http://csdl.computer.org/comp/trans/tc/1988/11/t1414abs.htm A Multiple Fault-Tolerant Processor Network Architecture for Pipeline Computing Good luck! 
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/fK/3O2KABBYQAh8RArPyAKCCoaQXbywrq9h+3geGOVCE97dhgQCeKzV0 B94q2Yd0yPYFwDbcVINl/4w= =rbMB -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Oct 2 20:39:33 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 02 Oct 2003 17:39:33 -0700 Subject: more on structural models for clusters In-Reply-To: <20031003002932.GA5984@sphere.math.ucdavis.edu> References: <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> <5.2.0.9.2.20031001153222.031145f8@mailhost4.jpl.nasa.gov> Message-ID: <5.2.0.9.2.20031002173001.0310ce38@mailhost4.jpl.nasa.gov> At 05:29 PM 10/2/2003 -0700, Bill Broadley wrote: >On Wed, Oct 01, 2003 at 03:36:35PM -0700, Jim Lux wrote: > > In regards to my recent post looking for cluster implementations for > > structural dynamic models, I would like to add that I'm interested in > > "highly distributed" solutions where the computational load for each > > processor is very, very low, as opposed to fairly conventional (and widely > > available) schemes for replacing the Cray with a N-node cluster. > > > > The number of processors would be comparable to the number of structural > > nodes (to a first order of magnitude) > >Er, why bother? Is there some reason to distribute those things so >thinly? Your average dell can do 1-4 Billion floating point ops/sec, >why bother with so few per CPU? Am I missing something? Your average Dell isn't suited to inclusion as a MCU core in an ASIC at each node and would cost more than $10/node... I'm looking at Z80/6502/low end DSP kinds of computational capability in a mesh containing, say, 100,000 nodes. Sure, we'd do algorithm development on a bigger machine, but in the end game, you're looking at zillions of fairly stupid nodes. The commodity cluster aspect would only be in the development stages, and because it's much more likely that someone has solved the problem for a Beowulf (which is fairly loosely coupled and coarse grained) than for a big multiprocessor with tight coupling like a Cray. Haven't fully defined the required performance yet, but, as a starting point, I'd need to "solve the system" in something like 100 microseconds. The key is that I need an algorithm for which the workload scales roughly linearly as a function of the number of nodes, because the computational power available also scales as the number of loads. Clearly, I'm not going to do a brute force inversion or LU decomposition of a 100,000x100,000 matrix... However, inverting 100,000 matrices, each, say, 10x10, is reasonable. >Bill Broadley >Mathematics >UC Davis James Lux, P.E. 
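Purely as a toy illustration of the kind of per-vertex work Jim describes above (and in no way his actual design), the whole job of one such processor in an explicit mass-spring time step fits in a few lines of C. Filling in neigh_pos[] stands in for whatever nearest-neighbour links the hardware would provide, and all the names and sizes here are invented:

/* Toy per-vertex update for a distributed mass-spring model.
 * One processor owns one vertex; neigh_pos[] is assumed to have been
 * filled by a nearest-neighbour exchange (hypothetical hardware links).
 */
#include <math.h>

#define MAX_NEIGHBOURS 8

struct vertex {
    double pos[3], vel[3];                 /* this vertex's state           */
    double mass;
    int    n_neigh;
    double neigh_pos[MAX_NEIGHBOURS][3];   /* neighbours' current positions */
    double k[MAX_NEIGHBOURS];              /* spring constants              */
    double rest[MAX_NEIGHBOURS];           /* spring rest lengths           */
};

void step(struct vertex *v, double dt)
{
    double f[3] = { 0.0, 0.0, 0.0 };
    int i, j;

    /* Sum spring forces from adjacent vertices only. */
    for (i = 0; i < v->n_neigh; i++) {
        double d[3], len = 0.0;
        for (j = 0; j < 3; j++) {
            d[j] = v->neigh_pos[i][j] - v->pos[j];
            len += d[j] * d[j];
        }
        len = sqrt(len);
        if (len > 0.0)
            for (j = 0; j < 3; j++)
                f[j] += v->k[i] * (len - v->rest[i]) * d[j] / len;
    }

    /* Explicit update of this vertex alone; no global matrix anywhere. */
    for (j = 0; j < 3; j++) {
        v->vel[j] += dt * f[j] / v->mass;
        v->pos[j] += dt * v->vel[j];
    }
}

Each vertex touches only its own state plus its neighbours' positions, which is why the per-processor load can stay tiny no matter how many vertices the structure has.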
Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From he.94 at osu.edu Thu Oct 2 20:59:55 2003 From: he.94 at osu.edu (Hao He) Date: Thu, 02 Oct 2003 20:59:55 -0400 Subject: NFS Problem Message-ID: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Hi, there. I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P chipsets and Intel CSA Gigabit NIC. The distribution is RedHat 9. I have some experience before but I still got some problem in NFS. Problem 1: When I just use 'rw' and 'intr' as the parameters used in /etc/fstab, I got following problem when startup clients (while the server with NFS daemon is running): Mount: RPC: Remote system error -- no route to host Then I added 'bg' to /etc/fstab, this time the result is better. Several minutes after the client booted up, the remote directory mounted. However, in many cases following meassage was prompted: nfs warning: mount version older than kernel Problem 2: I am mounting two remote directories from the server, however, at some nodes, only one directory even no directory got mounted. If only one directory mounted successfully, it differs from one client to another, and to the same node, it changes from time to time at system booting up, like dicing. This really confused me. Problem 3: Sometimes I got the message at the server node like this: (scsi 0:A:0:0): Locking max tag count at 33. However, seems it does not make trouble to mounted directories. I think it must be related with NFS. I have a further question: Since there may be 16 or 32 or even more clients try to mount the remote directory at the same time, can the NFS server really handle so much requests simultaneously? Is there any effective alternate method to share data, besides NFS? How to solve these problems? Any suggestion? Thank you very much. I will appreciate your response. Best wishes, Hao He _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 3 01:13:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 3 Oct 2003 07:13:37 +0200 Subject: NFS Problem In-Reply-To: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> References: <004601c38949$a8f98a90$a6a86ba4@H10152.findquick.com> Message-ID: <20031003051337.GA6263@unthought.net> On Thu, Oct 02, 2003 at 08:59:55PM -0400, Hao He wrote: > Hi, there. > > I am building a cluster with 16 or 32 nodes, based on Pentium 4, Intel 875P > chipsets and Intel CSA Gigabit NIC. > The distribution is RedHat 9. > I have some experience before but I still got some problem in NFS. > > Problem 1: When I just use 'rw' and 'intr' as the parameters used in > /etc/fstab, I got following problem when startup clients (while the server > with NFS daemon is running): > Mount: RPC: Remote system error -- no route to host That's a network problem or a network configuration problem. Usually this would be a name resolution problem. Check that the hostname in your fstab can be resolved in early boot (add it to your hosts file if necessary), or use the IP address of the server instead. 
But the error message seems to indicate that it's not resolution but routing - very odd... Is the network up? Do you have any special networking setup? Try checking your init-scripts to see that the network is really started before the NFS filesystems are mounted. > Then I added 'bg' to /etc/fstab, this time the result is better. Several > minutes after the client booted up, the remote directory mounted. So you NFS mount depends on something (network related) that isn't up at the time when the system tries to mount your NFS filesystems. Either you have a special (and wrong) setup, or RedHat messed up good :) Check the order in which things are started in your /etc/rc3.d/ directory. Network should go before NFS. > However, in many cases following meassage was prompted: > nfs warning: mount version older than kernel Most likely this is not really a problem - I've had systems with that message work just fine. You could check to see if RedHat has updates to mount. > > Problem 2: I am mounting two remote directories from the server, however, at > some nodes, only one directory even no directory got mounted. > If only one directory mounted successfully, it differs from one client to > another, and to the same node, it changes from time to time at system > booting up, like dicing. > This really confused me. Isn't this problem 1 over again? > > Problem 3: Sometimes I got the message at the server node like this: > (scsi 0:A:0:0): Locking max tag count at 33. That's a SCSI diagnostic. You can ignore it. > However, seems it does not make trouble to mounted directories. > I think it must be related with NFS. It's not related to NFS. > > I have a further question: Since there may be 16 or 32 or even more clients > try to mount the remote directory at the same time, > can the NFS server really handle so much requests simultaneously? Is there > any effective alternate method to share data, besides NFS? That should be no problem at all. NFS should be up to the task with no special tuning at all. Once you have all your nodes mounting NFS properly, you can start looking into tuning for performance - but it really should work 'out of the box' with no special tweaking. > > How to solve these problems? Any suggestion? > Thank you very much. I will appreciate your response. Use the following options to the NFS mounts in your fstab: hard,intr You can add rsize=8192,wsize=8192 for tuning. You should not need 'bg' - although it may be convenient if you need to be able to boot your nodes when the NFS server is down. One thing you should make sure: never use host-names or netgroups in your exports file on the server (!) *Only* use IP addresses or wildcards - *Never* use names. Using names in your 'exports' file on the server can cause *all* kinds of weird sporadic irreproducible problems - it's a long-standing and extremely annoying problem, but fortunately one that has an easy workaround. Check: *) Server: Your exports file (only IP or wildcard exports) *) Clients: Your fstab (use server IP or name in hosts file) *) Clients: Is network started before NFS mount? Please write to the list about your progress :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Andrew.Cannon at nnc.co.uk Fri Oct 3 05:30:07 2003 From: Andrew.Cannon at nnc.co.uk (Cannon, Andrew) Date: Fri, 3 Oct 2003 10:30:07 +0100 Subject: Filesystem question (sort of newbie) Message-ID: Hi All, I am going to be setting up a 16 node cluster in the near future. I have only set up a 4 node cluster before and I am a little unsure about how to sort out the disk space. Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, any advice is still appreciated), and I was wondering how to best organise the disks on each node. I am thinking (only started wondering about this today) of installing the cluster software on the master node (pvm, MPI and the actual calculation software, MCNP) and mounting the disk on each of the other nodes, so that all they have on their hard drives is the minimal install of RH. The question I am asking is, will this work and what sort of performance hit will there be? Would I be better installing the software on each computer? TIA (sorry for being so stoopid, I'm still very much a learner at linux and clustering) Andy Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, Cheshire, WA16 8QZ. Telephone; +44 (0) 1565 843768 email: mailto:andrew.cannon at nnc.co.uk NNC website: http://www.nnc.co.uk NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 05:32:34 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 05:32:34 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D4232.3070900@lmco.com> Andrew, Let me recommend Warewulf (warewulf-cluster.org). It boots the nodes using RH 7.3 (although it should work with 8 or but I haven't tested it), but it boots into a small Ram Disk (about 70 megs depending upon what you need on the nodes). It's very easy to setup, configure and use, plus you don't need to install RH on each node. Warewulf will use a hard disk in the nodes if available for swap and local scratch space. However, it will also work with diskless nodes (although you don't get swap or scratch space). Warewulf will also take /home from the master node and NFS mount it throughout the cluster. So you can install your code on /home for all of the nodes. Good Luck! Jeff > Hi All, > > I am going to be setting up a 16 node cluster in the near future. 
I have > only set up a 4 node cluster before and I am a little unsure about how to > sort out the disk space. > > Each computer will be running Red Hat (either 8 or 9 I haven't decided > yet, > any advice is still appreciated), and I was wondering how to best > organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each > computer? > > TIA (sorry for being so stoopid, I'm still very much a learner at > linux and > clustering) > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC > Limited (no. 1120437), National Nuclear Corporation Limited (no. > 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited > (no. 235856). The registered office of each company is at Booths > Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for > Technica-NNC Limited whose registered office is at 6 Union Row, > Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by > the relevant UK operating company and are confidential and intended > for the use of the individual or entity to whom they are addressed. > If you have received this e-mail in error please notify the NNC system > manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Fri Oct 3 08:59:52 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Fri, 03 Oct 2003 08:59:52 -0400 Subject: Filesystem question (sort of newbie) In-Reply-To: References: Message-ID: <3F7D72C8.3050409@lmco.com> Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > While I don't prefer the nfs-root approach, Warewulf can do that as well (haven't tried it personally). What kind of network do you use for the 48-node cluster? 
Anybody else use the nfs-root approach? The 70 megs used in the ram disk is pretty well thought out. There are some basic things to boot the node, but it also includes glibc and you can easily add MPICH, LAM, Ganglia, SGE, etc. The developer has thought out these packages very well so that only the pieces of each of these packages that needs to be on the nodes actually gets installed on the nodes. Very well thought out. Oh, one other thing. The image that goes to the nodes via TFTP (over PXE) is compressed so it's about half the size of the final ram disk. This really helps cut down on network traffic (even works over my poor rtl8139 network). One of the things I'd like to experiment with is using squasfs to reduce the size of the ram disk. IMHO, 70 megs is not very big, but reducing it to 30-40 Megs might be worth the effort. > regards, mark hahn. > Thanks! Jeff -- Dr. Jeff Layton Senior Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Oct 3 09:34:30 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 3 Oct 2003 09:34:30 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: <3F7D4232.3070900@lmco.com> Message-ID: > 8 or but I haven't tested it), but it boots into a small Ram > Disk (about 70 megs depending upon what you need on the alternately, it's almost trivial to PXE boot nodes, mount a simple root FS from a server/master, and use the local disk, if any, for swap and/or tmp. one nice thing about this is that you can do it with any distribution you like - mine's RH8, for instance. personally, I prefer the nfs-root approach, probably because once you boot, you won't be wasting any ram with boot-only files. for a cluster of 48 nodes, there seems to be no drawback; for a much larger cluster, I expect all the boot-time traffic would be crippling, and you might want to use some kind of multicast to distribute a ramdisk image just once... regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 3 11:24:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 3 Oct 2003 11:24:48 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Cannon, Andrew wrote: > Each computer will be running Red Hat (either 8 or 9 I haven't decided yet, > any advice is still appreciated), and I was wondering how to best organise > the disks on each node. > > I am thinking (only started wondering about this today) of installing the > cluster software on the master node (pvm, MPI and the actual calculation > software, MCNP) and mounting the disk on each of the other nodes, so that > all they have on their hard drives is the minimal install of RH. The > question I am asking is, will this work and what sort of performance hit > will there be? Would I be better installing the software on each computer? 
> > TIA (sorry for being so stoopid, I'm still very much a learner at linux and > clustering) If the nodes have lots of memory, most of their access to non-data disk (programs and libraries) will come out of caches after the systems have been up for a while, so they won't take a HUGE performance hit, but things like loading a big program for the first time may take longer. However, if you work to master PXE and kickstart (which go together like ham and eggs) and have adequate disk, in the long run your maintenance will be minimized by putting energy into developing a node kickstart script. Then you just boot the nodes into kickstart over the network, wait a few minutes for the install and boot into production. This will take you some time to learn (there are HOWTO-like resource online, so it isn't a LOT of time) and if you got nodes with NICs that don't support PXE you'll likely want to replace them or add ones that do, but once you invest these capital costs the payback is that your marginal cost for installing additional nodes after the first node you get to install "perfectly" is so close to zero as to make no nevermind. Make a dhcp table entry. Boot node into install. Boot node. Reinstalling is exactly the same process and can be done in minutes if a hard disk crashes. It gets to be so easy that we almost routinely do a reinstall after working on a system for any reason, including ones where it probably isn't necessary. You can reinstall a system from anywhere on the internet (if your hardware is accessible and preconfigured for this to work). Finally, if you include yum on the nodes, you can automagically update the nodes from a master repository image on your server, and mirror your server image from one of the Red hat mirrors, and actually maintain a stream of updates onto the nodes with no further action on your part. At this point, if you aren't doing Scyld or one of the preconfigured cluster packages and want to roll your own cluster out of a base install plus selected RPMs (and why not?) PXE+kickstart/RH+yum forms a pretty solid low-energy paradigm for installation and maintenance once you've learned how to make it work. rgb > > Andy > > Andrew Cannon, Nuclear Technology (J2), NNC Ltd, Booths Hall, Knutsford, > Cheshire, WA16 8QZ. > > Telephone; +44 (0) 1565 843768 > email: mailto:andrew.cannon at nnc.co.uk > NNC website: http://www.nnc.co.uk > > > > NNC's UK Operating Companies : NNC Holdings Limited (no. 3725076), NNC Limited (no. 1120437), National Nuclear Corporation Limited (no. 2290928), STATS-NNC Limited (no. 4339062) and Technica-NNC Limited (no. 235856). The registered office of each company is at Booths Hall, Chelford Road, Knutsford, Cheshire WA16 8QZ except for Technica-NNC Limited whose registered office is at 6 Union Row, Aberdeen AB10 1DQ. > > This email and any files transmitted with it have been sent to you by the relevant UK operating company and are confidential and intended for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the NNC system manager by e-mail at eadm at nnc.co.uk. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Sat Oct 4 12:00:51 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Sat, 4 Oct 2003 12:00:51 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: Message-ID: Hi, All: I tried to compile a Fortran 90 MPI program by the Intel Frotran Compiler in the OSCAR cluster. I run the command: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF external function WINDOWFUNCTION external function SIGMA external function GETISTART external function GETIEND external subroutine COM_EYZ external subroutine COM_EYX external subroutine COM_EZX external subroutine COM_EZY external subroutine COM_HYZ external subroutine COM_HYX external subroutine COM_HZX external subroutine COM_HZY 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' /tmp/ifcVao851.o(.text+0x6e): In function `main': : undefined reference to `mpi_comm_rank_' /tmp/ifcVao851.o(.text+0x82): In function `main': : undefined reference to `mpi_comm_size_' /tmp/ifcVao851.o(.text+0xab): In function `main': : undefined reference to `mpi_wtime_' /tmp/ifcVao851.o(.text+0x422): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x448): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x47b): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x49e): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4c1): In function `main': : undefined reference to `mpi_bcast_' /tmp/ifcVao851.o(.text+0x4e7): more undefined references to `mpi_bcast_' follow /tmp/ifcVao851.o(.text+0x24511): In function `com_hzy_': : undefined reference to `mpi_recv_' /tmp/ifcVao851.o(.text+0x24b76): In function `com_hzy_': : undefined reference to `mpi_send_' " At the same time, I tried the same program in the other scyld cluster, using NAG compiler. I use command: " f95 -I/usr/include -lmpi -lm -o p_wg3 p_fdtd3dwg3.f90 " It works fine. So that means my fortran program in fine. Both of the cluster use the MPICH implementation. But because I have to work on that OSCAR cluster with Intel compiler, I wonder 1. why the errors happen? 2. Is the problem of cluster or the Intel compiler? 3. How I can solve it. I know there are a lot of guy with experience and experts of cluster and MPI in this mailing list. I appreciate your suggestion and advice from you. Thanks. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From br66 at HPCL.CSE.MsState.Edu Sun Oct 5 00:52:45 2003 From: br66 at HPCL.CSE.MsState.Edu (Balaji Rangasamy) Date: Sat, 4 Oct 2003 23:52:45 -0500 (CDT) Subject: Upper bound on no. of sockets In-Reply-To: Message-ID: Thanks a billion for all the responses. Here is another question: Is there a way to send some data to the listener when I do a connect()? I tried using sin_zero field of the sockaddr_in structure, but quite unsuccessfully. 
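For reference, sin_zero is just padding that makes sockaddr_in the same size as sockaddr; it is never delivered to the peer, which is why that attempt could not work. The usual pattern is the one mentioned below -- a small fixed-size identifier written immediately after connect(). A rough C sketch, with the address handling, port, and 32-bit ID width made up for illustration:

/* Sketch: identify the connecting process by sending a small fixed-size
 * ID immediately after connect().  The ID width and helper name are
 * invented for illustration.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int connect_with_id(const char *server_ip, int port, uint32_t my_id)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));     /* also zeroes sin_zero -- it is only padding */
    sa.sin_family = AF_INET;
    sa.sin_port   = htons(port);
    inet_pton(AF_INET, server_ip, &sa.sin_addr);

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
        close(fd);
        return -1;
    }

    my_id = htonl(my_id);           /* fixed byte order on the wire */
    if (write(fd, &my_id, sizeof(my_id)) != (ssize_t)sizeof(my_id)) {
        close(fd);
        return -1;
    }
    return fd;                      /* connected and identified */
}

Since the client need not wait for any reply, the extra write does not add a round trip; it simply becomes the connection's first data.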
The problem is I want to uniquely identify the actively connecting process (IP address and port number information wont suffice). I can send() the identifier value to the listener after the connect(), but I want to cut down the cost of an additional send. Any suggestions are greatly appreciated. Thanks, Balaji. PS: I am not sure if it is appropriate to send this question to this mailing list. My sincere apologies for those who find this question annoyingly incongruous. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Sat Oct 4 15:15:08 2003 From: lathama at yahoo.com (Andrew Latham) Date: Sat, 4 Oct 2003 12:15:08 -0700 (PDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: <20031004191508.27391.qmail@web60306.mail.yahoo.com> This is by far my favorite approach however I tend to tweak it with a very large initrd and custom kernel. I am using older hardware with its max ram so I use it as best I can. with no local harddisk I am always looking at the best method of network file access and have gone so far as to try wget with http. --- Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the > > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. > > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 04:21:50 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 10:21:50 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Sat, 4 Oct 2003, Ao Jiang wrote: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 Try with : ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 Btw, a cleaner way to compile mpi programs is to use the mpif90 (mpif77 for fortran77) command (which is a wrapper for the real compiler). You should be able to make it use the ifc by setting the MPICH_F90 (MPICH_F77 for fortran77) and MPICH_F90LINKER environment variables to choose which compiler to use, e.g. 
let's say you want to use the ifc compiler, and you're using bash, you would have to do: export MPICH_F90=ifc export MPICH_F90LINKER=ifc and then, in order to compile your MPI program, you should issue the command: mpif90 -o p_wg3 p_fdtd3dwg3_pml.f90 > 2. Is the problem of cluster or the Intel compiler? Neither. Intel works fine with OSCAR. Have a good day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +390250317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 07:22:35 2003 From: ds10025 at cam.ac.uk (ds10025 at cam.ac.uk) Date: Mon, 06 Oct 2003 12:22:35 +0100 Subject: Copy files between Nodes via NFS In-Reply-To: References: <003f01c37d29$47d7ec10$0e01010a@hpcncd.cpe.ku.ac.th> Message-ID: <5.0.2.1.0.20031006121907.03a25120@hermes.cam.ac.uk> Morning, I have a basic node PC that NFS-mounts directories from the master node. When I try to copy files using 'cp' from the node to an NFS-mounted directory, the node PC just hangs. Has anyone come across this problem? How best to move/copy files across nodes? Regards Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 6 07:21:02 2003 From: iosephus at sgirmn.pluri.ucm.es (José M. Pérez Sánchez) Date: Mon, 06 Oct 2003 13:21:02 +0200 Subject: Intel compilers and libraries Message-ID: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Hello: We are thinking about purchasing the Intel C++ compiler for Linux, mainly to get the most out of our hardware (Xeon 2.4GHz processors); we are also interested in the Intel MKL (Math Kernel Library). I would like to know if the performance gain using the Intel compiler+libraries, which exploit SSE2 and make other optimizations for P4/Xeon, is as good as Intel claims. Is anyone on the list using those products? On the other hand, isn't MKL just as good as any other good math library compiled with Xeon/P4 optimizations and extensions (using the Intel C++ compiler, for example)? Another question: the only differences I can see reading the Intel docs between the P4 and the Xeon are more cache on the Xeon and HyperThreading (below the 3GHz P4/Xeon). Do they really make a big difference, considering how much more expensive Xeons are? Anyone have experience with both platforms? Greetings: José M. Pérez. Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 10:05:07 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 10:05:07 -0400 Subject: undefined references to pthread related calls Message-ID: <3F817693.9040001@larc.nasa.gov> Hi group, I have a user of my software (an f90-based CFD code using mpich) that is having trouble installing my code on their system. They are using mpich and the Intel version 7.1 ifc compiler. The problem occurs at the link step.
They are getting undefined references to what appear to be system calls to pthread related functions such as pthread_self, pthread_equal, pthread_mutex_lock. Does any one else encountered and know how to fix this problem? Thanks, Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel.kidger at quadrics.com Mon Oct 6 03:23:32 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Mon, 6 Oct 2003 08:23:32 +0100 Subject: Help: About Intel Fortran Compiler calling mpich In-Reply-To: References: Message-ID: <200310060823.32239.daniel.kidger@quadrics.com> Tom, this is the standard old chestnut about Fortran and trailing underscores on function names. if you do say ' nm -a /opt/mpich-1.2.5/lib/libmpi.a |grep -i mpi_comm_rank' I expect you will see 2 trailing underscores. Different Fortran vendors add a different number of underscores - some add 2 by default (eg g77), some one (eg ifc), and some none. Sometimes there is a a compiler option to change this. There are three solutions to this issue: 1/ (Lazy option) recompile mpich several times; once with each Fortran compiler you have. 2/ Compile your application with the option that matches your prebuilt mpich (presumably 2 underscores - but note that ifc doesn't have an option for this) 3/ rebuild mpich with '-fno-second-underscore' (using say g77) . This is the common ground. You can link code to this with all current Fortran compilers. You may also meet the 'mpi_getarg, x_argc' issue - this too is easy to fix. -- Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- On Saturday 04 October 2003 5:00 pm, Ao Jiang wrote: > Hi, All: > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. 
> I run the command: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > [...] > : undefined reference to `mpi_init_' > [...] > > But because I have to work on that OSCAR cluster with Intel compiler, > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? > 3. How I can solve it. > > Thanks. > > Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Mon Oct 6 10:54:43 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Mon, 6 Oct 2003 14:54:43 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: Jose, Pardon me for some advertising here, but our OptimaNumerics Linear Algebra Library can very significantly outperform Intel MKL. Depending on the particular routine and platform, we have seen performance advantage of almost 32x (yes, that's 32 times!) using OptimaNumerics Linear Algebra Library!
I can send you one of our white papers which shows performance benchmark details off-line. If anyone else is interested, please do send me an e-mail also. Best wishes, Kenneth Tan ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- On Mon, 6 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Date: Mon, 06 Oct 2003 13:21:02 +0200 > From: "[iso-8859-1] Jos? M. P?rez S?nchez" > To: beowulf at beowulf.org > Subject: Intel compilers and libraries > > Hello: > > We are thinking about purchasing the Intel C++ compiler for linux, > mainly for getting the most of our harware (Xeon 2.4Gz processors), we > are also interested in the Intel MKL (Math Kernel Library), I would like > to know if the performance gain using Intel compiler+libraries, which exploit > SSE2 and make other optimizations for P4/Xeon, are as good as Intel > claims, anyone in the list using those products? > > On the other hand, isn't MKL just as good as any other good math library compiled > with Xeon/P4 optimization and extensions (using Intel C++ compiler for > example). > > Another question, the only difference I can see reading Intel docs between > P4 and Xeon is more cache on Xeon, and HyperThreading (below P4/Xeon 3Ghz), > does it really makes a big difference taking into account the much more > expensive Xeons are. Any one having experience with both platforms. > > Greetings: > > Jose M. P?rez. > Madrid. Spain. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From clwang at csis.hku.hk Sun Oct 5 21:59:34 2003 From: clwang at csis.hku.hk (Cho Li Wang) Date: Mon, 06 Oct 2003 09:59:34 +0800 Subject: Cluster2003: Call for Participation (Preliminary) Message-ID: <3F80CC86.FAFBFAD2@csis.hku.hk> ---------------------------------------------------------------------- CALL FOR PARTICIPATION 2003 IEEE International Conference on Cluster Computing 1 - 4 December 2003 Sheraton Hong Kong Hotel & Towers, Tsim Sha Tsui, Kowloon, Hong Kong URL: http://www.csis.hku.hk/cluster2003/ Cosponsored by IEEE Computer Society IEEE Computer Society Task Force on Cluster Computing IEEE Hong Kong Section Computer Chapter The University of Hong Kong Industrial Sponsors : Hewlett-Packard, Microsoft, IBM, Extreme Networks, Sun Microsystems, Intel, Dawning, and Dell. 
----------------------------------------------------------------------- Dear Friends, You are cordially invited to participate the annual international cluster computing conference to be held on Dec. 1-4, 2003 in Hong Kong, the most dynamic city in the Orient. The Cluster series of conferences is one of the flagship events sponsored by the IEEE Task Force on Cluster Computing (TFCC) since its inception in 1999. The competition among refereed papers was particularly strong this year, with 48 papers being selected as full papers from the 164 papers that were submitted, for a 29% acceptance rate. An additional 19 papers were selected for poster presentation. Besides the technical paper presentation, there will be three keynotes, four tutorials, one panel, a Grid live demo session, and a number of invited talks and exhibits to be arranged during the conference period. A preliminary program schedule is attached below. Please share this Call for Participation information with your colleagues working in the area of cluster computing. For registration, please visit our registration web page at: http://www.csis.hku.hk/cluster2003/registration.htm (The deadline for advance registration is October 22, 2003.) TCPP Awards will be granted to students members, and will partially cover the registration and travel cost to attend the conference. See : http://www.caip.rutgers.edu/~parashar/TCPP/TCPP-Awards.htm We look forward meeting you in Hong Kong! Cho-Li Wang and Daniel Katz Cluster2003, Program Co-chairs ------------------------------------------------------------------ ***************************************** Cluster 2003 Preliminary Program Schedule ***************************************** Monday, December 1 ------------------ 8:00-5:00 - Conference/Tutorial Registration 8:30-12:00: Morning Tutorials Designing Next Generation Clusters with Infiniband: Opportunities and Challenges D. Panda (Ohio State University) Using MPI-2: Advanced Features of the Message Passing Interface W. Gropp, E. Lusk, R. Ross, R. Thakur (Argonne National Lab.) 12:00-1:30 - Lunch 1:30-5:00 : Afternoon Tutorials The Gridbus Toolkit for Grid and Utility Computing R. Buyya (University of Melbourne) Building and Managing Clusters with NPACI Rocks G. Bruno, M. Katz, P. Papadopoulos, F. Sacerdoti, NPACI Rocks group at San Diego Supercomputer Center), L. Liew, N. Ninaba (Singapore Computing Systems) ************************ Tuesday, December 2 ************************ 7:00-5:00 Conference Registration 9:00-9:15 Welcome and Opening Remarks 9:15-10:15 Keynote 1 (TBA) 10:45-12:15 : Session 1A, 1B, 1C Session 1A (Room A) : Scheduling I Dynamic Scheduling of Parallel Real-time Jobs by Modelling Spare Capabilities in Heterogeneous Clusters Ligang He, Stephen A. Jarvis, Graham R. Nudd, Daniel P. Spooner (University of Warwick, UK) Parallel Job Scheduling on Multi-Cluster Computing Systems Jemal Abawajy and S. P. Dandamudi (Carleton University, Canada) Interstitial Computing: Utilizing Spare Cycles on Supercomputers Stephen Kleban and Scott Clearwater (Sandia National Laboratories, USA) Session 1B (Room B) : Applications A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, Guang R. 
Gao (University of Delaware, USA) Computing Large-scale Alignments on a Multi-cluster Chunxi Chen and Bertil Schmidt (Nanyang Technological University, Singapore) Auto-CFD: Efficiently Parallelizing CFD Applications on Clusters Li Xiao (Michigan State University, USA), Xiaodong Zhang (College of WIlliam and Mary, NSF, USA), Zhengqian Kuang, Baiming Feng, Jichang Kang (Northwestern Polytechnic University, China) Session 1C (Room C) : Performance Analysis Performance Analysis of a Large-Scale Cosmology Application on Three Cluster Systems Zhiling Lan and Prathibha Deshikachar (Illinois Institute of Technology, USA) A Performance Monitor based on Virtual Global Time for Clusters of PCs Michela Taufer (UC San Diego, USA), Thomas M. Stricker (ETH Zurich, Switzerland) A Distributed Performance Analysis Architecture for Clusters Holger Brunst, Wolfgang E. Nagel (Dresden University of Technology, Germany), Allen D. Malony (University of Oregon, USA) 12:15-2:00 Lunch 2:00-3:30 : Session 2A, 2B, 2C Session 2A (Room A) : Scheduling II Coordinated Co-scheduling in time-sharing Clusters through a Generic Framework Saurabh Agarwal (IBM India Research Labs, India), Gyu Sang Choi, Chita R. Das (Pennsylvania State University, USA), Andy B. Yoo (Lawrence Livermore National Laboratory, USA), Shailabh Nagar (IBM T.J. Watson Research Center, USA) A Robust Scheduling Strategy for Moldable Jobs Sudha Srinivasan, Savitha Krishnamoorthy, P. Sadayappan (Ohio State University, USA) Towards Load Balancing Support for I/O-Intensive Parallel Jobs in a Cluster of Workstations Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson (University of Nebraska-Lincoln, USA) Session 2B (Room B) : Java JavaSplit: A Runtime for Execution of Monolithic Java Programs on Heterogeneous Collections of Commodity Workstations Michael Factor (IBM Research Lab in Haifa, Israel), Assaf Schuster, Konstantin Shagin (Israel Institute of Technology, Israel) Performance Analysis of Java Message-Passing Libraries on Fast Ethernet, Myrinet and SCI Clusters Guillermo L. Taboada, Juan Touri?o, Ramon Doallo (University of A Coruna, Spain) Compiler Optimized Remote Method Invocation Ronald Veldema and Michael Philippsen (University of Erlangen-Nuremberg, Germany) Session 2C (Room C) : Communication I Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters Jarek Nieplocha , V. Tipparaju, M. Krishnan (Pacific Northwest National Laboratory, USA), G. Santhanaraman, D.K. 
Panda (Ohio State University, USA) Impact of Computational Resource Reservation to the Communication Performance in the Hypercluster Environment Kai Wing Tse and P.K Lun (The Hong Kong Polytechnic University, Hong Kong) Kernel Implementations of Locality-Aware Dispatching Techniques for Web Server Clusters Michele Di Santo, Nadia Ranaldo, Eugenio Zimeo (University of Sannio, Italy) 3:30-4:00 Coffee Break 4:00-4:30 Invited Talk 1 (Room C) : TBA 4:30-5:00 Invited Talk 2 (Room C) : TBA 5:30-7:30 Poster Session (Details Attached at the End) 6:00-7:30 Reception ****************************** Wednesday, December 3 ******************************* 8:30-5:00 Conference Registration 9:00-10:00 Keynote 2 (Room C) TBA 10:00-10:30 Coffee Break 10:30-12:00 Session 3A, 3B, 3C Session 3A (Room A): Middleware OptimalGrid: Middleware for Automatic Deployment of Distributed FEM Problems on an Internet-Based Computing Grid Tobin Lehman and James Kaufman (IBM Almaden Research Center, USA) Adaptive Grid Resource Brokering Abdulla Othman, Peter Dew, Karim Djemame, Iain Gourlay (University of Leeds, UK) HPCM: A Pre-compiler Aided Middleware for the Mobility of Legacy Code Cong Du, Xian-He Sun, Kasidit Chanchio (Illinois Institute of Technology, USA) Session 3B (Room B) : Cluster/Job Management I The Process Management Component of a Scalable Systems Software Environment Ralph Butler (Middle Tennessee State University, USA), Narayan Desai, Andrew Lusk, Ewing Lusk (Argonne National Laboratory,USA) Load Distribution for Heterogeneous and Non-Dedicated Clusters Based on Dynamic Monitoring and Differentiated Services Liria Sato (University of Sao Paulo, Brazil), Hermes Senger(Catholic University of Santos, Brazil) GridRM: An Extensible Resource Monitoring System Mark Baker and Garry Smith (University of Portsmouth, UK) Session 3C (Room C) : I/O I A High Performance Redundancy Scheme for Cluster File Systems Manoj Pillai and Mario Lauria (Ohio State University, USA) VegaFS: A Prototype for File-sharing Crossing Multiple Administrative Domains Wei Li, Jianmin Liang, Zhiwei Xu (Chinese Academy of Sciences, China) Design and Performance of the Dawning Cluster File System Jin Xiong, Sining Wu, Dan Men, Ninghui Sun, Guojie Li (Chinese Academy of Sciences, China) 12:00-1:30 Lunch 1:30-3:00 Session 4A, 4B, Vender Talk 1 Session 4A (Room A) Novel Systems Coordinated Checkpoint versus Message Log for Fault Tolerant MPI Aur?lien Bouteiller, Lemarinier, Krawezik, Cappello (Universit? de Paris Sud, France) A Performance Comparison of Linux and a Lightweight Kernel Ron Brightwell, Rolf Riesen, Keith Underwood (Sandia National Laboratories, USA), Trammell B. Hudson (Operating Systems Research, Inc.), Patrick Bridges, Arthur B. Maccabe (University of New Mexico, USA) Implications of a PIM Architectural Model for MPI Arun Rodrigues, Richard Murphy, Peter Kogge, Jay Brockman (University of Notre Dame, USA), Ron Brightwell, Keith Underwood (Sandia National Laboratories, USA) Session 4B (Room B) Cluster/Job Management II Reusable Mobile Agents for Cluster Computing Ichiro Satoh (National Institute of Informatics, Japan) High Service Reliability For Cluster Server Systems M. Mat Deris, M.Rabiei, A. Noraziah, H.M. Suzuri (University College of Science and Technology, Malaysia) Wide Area Cluster Monitoring with Ganglia Federico D. Sacerdoti, Mason J. Katz (San Diego Supercomputing Center, USA), Matthew L. Massie, David E. 
Culler (UC Berkeley, USA) Vender Talk 1 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 Panel Discussion 6:30-8:30 Banquet Dinner (Ballroom, Conference Hotel) **************************** Thursday, December 4 **************************** 8:30-5:00 Conference Registration Special Technical Session : Dec. 4 (9am - 4:30pm) Grid Demo - Life Demonstrations of Grid Technologies and Applications Session Chairs: Peter Kacsuk (MTA SZTAKI Research Institute, Hungary), Rajkumar Buyya (University of Melbourne, Australia) 9:00-10:00 Keynote 3 (Room C) 10:00-10:30 Coffee Break 10:30-12:00 Vender Talk 2, 5B, 5C Vender Talk 2 (Room A) Session 5B (Room B) : Novel Software Efficient Parallel Out-of-core Matrix Transposition Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P Sadayappan (Ohio State University, USA) A Case Study of Parallel I/O for Biological Sequence Search on Linux Clusters Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson (University of Nebraska-Lincoln, USA) CTFS: A New Light-weight, Cooperative Temporary File System for Cluster-based Web Server Jun Wang (University of Nebraska-Lincoln, USA) Session 5C (Room C) I/O II Efficient Structured Data Access in Parallel File Systems Avery Ching, Alok Choudhary, Wei-keng Liao (Northwestern University, USA), Robert Ross, William Gropp (Argonne National Laboratory, USA) View I/O: Improving the Performance of Non-contiguous I/O Florin Isaila and Walter F. Tichy (University of Karlsruhe, Germany) Supporting Efficient Noncontiguous Access in PVFS over InfiniBand Jiesheng Wu (Ohio State University), Pete Wyckoff (Ohio Supercomputer Center, USA), D.K. Panda (Ohio State University, USA) 12:00-2:00 Lunch 2:00-2:30 Invited Talk 3 (Room C) 2:30-3:00 Invited Talk 4 (Room C) 3:00-3:30 Coffee Break 3:30-5:00 : Session 6A, 6B, 6C Session 6A (Room A) : Scheduling III A General Self-adaptive Task Scheduling System for Non-dedicated Heterogeneous Computing Ming Wu and Xian-He Sun (Illinois Institute of Technology, USA) Adding Memory Resource Consideration into Workload Distribution for Software DSM Systems Yen-Tso Liu, Ce-Kuen Shieh (National Chung Kung University, Taiwan), Tyng-Yeu Liang (National Kaohsiung University of Applied Sciences, Taiwan) An Energy-Based Implicit Co-scheduling Model for Beowulf Cluster Somsak Sriprayoonsakul and Putchong Uthayopas (Kasetsart University, Thailand) Session 6B (Room B) : High Availability Availability Prediction and Modeling of High Availability OSCAR Cluster Lixin Shen, Chokchai Leangsuksun, Tong Liu, Hertong Song (Louisiana Tech University, USA), Stephen L. Scott (Oak Ridge National Laboratory, USA) A System Recovery Benchmark for Clusters Ira Pramanick, James Mauro, Ji Zhu (Sun Microsystems, Inc., USA) Performance Evaluation of Routing Algorithms in RHiNET-2 Cluster Michihiro Koibuchi, Konosuke Watanabe, Kenichi Kono, Akiya Jouraku, Hideharu Amano (Keio University, Japan) Session 6C (Room C) : Communications II Application-Bypass Reduction for Large-Scale Clusters Adam Wagner, Darius Buntias, D.K. Panda (Ohio State University, USA), Ron Brightwell (Sandia National Laboratories, USA) Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost Surendra Byna (Illinois Institute of Technology, USA), William Gropp (Argonne National Laboratory, USA), Xian-He Sun (Illinois Institute of Technology, USA), Rajeev Thakur (Argonne National Laboratory, USA) Shared Memory Mirroring for Reducing Communication Overhead on Commodity Networks Jarek Nieplocha, B. Palmer, E. 
Apra (Pacific Northwest National Laboratory, USA) ************************************* 5:00 : End of the Conference ************************************* ------------------------------------------------------------------- Poster Session/Short Papers "Plug-and-Play" Cluster Computing using Mac OS X Dean Dauger (Dauger Research, Inc.) and Viktor K. Decyk (UC Los Angeles, USA) Improving Performance of a Dynamic Load Balancing System by Using Number of Effective Tasks Min Choi, Jung-Lok Yu, Seung-Ryoul Maeng (Korea Advanced Institute of Science and Technology, Korea) Dynamic Self-Adaptive Replica Location Method in Data Grids Dongsheng Li, Nong Xiao, Xicheng Lu, Kai Lu, Yijie Wang (National University of Defense Technology, China) Efficient I/O Caching in Data Grid and Cluster Management Song Jiang (College of William and Mary, USA), Xiaodong Zhang (National Science Foundation, USA) Optimized Implementation of Extendible Hashing to Support Large File System Directory Rongfeng Tang, Dan Mend, Sining Wu (Chinese Academy of Sciences, China) Parallel Design Pattern for Computational Biology and Scientific Computing Applications Weiguo Liu and Bertil Schmidt (Nanyang Technological University, Singapore) FJM: A High Performance Java Message Library Tsun-Yu Hsiao, Ming-Chun Cheng, Hsin-Ta Chiao, Shyan-Ming Yuan (National Chiao Tung University, Taiwan) Cluster Architecture with Lightweighted Redundant TCP Stacks Hai Jin and Zhiyuan Shao (Huazhong University of Science and Technology, China) >From Clusters to the Fabric: The Job Management Perspective Thomas R?oblitz, Florian Schintke, Alexander Reinefeld (Zuse Institute Berlin, Germany) Towards an Efficient Cluster-based E-Commerce Server Victoria Ungureanu, Benjamin Melamed, Michael Katehakis (Rutgers University, USA) A Kernel Running in a DSM - Design Aspects of a Distributed Operating System Ralph Goeckelmann, Michael Schoettner, Stefan Frenz, Peter Schulthess (University of Ulm, Germany) Distributed Recursive Sets: Programmability and Effectiveness for Data Intensive Applications Roxana Diaconescu (UC Irvine, USA) Run-Time Prediction of Parallel Applications on Shared Environment Byoung-Dai Lee (University of Minnesota, USA), Jennifer M. Schopf (Argonne National Laboratory, USA) An Instance-Oriented Security Mechanism in Grid-based Mobile Agent System Tianchi Ma and Shanping Li (Zhejiang University, China) A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms Soumya Sanyal, Amit Jain, Sajal Das (University of Texas at Arlington, USA), Rupak Biswas (NASA Ames Research Center, USA) BCFG: A Configuration Management Tool for Heterogeneous Clusters Narayan Desai, Andrew Lusk, Rick Bradshaw, Remy Evard (Argonne National Laboratory, USA) Communication Middleware Systems for Heterogenous Clusters: A Comparative Study Daniel Balkanski, Mario Trams, Wolfgang Rehm (Technische Universita Chemnitz, Germany) QoS-Aware Adaptive Resource Management in Distributed Multimedia System Using Server Clusters Mohammad Riaz Moghal, Mohammad Saleem Mian (University of Engineering and Technology, Pakistan) On the InfiniBand Subnet Discovery Process Aurelio Berm?dez, Rafael Casado, Francisco J. Quiles (Universidad de Castilla-La Mancha, Spain), Timothy M. Pinkston (University of Southern California, USA), Jos? 
Duato (Universidad Polit?cnica de Valencia, Spain) -------------------------------------------------------------- Chairs/Committees General Co-Chairs Jack Dongarra (University of Tennessee) Lionel Ni (Hong Kong University of Science and Technology) General Vice Chair Francis C.M. Lau (The University of Hong Kong) Program Co-Chairs Daniel S. Katz (Jet Propulsion Laboratory) Cho-Li Wang (The University of Hong Kong) Program Vice Chairs Bill Gropp (Argonne National Laboratory) -- Middleware Wolfgang Rehm (Technische Universit?t Chemnitz) -- Hardware Zhiwei Xu (Chinese Academy of Sciences, China) -- Applications Tutorials Chair Ira Pramanick (Sun Microsystems) Workshops Chair Jiannong Cao (Hong Kong Polytechnic University) Exhibits/Sponsors Chairs Jim Ang (Sandia National Lab) Nam Ng (The University of Hong Kong) Publications Chair Rajkumar Buyya (The University of Melbourne) Publicity Chair Arthur B. Maccabe (The University of New Mexico) Poster Chair Putchong Uthayopas (Kasetsart University) Finance/Registration Chair Alvin Chan (Hong Kong Polytechnic University) Local Arrangements Chair Anthony T.C. Tam (The University of Hong Kong) Programme Committee David Abramson (Monash U., Australia) Gabrielle Allen (Albert Einstein Institute, Germany) David A. Bader (U. of New Mexico, USA) Mark Baker (U. of Portsmouth, UK) Ron Brightwell (Sandia National Laboratory USA) Rajkumar Buyya (U. of Melbourne, Australia) Giovanni Chiola (Universita' di Genova Genova, Italy) Sang-Hwa Chung (Pusan National U., Korea) Toni Cortes (Universitat Politecnica de Catalunya, Spain) Al Geist (Oak Ridge National Laboratory, USA) Patrick Geoffray (Myricom Inc., USA) Yutaka Ishikawa (U. of Tokyo, Japan) Chung-Ta King (National Tsing Hua U., Taiwan) Tomohiro Kudoh (AIST, Japan) Ewing Lusk (Argonne National Laboratory, USA) Jens Mache (Lewis and Clark College, USA) Phillip Merkey (Michigan Tech U., USA) Matt Mutka (Michigan State U., USA) Charles D. Norton (JPL, California Institute of Technology, USA) D.K. Panda (Ohio State U., USA) Philip Papadopoulos (UC San Diego, USA) Myong-Soon Park (Korea U., Korea) Neil Pundit (Sandia National Laboratory, USA) Thomas Rauber (U. Bayreuth, Germany) Alexander Reinefeld (ZIB, Germany) Rob Ross (Argonne National Laboratory, USA) Gudula Ruenger (Chemnitz U. of Technology, Germany) Jennifer Schopf (Argonne National Laboratory, USA) Peter Sloot (U. of Amsterdam, Netherlands) Thomas Stricker (Institut fur Computersysteme, Switzerland) Ninghui Sun (Chinese Academy of Sciences, China) Xian-He Sun (Illinois Institute of Technology, USA) Rajeev Thakur (Argonne National Laboratory, USA) Putchong Uthayopas (Kasetsart U., Thailand) David Walker (U. of Wales Cardiff, UK) Xiaodong Zhang (NSF, USA) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Florent.Calvayrac at univ-lemans.fr Mon Oct 6 11:54:09 2003 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Mon, 06 Oct 2003 17:54:09 +0200 Subject: undefined references to pthread related calls In-Reply-To: <3F817693.9040001@larc.nasa.gov> References: <3F817693.9040001@larc.nasa.gov> Message-ID: <3F819021.2050605@univ-lemans.fr> Jeffery A. White wrote: > Hi group, > > I have a user of my software (a f90 based CFD code using mpich) that is > haveing trouble installing my code on > their system. They are using mpich and the Intel version 7.1 ifc > compiler. 
The problem occurs at the link step. > They are getting undefined references to what appear to be system calls > to pthread related functions such as > pthread_self, pthread_equal, pthread_mutex_lock. Does any one else > encountered and know how to fix this problem? > > Thanks, > > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > is the compiler installed on a Redhat 8.0 ? Besides, maybe they use OpenMP/HPF directives and options which can mess up things and are usually useless on a cluster with one CPU per node. -- Florent Calvayrac | Tel : 02 43 83 26 26 Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18 UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 12:56:37 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 12:56:37 -0400 (EDT) Subject: Help: About Intel Fortran Compiler: In-Reply-To: Message-ID: On Mon, 6 Oct 2003, Franz Marini wrote: > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > Try with : > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > (mpif77 for fortran77) command (which is a wrapper for the real > compiler). Acckkk!! This is one of the horribly broken things about most MPI implementations. It's not reasonable to say "to use this library you must use our magic compile script" A MPI library should be just that -- a library conforming to system standards. You should be able to link it with just "-lmpi". Most of the Fortran underscore issues may be hidden from the user with weak linker aliases. Similarly, it's not reasonable to say "to run this program, you must use our magic script" You should be able to just run the program, by name, in the usual way. Our BeoMPI implementation demonstrated how to do it right many years ago, and we provided the code back to the community. Many people on this list seem to take the attitude "I've already learned the crufty way, therefore the improvements don't matter." One element of a high-quality library is ease of use, and in the long run that matters more than a few percent faster for a specific function call. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Mon Oct 6 13:24:57 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Mon, 06 Oct 2003 10:24:57 -0700 Subject: Intel compilers and libraries In-Reply-To: <20031006112102.GC15837@sgirmn.pluri.ucm.es> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <3F81A569.80805@cert.ucr.edu> Jos? M. 
P?rez S?nchez wrote: >Hello: > >We are thinking about purchasing the Intel C++ compiler for linux, >mainly for getting the most of our harware (Xeon 2.4Gz processors), we >are also interested in the Intel MKL (Math Kernel Library), I would like >to know if the performance gain using Intel compiler+libraries, which exploit >SSE2 and make other optimizations for P4/Xeon, are as good as Intel >claims, anyone in the list using those products? > You realize that there's a free version of the Intel compiler for Linux right? Anyway, our experience with their Fortran compiler has been that it's roughly on par with the Portland Group's compiler. However, if Pentium 4 optimizations are turned on, the code produced by the Intel compiler runs just a little bit faster. Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Wolfgang.Dobler at kis.uni-freiburg.de Mon Oct 6 12:50:06 2003 From: Wolfgang.Dobler at kis.uni-freiburg.de (Wolfgang Dobler) Date: Mon, 6 Oct 2003 18:50:06 +0200 Subject: Beowulf digest, Vol 1 #1482 - 2 msgs In-Reply-To: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> Message-ID: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Hi Ao, > I tried to compile a Fortran 90 MPI program by > the Intel Frotran Compiler in the OSCAR cluster. > I run the command: > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > > The system failed to compile it and gave me the following information: > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' [...] > I wonder > 1. why the errors happen? > 2. Is the problem of cluster or the Intel compiler? Looks like the infamous underscore problem. You have a library (libmpi.so or libmpi.a) that has been built using the GNU F77 compiler without the option `-fno-second-underscore' and accordingly the MPI symbols are called `mpi_init__', not `mpi_init_', etc. But the Intel compiler (and all other non-G77 compilers) expects a symbol with only one underscore appended ( `mpi_init_'), but that one is not in the library. > 3. How I can solve it. The way out is to either rebuild the library, compiling with `g77 -fno-second-underscore' or with the Intel compiler, or (the less elegant choice) to refer to the MPI functions with one underscore in you F90 code: call MPI_INIT_(ierr) There is one related question I want to ask the ld-specialists on the list: On some machines libraries like MPICH contain all symbol names with both underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same time. Does anybody know whether there are easy ways of building such a library? Is there something like `symbol aliases' and how would one create these when generating the library? W o l f g a n g _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From j.a.white at larc.nasa.gov Mon Oct 6 12:56:56 2003 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Mon, 06 Oct 2003 12:56:56 -0400 Subject: undefined references to pthread related calls Message-ID: <3F819ED8.9020002@larc.nasa.gov> Group, Thanks for your responses. 
Turns out that the problem appears to be an incompatibility between ifc 7.1 and the glibc version in the version of RH 8.0 being used. The RH 8.0 being used had some patches that updated glibc. I was able to fix it by removing the -static option when compiling with ifc. I have tested this with a patch-free version of 8.0 and I don't see the problem with or without the -static option specified. At runtime my code does not use any calls that seem to access pthread-related system routines. I am guessing that by deferring resolution of the link until runtime I have bypassed the problem. Obviously if I did use routines that needed pthread-related code I would still have a problem, so this isn't a general fix. Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 6 13:29:09 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 6 Oct 2003 13:29:09 -0400 (EDT) Subject: Filesystem question (sort of newbie) In-Reply-To: Message-ID: On Fri, 3 Oct 2003, Mark Hahn wrote: > > 8 or but I haven't tested it), but it boots into a small Ram > > Disk (about 70 megs depending upon what you need on the For the Scyld Beowulf system we developed a more sophisticated "diskless administrative" approach that has better scaling and more predictable performance. We cache executable objects (libraries and executables), using a method that works unchanged with either Ramdisk (==tmpfs) or local disk cache. Keep in mind that this is just one element of making a cluster system scalable and easy to manage. Using a workstation-oriented distribution as the software base for compute nodes means generating many different kinds of configuration files, and dealing with the scheduling impact of the various daemons. > alternately, it's almost trivial to PXE boot nodes, mount a simple > root FS from a server/master, and use the local disk, if any, for > swap and/or tmp. one nice thing about this is that you can do it > with any distribution you like - mine's RH8, for instance. The obvious problems are configuration, scaling and update consistency issues. > personally, I prefer the nfs-root approach, probably because once > you boot, you won't be wasting any ram with boot-only files. They are trivial to get rid of, either by explicitly erasing them or by switching to a new ramdisk (e.g. our old stage 3) when initialization completes. > for a cluster of 48 nodes, there seems to be no drawback; > for a much larger cluster, I expect all the boot-time traffic > would be crippling, and you might want to use some kind of > multicast to distribute a ramdisk image just once... Multicast bulk data transfer was a good idea back when we had Ethernet repeaters. Today it should only be used for service discovery and low-rate status updates.
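To make the "low-rate status updates" case concrete, here is a minimal sketch of the kind of multicast status beacon that stays cheap at scale; the group address 239.0.0.1, the port 5007, and the payload format are arbitrary choices for illustration and are not taken from Scyld or any other package discussed here:

/* status_beacon.c - multicast one small load-average report per node
 * every 30 seconds.  Group, port, and message format are made up for
 * this example. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char ttl = 1;                           /* stay on the local segment */
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    struct sockaddr_in group;
    memset(&group, 0, sizeof(group));
    group.sin_family = AF_INET;
    group.sin_addr.s_addr = inet_addr("239.0.0.1");  /* assumed group */
    group.sin_port = htons(5007);                    /* assumed port  */

    char host[64], msg[128];
    gethostname(host, sizeof(host));

    for (;;) {
        double load1 = 0.0;
        FILE *p = fopen("/proc/loadavg", "r");       /* 1-minute load average */
        if (p) { if (fscanf(p, "%lf", &load1) != 1) load1 = 0.0; fclose(p); }
        int len = snprintf(msg, sizeof(msg), "%s load1=%.2f", host, load1);
        sendto(fd, msg, len, 0, (struct sockaddr *)&group, sizeof(group));
        sleep(30);
    }
}

A listener on the head node just joins the group with IP_ADD_MEMBERSHIP and aggregates whatever arrives; nothing here needs reliable delivery, which is why multicast is a reasonable fit for this job and a poor one for shipping whole ramdisk images.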
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Mon Oct 6 14:02:39 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Mon, 06 Oct 2003 14:02:39 -0400 Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: <3F81AE3F.9050202@lmco.com> Donald Becker wrote: > On Mon, 6 Oct 2003, Franz Marini wrote: > > > On Sat, 4 Oct 2003, Ao Jiang wrote: > > > > > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 > p_fdtd3dwg3_pml.f90 > > > > Try with : > > > > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > > > Btw, a cleaner way to compile mpi programs is to use the mpif90 > > (mpif77 for fortran77) command (which is a wrapper for the real > > compiler). > > Acckkk!! > > This is one of the horribly broken things about most MPI > implementations. It's not reasonable to say > "to use this library you must use our magic compile script" > A MPI library should be just that -- a library conforming to system > standards. You should be able to link it with just "-lmpi". > I don't like the mpi compiler helper scripts much either. I just want a simple makefile or a list of the libraries to link in in the correct order. I usually end up reading the helper scripts and pulling out the library order and putting it in my makefiles anyway (no offense to anyone). However, in defense of the different MPI implementations, they have somewhat different philosophies on how to get the best performance and ease of use. Sometimes this involves other libraries. Just telling the user to add '-lmpi' to the end of their link command may not tell them everything (e.g. they may need to add the pthreads library, or libdl or whatever). > One element of a high-quality library is ease of use, and in the long > run that matters more than a few percent faster for a specific function > call. > One piece of data. While we haven't looked at specific MPI calls, we have noticed up to about a 30% difference in wall clock time with our codes between the various MPI implementations using the same system (same nodes, same code, same input, same network, same nodes, etc.). I'm all for that kind of performance boost even if it's a little more cumbersome to compile/link/run (although one's mileage may vary depending upon the code) Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From franz.marini at mi.infn.it Mon Oct 6 14:55:57 2003 From: franz.marini at mi.infn.it (Franz Marini) Date: Mon, 6 Oct 2003 20:55:57 +0200 (CEST) Subject: Help: About Intel Fortran Compiler: In-Reply-To: References: Message-ID: On Mon, 6 Oct 2003, Donald Becker wrote: > Acckkk!! Ok, I shouldn't have said "a cleaner way". I don't usually use mpif77 (or f90) to compile programs requiring mpi libs, in fact. I prefer to explicitly tell the compiler which library I want, and where to find them. Btw, this is much simpler and faster if you have multiple versions/releases of the same library. 
Anyway, clean or not, elegant or not, mpif77 should (and I say should) work. Btw, I still can't understand why the hell each fortran compiler uses a different way to treat underscores. This, and another thousands of reasons make me hate fortran. (erm, please, this is a *personal* pov, let's not start another flame/discussion on the fortran vs issue ;)). Have a nice day, F. --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +39 02 50317221 --------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 6 17:18:07 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 6 Oct 2003 14:18:07 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> Message-ID: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> On Mon, Oct 06, 2003 at 02:54:43PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Pardon me for some advertising here, but our OptimaNumerics Linear > Algebra Library can very significantly outperform Intel MKL. Kenneth, Welcome to the beowulf mailing list. Here are some helpful suggestions: 1) Don't top post. Answer postings like I do here, by quoting the relevant part of the posting you're replying to. 2) Don't include an 8-line confidentiality notice in a posting to a public, archived mailing list, distributed all over the world. 3) Marketing slogans and paragraphs with several !s don't work so well here. More sophisticated customers aren't drawn by a claim of a 32x performance advantage without knowing what is being measured. Is it a 100x100 matrix LU decomposition? Well, no, because Intel's MKL and the free ATLAS library run at a respectable % of peak. Is it on a 1000 point FFT? Well, no, because the free FFTW library runs at a respectable % of peak on that. 4) Put your performance whitepapers on your website, or it looks fishy. I looked and didn't see a single performance claim there. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Mon Oct 6 18:41:34 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 06 Oct 2003 23:41:34 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening I'm getting error 13 when my diskless client try to mount file system. Hoe best to resolved this error 13? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 6 23:55:13 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 6 Oct 2003 23:55:13 -0400 (EDT) Subject: Root-nfs error 13 while mounting In-Reply-To: Message-ID: > I'm getting error 13 when my diskless client try to mount file system. Hoe > best to resolved this error 13? it's best resolved by translating it to text: EACCESS or "permission denied". I'm guessing you should look at the logs on your fileserver, since it seems to be rejecting your clients. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Tue Oct 7 05:17:36 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Tue, 7 Oct 2003 11:17:36 +0200 Subject: weak symbols [Re: Beowulf digest, Vol 1 #1482 - 2 msgs] In-Reply-To: <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> References: <200310051602.h95G2XV14166@NewBlue.Scyld.com.scyld.com> <16257.40254.203820.676508@cincinnatus.kis.uni-freiburg.de> Message-ID: <200310071117.36507.joachim@ccrl-nece.de> Wolfgang Dobler: > On some machines libraries like MPICH contain all symbol names with both > underscore conventions, i.e. `mpi_init__', and `mpi_init_' at the same > time. Does anybody know whether there are easy ways of building such a > library? Is there something like `symbol aliases' and how would one create > these when generating the library? Yes, most linkers support "weak symbols" in one way or another (there is no common way, usually a pragma or "function attributes" (gcc) are used) which supply all required API symbols for the one real implemented function. Just take a look at a source file like mpich/src/pt2pt/send.c to see how this can be done (some preprocessing "magic"). It can also be done w/o weak symbols at the cost of a slightly bigger library. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Tue Oct 7 07:07:19 2003 From: jcownie at etnus.com (James Cownie) Date: Tue, 07 Oct 2003 12:07:19 +0100 Subject: more on structural models for clusters Message-ID: <1A6pgV-16F-00@etnus.com> Jim, > Your average Dell isn't suited to inclusion as a MCU core in an ASIC > at each node and would cost more than $10/node... I'm looking at > Z80/6502/low end DSP kinds of computational capability in a mesh > containing, say, 100,000 nodes. Have you seen this gizmo ? (It's just so cute I had to pass it on :-) http://www.lantronix.com/products/eds/xport/ It's a 48MHz x86 with 256KB of SRAM and 512KB of flash, a 10/100Mb ethernet interface an RS232 and three bits of digital I/O and it all fits _inside_ an RJ45 socket. It comes loaded up with a web server and so on. It's on sale here in the UK for GBP 39 + VAT one off, so should come down somewhere near the price you mention above for your 100,000 off in the US. (It might also be useful to the folks who want to build their own environmental monitoring. Couple one of these up to the serial interconnect on a temperature monitoring button and you'd immediately be able to access it from the net). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathiasbrito at yahoo.com.br Tue Oct 7 08:18:16 2003 From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=) Date: Tue, 7 Oct 2003 09:18:16 -0300 (ART) Subject: Tools for debuging Message-ID: <20031007121816.67790.qmail@web12208.mail.yahoo.com> I'm having problems with a prograa, and i really need a tool for debug it. 
There's specific debugers for mpi programas, if have more than one, what is the best choice? Thanks ===== Mathias Brito Universidade Estadual de Santa Cruz - UESC Departamento de Ci?ncias Exatas e Tecnol?gicas Estudante do Curso de Ci?ncia da Computa??o Yahoo! Mail - o melhor webmail do Brasil http://mail.yahoo.com.br _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nican at nsc.liu.se Tue Oct 7 07:50:26 2003 From: nican at nsc.liu.se (Niclas Andersson) Date: Tue, 7 Oct 2003 13:50:26 +0200 (CEST) Subject: CALL FOR PARTICIPATION: Workshop on Linux Clusters for Super Computing Message-ID: CALL FOR PARTICIPATION ================================================================ 4th Annual Workshop on Linux Clusters For Super Computing (LCSC) Clusters for High Performance Computing and Grid Solutions 22-24 October, 2003 Hosted by National Supercomputer Centre (NSC) Link?ping University, SWEDEN ================================================================ The programme is in its final state. The workshop is brimful of knowledgeable speakers giving exciting talks about Linux clusters, grids and distributed applications requiring vast computational resources. Just a few samples: - Keynote: Andrew Grimshaw, University of Virginia and CTO of Avaki Inc. - Comparisons of Linux clusters with the Red Storm MPP William J. Camp, Project Leader of Red Storm, Sandia National Laboratories - The EGEE project: building a grid infrastructure for Europe Bob Jones, EGEE Technical Director, CERN - Linux on modern NUMA architectures Jes Sorensen, Wild Open Source Inc. - The AMANDA Neutrino Telescope Stephan Hundertmark, Stockholm University and many more. In addition to invited speakers there will be vendor presentations, exhibitions and tutorials. Last date for registration: October 10. For more information and registration: http://www.nsc.liu.se/lcsc _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From keith.murphy at attglobal.net Tue Oct 7 11:28:50 2003 From: keith.murphy at attglobal.net (Keith Murphy) Date: Tue, 7 Oct 2003 08:28:50 -0700 Subject: Tools for debuging References: <20031007121816.67790.qmail@web12208.mail.yahoo.com> Message-ID: <025701c38ce7$b5b64060$02fea8c0@oemcomputer> Check out Etnus's Totalview parallel debugger www.etnus.com Keith Murphy Dolphin Interconnect C: 818-292-5100 T: 818-597-2114 F: 818-597-2119 www.dolphinics.com ----- Original Message ----- From: "Mathias Brito" To: Sent: Tuesday, October 07, 2003 5:18 AM Subject: Tools for debuging > I'm having problems with a prograa, and i really need > a tool for debug it. There's specific debugers for mpi > programas, if have more than one, what is the best > choice? > > Thanks > > > ===== > Mathias Brito > Universidade Estadual de Santa Cruz - UESC > Departamento de Ci?ncias Exatas e Tecnol?gicas > Estudante do Curso de Ci?ncia da Computa??o > > Yahoo! 
Mail - o melhor webmail do Brasil > http://mail.yahoo.com.br > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Tue Oct 7 13:30:19 2003 From: rokrau at yahoo.com (Roland Krause) Date: Tue, 7 Oct 2003 10:30:19 -0700 (PDT) Subject: undefined references to pthread related calls In-Reply-To: <200310071605.h97G5FV13188@NewBlue.Scyld.com.scyld.com> Message-ID: <20031007173019.8519.qmail@web40010.mail.yahoo.com> --- beowulf-request at scyld.com wrote: > 2. undefined references to pthread related calls (Jeffery A. > White) FYI, Intel has released a version of their compiler that fixes the link problem for applications that use OpenMP. Intel Fortran now supports glibc-2.3.2 which is used in RH-9 and Suse-8.2. The old compatibility hacks have become obsolete at least. I hear Intel-8 is in beta, anyone have experience with it? Roland > Subject: undefined references to pthread related calls > > Group, > > Thanks for your responses. Turns out that the problem appears to > be > an incompatiblilty between ifc 7.1 and the glibc version > in the version of RH 8.0 being used. The RH 8.0 being used had some > patches that updated glibc. I was able to fix it by removing > the -static option when compling with ifc. I have tested this with a > patch free version of 8.0 and I don't see the problem wit or without > the -static option specified. At runtime my code does not use any > calls > that seem to access pthread related system routines. I am > guessing that by deferring reolution of the link until runtime I > have > bypassed the problem. Obviously if I did use routines that > needed pthread related code I would still have a problem so this > isn't a > general fix. > > Jeff > > __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 15:28:42 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 15:28:42 -0400 Subject: updated cluster finishing script system Message-ID: <1065554922.32374.47.camel@protein.scalableinformatics.com> Folks: I updated my cluster finishing script package. This package allows you to perform post-installation configuration changes (e.g. finishing) for an RPM based cluster which maintains image state on local disks. It used to be specialized to the ROCKS distribution, but it has evolved significantly and should work with generalized RPM based distributions. Major changes: 1) No RPMs are distributed (this is a good thing, read on) 2) a build script generates customized RPMs for you after asking you 4 questions. (please, no jokes about unladen swallows, neither European nor African...) These RPMs allow you to customize the finishing server and the finishing script client as you require for your task. 
This includes choosing the server's IP address (used to be hard-coded to 10.1.1.1), the server's export directory (used to be hard-coded to /opt/finishing), the cluster's network (used to be hard-coded to 10.0.0.0), and the cluster's netmask (used to be hard-coded to 255.0.0.0). 3) Documentation (see below) Have a look at http://scalableinformatics.com/finishing/ for more details, including new/better instructions. It is licensed under the GPL for end users. Contact us offline if you want to talk about redistribution licenses. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 19:50:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 09:50:38 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065554922.32374.47.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> Message-ID: <200310080950.41343.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > It is licensed under the GPL for end users. Contact us offline if you want > to talk about redistribution licenses. Err, if it's licensed under the GPL then the "end users" who receive it under that license can redistribute it themselves under the GPL. Part 6 of the GPL v2 says: [quote] 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. [...] [/quote] Of course as the copyright holder you could also do dual licensing, so I guess this is what you mean - correct ? But whichever it is, once you have released something under the GPL you cannot prevent others from redistributing it under the GPL themselves. cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g1FOO2KABBYQAh8RAu1oAJ0fLlcljVYwXj7xgnkjGFyNaoWOFwCfWM/r IC1/xPLO2ePGM2zlJF2ZHK8= =HOnr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Tue Oct 7 21:58:21 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Tue, 7 Oct 2003 21:58:21 -0400 (EDT) Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: Message-ID: Hi, First, I want to thank all of you for the answers and suggestions for my question last time. ( Last time, I tried: " ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 " The system failed to compile it and gave me the following information: " module EHFIELD program FDTD3DPML external function RISEF 3228 Lines Compiled /tmp/ifcVao851.o(.text+0x5a): In function `main': : undefined reference to `mpi_init_' . . . 
) Most friends suggested that I use '-lmpi' instead of '-Lmpi'. I tried it, and the system gave me the following error: " ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 module EHFIELD program FDTD3DPML external function RISEF external subroutine COM_HZY 3228 Lines Compiled ld: cannot find -lmpi " Neither does '-lmpif' work, although 'mpif.h' and 'mpi.h' exist in the directory '/opt/mpich-1.2.5/include'. I also tried the command: " /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 " The system gave the error: " 3228 Lines Compiled /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In function `f_f77ioerr': : undefined reference to `__ctype_b' " In fact, I don't know what this error means. Of course, I don't know how to solve it either. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Oct 7 22:23:04 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 8 Oct 2003 12:23:04 +1000 Subject: updated cluster finishing script system In-Reply-To: <1065579065.32368.134.camel@protein.scalableinformatics.com> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> <1065579065.32368.134.camel@protein.scalableinformatics.com> Message-ID: <200310081223.05966.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 8 Oct 2003 12:11 pm, Joseph Landman wrote: > Thanks for catching the wording error. No worries, I wasn't intending to be pedantic, just curious. :-) - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/g3UIO2KABBYQAh8RAgSkAJ48X7RY3ABNnYa2DlQ0z0vHfinaxACfdsMk hIZqsuVLevZqp2OBtfAafEs= =2vpF -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Tue Oct 7 22:11:05 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Tue, 07 Oct 2003 22:11:05 -0400 Subject: updated cluster finishing script system In-Reply-To: <200310080950.41343.csamuel@vpac.org> References: <1065554922.32374.47.camel@protein.scalableinformatics.com> <200310080950.41343.csamuel@vpac.org> Message-ID: <1065579065.32368.134.camel@protein.scalableinformatics.com> On Tue, 2003-10-07 at 19:50, Chris Samuel wrote: > On Wed, 8 Oct 2003 05:28 am, Joseph Landman wrote: > > > It is licensed under the GPL for end users. Contact us offline if you want > > to talk about redistribution licenses. > > Err, if it's licensed under the GPL then the "end users" who receive it under > that license can redistribute it themselves under the GPL. Part 6 of the GPL > v2 says: ... > Of course as the copyright holder you could also do dual licensing, so I guess > this is what you mean - correct ? Commercial redistribution, a la the MySQL form of license. You are correct, it was a mis-wording on my part. Basically if someone decides to turn this into a commercial product (ok, stop laughing...), or wants support, or a warranty, then they need to speak with us first.
As the package is mostly source code, make files and scripts, it seems odd to consider distributing it any other way. More to the point, there are some things that should be free (Libre and beer, though some keep asking me where the free beer is). Stuff like this should be free (as in Libre). RGB and I had a conversation about this I think... . I leave it to others to supply the beer. > But whichever it is, once you have released something under the GPL you cannot > prevent others from redistributing it under the GPL themselves. ... which I don't want to hinder (redistribution under GPL), rather I want to encourage ... Thanks for catching the wording error. -- Joseph Landman _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 8 13:16:03 2003 From: ctierney at hpti.com (Craig Tierney) Date: 08 Oct 2003 11:16:03 -0600 Subject: Still about the MPICH and Intel Fortran Compiler: In-Reply-To: References: Message-ID: <1065633362.22256.8.camel@woody> On Tue, 2003-10-07 at 19:58, Ao Jiang wrote: > Hi, > First, I want to thank all of you for the answers and suggestions > for my question last time. > ( > Last time, I tried: > " > ifc -I /opt/mpich-1.2.5/include -Lmpi -w -Lm -o p_wg3 p_fdtd3dwg3_pml.f90 > " > The system failed to compile it and gave me the following information: > " > module EHFIELD > program FDTD3DPML > external function RISEF > > 3228 Lines Compiled > /tmp/ifcVao851.o(.text+0x5a): In function `main': > : undefined reference to `mpi_init_' The option -L specifies the path for libraries. The option -l specifies the library to link. Your command should be: ifc -I/opt/mpich-1.2.5/include -L/opt/mpich-1.2.5/lib -lmpi -w -lm -o p_wg3 p_fdtd3dwg3_pml.f90 Craig > . > . > . > ) > > Most of friends suggest me to use '-lmpi', instead of '-Lmpi', I tried > it, the system gave me the following error: > " > ifc -I/opt/mpich-1.2.5/include -lmpi -w -o p_wg3 p_fdtd3dwg3_pml.f90 > > module EHFIELD > program FDTD3DPML > external function RISEF > external subroutine COM_HZY > > 3228 Lines Compiled > ld: cannot find -lmpi > " > either does '-lmpif', although there exist 'mpif.h' and 'mpi.h' in the > directory '/opt/mpich-1.2.5/include'. > > I also tried the command: > " > /opt/mpich-1.2.5/bin/mpif90 -w -o p_wg3 p_fdtd3dwg3_pml.f90 > " > > The system gave the error: > " > 3228 Lines Compiled > /opt/intel/compiler70/ia32/lib/libIEPCF90.a(f90fioerr.o)(.text+0x4d3): In > function `f_f77ioerr': > : undefined reference to `__ctype_b' > " > > In fact, I don't know what this error means. Of course, I don't know > how to slove it either. > > Tom > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 14:02:17 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 19:02:17 +0100 Subject: Root-nfs error 13 while mounting Message-ID: Evening Have resolved the problem. It was due to setting in dhcpd.conf it require option root-path pointing to root path of the node. I get another error. 
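(For reference, a root-path setup of the kind being described usually looks something like the sketch below. The node name, MAC and IP addresses, and paths are illustrative assumptions only, not taken from this cluster; the matching /etc/exports line is the sort of entry whose absence typically produces the permission-denied (error 13) mount failures discussed earlier in the thread.)

  # dhcpd.conf fragment for one diskless node (all names/addresses assumed)
  host node1 {
      hardware ethernet 00:11:22:33:44:55;
      fixed-address 10.0.0.11;
      option root-path "10.0.0.1:/tftpboot/node1";
  }

  # matching /etc/exports entry on the NFS server, allowing root access from the node
  /tftpboot/node1    10.0.0.11(rw,no_root_squash,sync)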
When the diskless node boots up it cannot find the init file. Also, what is the minimum set of files to transfer to /tftpboot/node/? Dan On Oct 7 2003, Mark Hahn wrote: > > I'm getting error 13 when my diskless client try to mount file system. > > Hoe best to resolved this error 13? > > it's best resolved by translating it to text: EACCESS or "permission > denied". I'm guessing you should look at the logs on your fileserver, > since it seems to be rejecting your clients. > > > _______________________________________________ Beowulf mailing list, > Beowulf at beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 8 16:50:34 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 8 Oct 2003 13:50:34 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: The 3ware 6000, 7000 and 7006 cards are all gone from the marketplace; the cards you want to look at are the 3ware 7506 (parallel ATA) or the 3ware 8506 (serial ATA). The 2400 was never seriously in the running for us because it only supports 4 drives. joelja On Wed, 8 Oct 2003, Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID.
The first would give the > > best results but on the other hand becomes really expensive so we have > > in mind two ATA RAID controllers: > > > > Adaptec 2400A > > 3Ware 6000/7000 series controllers > > > > Any one of these has its strong and weak points, after seeing various > > benchmarks/comparisons/reviews these are the only candidates that > > deserve our attention. good points about ata raid - large disks storage ( 300GB drives at $300 each +/- ) - get those drives w/ 8MB buffer disk cache - cheap ... can do with software raid or $40 ata-133 ide controller - $300 more for making ata drives appear like scsi drives with 3ware raid controllers - slower rpm disks ... usually it tops out at 7200rpm - it supposedly can sustain 133MB/sec transfers - if you use software raid, you can monitor the raid status - if you use hardware raid, you are limited to the tools the hw vendor gives you tomonitor the raid status of pending failures or dead drives good points about scsi .. - some say scsi disks are faster ... - super expensive .. $200 for 36 GB .. at 15000rpm - it supposedly can sustain 320MB/sec transfers if the disks does transfer at its full speed ... 320MB/sec or 133MB/sec does the rest of the system get to keep up with processing the data spewing off and onto the disks independent of which raid system is built, you wil need 2 or 3 more backup systems to backup your Terabyte sized raid systems more raid fun http://www.1U-Raid5.net c ya alvin > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > harddisk, all home directories will be stored under the main server, > > main workload (compilation and edition) would be done on the local > > machines tough, server only takes care of file sharing. > > > > Also parallel MPI executions will be done between the clients. > > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? good p > > > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Wed Oct 8 15:46:59 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 08 Oct 2003 21:46:59 +0200 Subject: building a RAID system Message-ID: <1065642419.9483.55.camel@qeldroma.cttc.org> Hi, I would like to know some advice about what kind of technology apply into a RAID file server ( through NFS ) . We started choosing hardware RAID to reduce cpu usage. We have two options , SCSI RAID and ATA RAID. The first would give the best results but on the other hand becomes really expensive so we have in mind two ATA RAID controllers: Adaptec 2400A 3Ware 6000/7000 series controllers Any one of these has its strong and weak points, after seeing various benchmarks/comparisons/reviews these are the only candidates that deserve our attention. 
The server has a dozen client workstations connected through a switched 100Mbit LAN, each equipped with its own OS and hard disk. All home directories will be stored on the main server; the main workload (compilation and editing) will be done on the local machines though, and the server only takes care of file sharing. Also, parallel MPI executions will be done between the clients. Considering that not all the workstations would be working full time, and with cost in mind, is it worth an ATA RAID solution? -- Daniel Fernandez Laboratori de Termotècnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Wed Oct 8 06:33:13 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 08 Oct 2003 06:33:13 -0400 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <1065609193.28674.32.camel@squash.scalableinformatics.com> On Wed, 2003-10-08 at 18:17, D. Scott wrote: > Hi > > On a diskless cluster, why does NFS hang when doing a 'cp' of 6MB between nodes? Lots of possibilities, though I am not sure you have supplied enough information to hazard a guess (unless someone ran into this before and already knows the answer). An operation on an NFS mounted file system can hang when: 1) the nfs server becomes unresponsive (crash, overload, file system full, ...) 2) the client becomes unresponsive ... 3) the network becomes unresponsive ... ... If you could indicate more details, it is likely someone might be able to tell you where to look next. > > > Dan -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ds10025 at cam.ac.uk Wed Oct 8 18:17:02 2003 From: ds10025 at cam.ac.uk (D. Scott) Date: 08 Oct 2003 23:17:02 +0100 Subject: Why NFS hang when copying files of 6MB? Message-ID: Hi On a diskless cluster, why does NFS hang when doing a 'cp' of 6MB between nodes? Dan _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 19:39:41 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 19:39:41 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. that's unfortunate, since the main way HW raid saves CPU usage is by running slower ;) seriously, CPU usage is NOT a problem with any normal HW raid, simply because a modern CPU and memory system is *so* much better suited to performing raid5 operations than the piddly little controller in a HW raid card. the master/fileserver for my cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and it can *easily* saturate its gigabit connection. after all, ram runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s!
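To make the software-RAID alternative concrete, here is a minimal md RAID5 sketch. It assumes mdadm is installed; the IDE device names, the four-disk layout, the ext3 filesystem and the mount point are illustrative assumptions, not a recommendation for any particular box:

  # create a 4-disk software RAID5 array (one drive per IDE channel assumed)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1

  # put a journalled filesystem on it, mount it, and watch the parity sync
  mke2fs -j /dev/md0
  mkdir -p /export/home
  mount /dev/md0 /export/home
  cat /proc/mdstat

The parity arithmetic runs on the host CPU, which is exactly the trade-off being argued here.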
concern for PCI congestion is a much more serious issue. finally, why do you care at all? are you fileserving through a fast (>300 MB/s) network like quadrics/myrinet/IB? most people limp along at a measly gigabit, which even a two-ide-disk raid0 can saturate... > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and jeez, since your limited to 10 MB/s, you could do raid5 on a 486 and still saturate the net. seriously, CPU consumption is NOT an issue at 10 MB/s. > machines tough, server only takes care of file sharing. so excess cycles on the fileserver will be wasted unless used. > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? you should buy a single promise sata150 tx4 and four big sata disks (7200 RPM 3-year models, please). regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 19:28:37 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 16:28:37 -0700 (PDT) Subject: Why NFS hang when copying files of 6MB? In-Reply-To: <1065609193.28674.32.camel@squash.scalableinformatics.com> Message-ID: On Wed, 8 Oct 2003, Joe Landman wrote: > On Wed, 2003-10-08 at 18:17, D. Scott wrote: > > Hi > > > > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? > > Lots of possibilities, though I am not sure you have supplied enough > information to hazard a guess (unless someone ran into this before and > already knows the answer). > > An operation on an NFS mounted file system can hang when: > > 1) the nfs server becomes unresponsive (crash, overload, file system > full, ...) not enough memory, too much swap spce > 2) the client becomes unresponsive ... > 3) the network becomes unresponsive ... > ... - bad hub, bad switch, bad cables - bad nic cards, bad motherboard, - bad kernel, bad drivers - bad dhcp config, waiting for machines that went offline c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:41:12 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:41:12 +1000 Subject: Why NFS hang when copying files of 6MB? In-Reply-To: References: Message-ID: <200310090941.13302.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 08:17 am, D. Scott wrote: > On diskless cluster why does NFS hang when doing 'cp' of 6MB between nodes? You need to give a lot more detail on that, try having a quick read of: http://www.catb.org/~esr/faqs/smart-questions.html#beprecise Basically there are all sorts of possible problems from kernel bugs, node hardware problems through to various network problems... Useful information would be things like: /etc/fstab from the nodes output of the mount command the output of strace when you try and do the 'cp': strace -o cp.log -e trace=file cp /path/to/file /path/to/destination good luck! 
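A few more client- and server-side checks in the same spirit can help narrow down where an NFS copy is stalling; the host name 'fileserver' below is just a placeholder:

  # is the server's portmapper/NFS service registered and answering?
  rpcinfo -p fileserver
  showmount -e fileserver

  # client-side view: the mount options in use (rsize/wsize, hard/soft, udp/tcp) and NFS statistics
  mount | grep nfs
  nfsstat -c

  # recent kernel messages often show "server not responding" style complaints
  dmesg | tail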
Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKCYO2KABBYQAh8RAqltAJ4/R91yD0KKVA6wB3+UDZxZcAOsFwCbBZn1 DeaCjkFO8bwGLhhSkxB20yE= =d7Gz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:27:35 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:27:35 -0700 (PDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: hi ya On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. tell them to break the drawing up into itty-bitty pieces and work on a real autocad drawing .. :-) - separate the item into separate pieces so it can be bent, welded, drilled, etc or get a 3Ghz cpu and load up 4GB or 8GB of memory and nope ... beowulf or any other cluster will not help autocad c ya alvin - part time autocad me ..but i cant draw a line .. :-) - easier to contract out the 1u chassis design "drawings" :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 19:47:32 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 09:47:32 +1000 Subject: PocketPC Cluster Message-ID: <200310090947.33601.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) IrDA for the networking, 11 compute + 1 management, slower than "a mainstream Pentium II-class desktop PC" (they don't specify what spec). http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Twelve Pocket PC devices have been joined in a cluster to perform distributed calculations - the devices share the load of a complex calculation. The concept was to compare the performance of several Pocket PC devices linked into a cluster with the performance of a typical Pentium II-class desktop computer. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hKIUO2KABBYQAh8RAvJvAJoDNqZ/2m8cIqo02Hbbwzpm2DWeMQCeOltt 3LuUp1Kkoc4jnmwVNgoDoFI= =+abL -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mg_india at sancharnet.in Wed Oct 8 20:03:57 2003 From: mg_india at sancharnet.in (Manoj Gupta) Date: Thu, 09 Oct 2003 05:33:57 +0530 Subject: CAD Message-ID: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Hello, One of my clients has asked me to provide a solution for his AutoCAD work. 
The minimum file size on which he works is nearly of 400 MB and it takes 15-20 minutes to load on his single system. Can Beowulf be used to solve this problem and minimize the time required so as to improve productivity? Sawan Gupta || mg_india at sancharnet.in || _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 8 20:23:28 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 8 Oct 2003 20:23:28 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: > - get those drives w/ 8MB buffer disk cache what reason do you have to regard 8M as other than a useless marketing feature? I mean, the kenel has a cache that's 100x bigger, and a lot faster. > - slower rpm disks ... usually it tops out at 7200rpm unless your workload is dominated by tiny, random seeks, the RPM of the disk isn't going to be noticable. > - it supposedly can sustain 133MB/sec transfers it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE disks in raid0. interestingly, the chipset controller is normally not competing for the same bandwidth as the PCI, so even with entry-level hardware, it's not hard to break 133. > - if you use software raid, you can monitor the raid status this is the main and VERY GOOD reason to use sw raid. > - some say scsi disks are faster ... usually lower-latency, often not higher bandwidth. interestingly, ide disks usually fall off to about half peak bandwidth on inner tracks. scsi disks fall off too, but usually less so - they don't push capacity quite as hard. > - it supposedly can sustain 320MB/sec transfers that's silly, of course. outer tracks of current disks run at between 50 and 100 MB/s, so that's the max sustained. you can even argue that's not really 'sustained', since you'll eventually get to slower inner tracks. > independent of which raid system is built, you wil need 2 or 3 > more backup systems to backup your Terabyte sized raid systems backup is hard. you can get 160 or 200G tapes, but they're almost as expensive as IDE disks, not to mention the little matter of a tape drive that costs as much as a server. raid5 makes backup less about robustness than about archiving or rogue-rm-protection. I think the next step is primarily a software one - some means of managing storage, versioning, archiving, etc... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Wed Oct 8 21:04:03 2003 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed, 8 Oct 2003 21:04:03 -0400 Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> References: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: <20031008210403.F28876@www2> AutoCAD versions since R13 only run on Windows, and AFAIK no version of AutoCAD has ever been shipped for Linux. Beowulf is a Linux- (or, taken more liberally than most people intend, Unix-) specific thing. Thus, unless I misunderstand, no. --Bob Drzyzgula On Thu, Oct 09, 2003 at 05:33:57AM +0530, Manoj Gupta wrote: > > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. 
> > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 21:45:08 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 21:45:08 -0400 (EDT) Subject: building a RAID systemo In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). I totally agree with everything Mark said and second this. Although 3-year ata (lower) or scsi (higher) disks would be just fine too, depending on how much you care to spend and how much it costs you if things go down. e.g. md raid under linux works marvelously well, and one can even create a kickstart file so that it makes your raid for you on a fully automated install, which is very cool. It is also dirt cheap. My home (switched 100 Mbps, 8-9 hosts/nodes depending on what is on) has a 150 GB RAID-5 server (3x80 GB 3-year ATA 7200 RPM disks) on a 2.2 GHz Celeron server with an extra ATA controller so there is only one disk per channel. It cost about $800 total to build inside a full tower case with extra fans including one with leds in front so that it glows blue. You couldn't get the CASE of a HW raid for that price, I don't think (although I admit that it won't do hot swap and dual power supplies). The total RAID/NFS load since 9/19 is: root 11 0.0 0.0 0 0 ? SW Sep19 0:00 [mdrecoveryd] root 21 0.0 0.0 0 0 ? SW Sep19 0:00 [raid1d] root 22 0.0 0.0 0 0 ? SW Sep19 0:02 [raid5d] root 23 0.0 0.0 0 0 ? SW Sep19 5:03 [raid5d] ... root 4928 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] root 4929 0.0 0.0 0 0 ? SW Sep19 2:57 [nfsd] root 4930 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4931 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4932 0.0 0.0 0 0 ? SW Sep19 3:00 [nfsd] root 4933 0.0 0.0 0 0 ? SW Sep19 2:43 [nfsd] root 4934 0.0 0.0 0 0 ? SW Sep19 2:56 [nfsd] root 4935 0.0 0.0 0 0 ? SW Sep19 2:58 [nfsd] (or less than 30 minutes of total CPU). At 1440 min/day, for 18 days (conservatively) that is about 0.1% load, on average. This is a home network load, sure (which includes gaming and a fair bit of data access, but no, we're not talking GB per day moving over the lines). In a more data-intensive environment this would increase, but there is a lot of head room. The point is that a 2.2 GHz system has a LOT of horsepower. We used to run entire departments of twenty or thirty workstations using $10-20,000 Sun servers at maybe 5 MEGAHertz on 10 Mbps thinwire networks with fair to middling satisfaction. My $800 home server has several thousand times the raw speed, about a thousand times the memory, a thousand times the disk, AND it is RAID 5 disk at that. The network has only increased in speed by a factor of maybe 10-20 (allowing for switched vs hub). Mucho headroom indeed. BTW, our current department primary server is a 1 GHz PIII, although we're adding a second CPU shortly as load dictates. 
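To make the kickstart remark above concrete, a fragment along the following lines builds the md array at install time. The disk names, partition sizes and mount point are assumptions for illustration, not the layout of any machine described in this thread:

  # kickstart fragment: assemble a three-disk software RAID5 for /home during install
  part raid.01 --size=80000 --ondisk=hda
  part raid.02 --size=80000 --ondisk=hdc
  part raid.03 --size=80000 --ondisk=hde
  raid /home --fstype=ext3 --level=5 --device=md0 raid.01 raid.02 raid.03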
And if you are planning your server to handle something other than a small cluster or LAN where downtime isn't too "expensive" you may want to look at higher quality (rackmount) servers and disk arrays in enclosures that permit e.g. hot swap and that have redundant power. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Oct 8 23:12:41 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 9 Oct 2003 13:12:41 +1000 Subject: building a RAID system In-Reply-To: References: Message-ID: <200310091312.42544.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003 10:23 am, Mark Hahn wrote: > raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... For those who haven't seen it, this is a very interesting way of doing snapshot style backups: http://www.mikerubel.org/computers/rsync_snapshots/ - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/hNIpO2KABBYQAh8RAvXaAJ0ecv77jUJe3DWpsinqBFgs4W4JlQCfRz/z HfXF/JkFSszlvX10/JXjisM= =7lAy -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Oct 8 22:58:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 8 Oct 2003 22:58:17 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. The larger cache does provide some benefit. Disks now read and cache up to a whole track/cylinder at once, starting from when the head settles from a seek up to when the desired sector is read. You can't do that type of caching in the kernel. As disks become more dense, more memory is needed to save a cylinder's worth of data, so we should expect the cache size to increase. But you point is likely "disk cache is mostly legacy superstition". MS-Windows 98 and earlier had such horrible caching behavior that a few MB of on-disk cache could triple the performance. This was also why MS-Windows would run much faster under Linux+VMWare than on the raw hardware. > > - it supposedly can sustain 133MB/sec transfers Normal disks top out at 70MB/sec read, 50MB/sec write on the outer tracks. These numbers drop significantly on the inner tracks. You might get 10MB/sec better with 10K or 15K RPM SCSI drives, but it's certainly not linear with the speed. BTW, 2.5" laptop drives are _far_ worse. Typical for a modern fast drive is 20MB/sec read and 10MB/sec write. Older drivers were worse. > > - some say scsi disks are faster ... 
> > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. Look at the shape of the transfer performance curve -- the shape is sometimes the same as the similar IDE drive, but sometimes has a much different curve. Wider tracks mean faster seek settling but lower density. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Wed Oct 8 22:33:49 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed, 8 Oct 2003 19:33:49 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: hi ya mark On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. for those squeezing the last 1MB/sec transfer out of their disks ... 8MB did seem to make a difference ( streaming video apps - encoding/decoding/xmit ) > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. usually a side affect of partitioning too > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. super easy to overflow the disks and pci .. depending on apps > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. yup > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. scsi capacity doesnt seem to be an issue for them ... they're falling behind by several generations ( scsi disks used to be the highest capacity drives .. not any more ) > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. yup ... those are just marketing numbers... all averages ... and bigg differences between inner tracks and outer tracks > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost to me ... backup of terabyte sized systems is trivial ... - just give me lots of software raid subsystems ( 2 backups for each "main" system ) - lot cheaper than tape drives and 1000x faster than tapes for live backups - will never touch a tape backup again ... 
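Disk-to-disk schemes of this sort are commonly scripted with rsync plus hard links, in the style of the snapshot article linked earlier in the thread; a rough sketch, with purely illustrative host name and paths:

  # rotate snapshots; unchanged files are hard-linked, so each snapshot
  # only costs the space of the files that actually changed
  rm -rf /backup/daily.2
  mv /backup/daily.1 /backup/daily.2
  cp -al /backup/daily.0 /backup/daily.1
  rsync -a --delete -e ssh fileserver:/export/home/ /backup/daily.0/

Restores are then just a copy back out of whichever snapshot is wanted.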
too sloow and too unreliable no matter how clean the tape heads are ( too slow being the key problem for restoring ) c ya alvin > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 8 22:31:50 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 8 Oct 2003 22:31:50 -0400 (EDT) Subject: CAD In-Reply-To: <000001c38df8$e1a6d9c0$bbd2003d@myserver> Message-ID: On Thu, 9 Oct 2003, Manoj Gupta wrote: > Hello, > > One of my clients has asked me to provide a solution for his AutoCAD > work. > The minimum file size on which he works is nearly of 400 MB and it takes > 15-20 minutes to load on his single system. Load from what into what? It is hard for me to see how a 400 MB file could take this long to load into memory over any modern channel, as this is less than 0.5 MB/sec. This is roughly the bandwidth one achieves throwing floppies across a room one at a time by hand. That is, I can't imagine how this is bandwidth limited, unless the client has primitive hardware. From a local disk (even a bad one) this should take ballpark of a few seconds to load into memory. From NFS order of a minute or three (in most configurations, less on faster networks). If the load is so slow because the program is crunching the file as it loads it (reading a bit, thinking a bit, reading a bit more) then nothing can speed this up unless AutoCAD has a parallel version of their program. > Can Beowulf be used to solve this problem and minimize the time required > so as to improve productivity? I don't know for sure (although somebody else on the list might). I doubt it, though, unless autocad has a parallel version that can use a linux cluster to speed things up. However, your first step in answering it for yourself is going to be doing measurements to determine what the bottleneck is. If it is I/O then invest in better I/O (perhaps a better network). So measure e.g. the network load if it is getting the file from a network file server. If the problem is that the file is coming from a winXX server with too little memory on an antique CPU and with creaky old disks on a 10 Mbps hub, well, FIRST replace the winxx with linux, the old server with a new server, the old disks with new disks, the 10 BT with 1000 BT. At that point you won't have a bandwidth problem, as the server should be able to deliver files at some tens of MB/sec pretty easily. If the problem persists, try to figure out what autocad is doing when it loads. rgb > > > Sawan Gupta || mg_india at sancharnet.in || > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 02:00:33 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed, 8 Oct 2003 23:00:33 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. I found a comparison of 8MB vs 2MB drives in a raid, though it's windows based and not that great: http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 Seems like the 8MB didn't really make much of a difference. > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a 100GB LTO tapes can be had for $36, that's less than half the price of the cheapest 200 GB drives. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From maurice at harddata.com Thu Oct 9 00:58:27 2003 From: maurice at harddata.com (Maurice Hilarius) Date: Wed, 08 Oct 2003 22:58:27 -0600 Subject: building a RAID system In-Reply-To: <200310090112.h991CPb24907@NewBlue.scyld.com> Message-ID: <5.1.1.6.2.20031008225509.04259800@mail.harddata.com> Where you said: >I would like to know some advice about what kind of technology apply >into a RAID file server ( through NFS ) . We started choosing hardware >RAID to reduce cpu usage. > >We have two options , SCSI RAID and ATA RAID. The first would give the >best results but on the other hand becomes really expensive so we have >in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers I would suggest using the 3Ware (current models are 7506 ( parallel ATA) and 8506 ( Serial ATA)). Use mdamd to create software RAID devices. It will yield better performance, and is much more flexible. If you are building a large array, use multiple controllers to increase throughput. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 03:52:39 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 00:52:39 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: hi ya On Wed, 8 Oct 2003, Trent Piepho wrote: > On Wed, 8 Oct 2003, Mark Hahn wrote: > > > - get those drives w/ 8MB buffer disk cache > > > > what reason do you have to regard 8M as other than a useless > > marketing feature? I mean, the kenel has a cache that's 100x > > bigger, and a lot faster. 
> > I found a comparison of 8MB vs 2MB drives in a raid, though it's windows > based and not that great: > http://www.madshrimps.be/?action=getarticle&number=13&artpage=289&articID=69 i dont have much data between 2MB and 8MB ... just various people's feedback ... - releasable data i do have is at http://www.Linux-1U.net/Disks/Tests/ - testing for 2MB and 8MB should be done on the same system of the same sized disks and exact same partition, distro, patchlevel and "test programs to amplify the differences" - lots of disk writes and reads ... that overflow the memory so that disk access is forced ... > Seems like the 8MB didn't really make much of a difference. > > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems -- emphasizing .. "Terabyte" sized disk subsystems > > backup is hard. you can get 160 or 200G tapes, but they're almost > > as expensive as IDE disks, not to mention the little matter of a > > 100GB LTO tapes can be had for $36, that's less than half the price of the > cheapest 200 GB drives. we/i like to build systems that backup 1TB or 2TB per 1U server ... - tapes doesn't come close ... different ballpark - a rack of 1U servers is a minimum of 40TB - 80TB of data .. - and than to turn around and simulate a disk crash and restore from backups from bare metal or how fast to get a replacement system back online ( hot swap - live backups) - i think those 200GB tape drives is something to also add into the costs of backup media .. as are restore from tape considerations before deciding on tape vs disk backup media ( all depends on the purpose of the server and data ) - last i played with tape drives was those $3K - $4K exabyte tape drives ... nice and fast (writing) .. but very slow for restore and unreliable ... and time consuming and NOT automated - people costs the mosts for doing proper backups ... ( someone has to write the backup methodology ro swap the tapes etc ) fries ( a local pc store here ) had 160GB disks 8MB buffers for $80 after rebates ... otherwise general rule is $1 per GB of raw disk storage per disk fun stuff .. have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 05:25:18 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 02:25:18 -0700 (PDT) Subject: building a RAID system In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > as expensive as IDE disks, not to mention the little matter of a > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > cheapest 200 GB drives. > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > - tapes doesn't come close ... different ballpark How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, and the largest IDE drive I've seen is 250 GB. I've got two 4U rackmount systems sitting side by side on the same shelf. One is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive server with 200GB SATA drives and two 8 port 3ware cards. The tape library has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand new, while the ADIC is around a year old. 
If the tape library were bought today, it would have an LTO-2 drive with double the capacity and could store 4.8 TB. So tapes seem to come pretty close to me. It is also quite a bit more practical to change tapes with the library than to be swapping hard drives around. The library's built-in barcode reader keeps track of them all for me. I can type a command and have it spit out all the tapes that a certain set of backups are on. They fit nicely in a box in their plastic cases and if I drop one it will be ok. I can stick them on a shelf for five years and still expect to read them. And the tapes don't take up any rackspace or power or need any cooling. I've never had a tape go bad on me either, even though I've been through a lot more of them than IDE drives. Of course the tape library was expensive. A new LTO-2 model can be had for around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware controllers were much less. But the cost of the media is a lot less for tapes than for SATA hard drives. Especially if you get models with 3 year warranties. Once you buy enough drives/tapes you'll break even on a $/GB comparison. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 06:04:20 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 10:04:20 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: Greg, > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > MKL and the free ATLAS library run at a respectable % of peak. Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. Have you tried DPPSV or DPOSV on Itanium, for example? I would be interested in the percentage of peak that you achieve with MKL and ATLAS, for up to 10000x10000 matrices. ATLAS does not have a full LAPACK implementation. > 4) Put your performance whitepapers on your website, or it looks > fishy. Our white papers are not on the Web because they contain performance data, and particularly performance data comparing against our competitors. It may expose us to legal issues over libel. Putting the legitimacy of any legal issues aside, it is not good for any business to be engulfed in legal squabbles. We are in the process of clearing this with our legal department at the moment. As I have noted in my previous e-mail, anyone who wants to get hold of the white papers is welcome to send me an e-mail. > I looked and didn't see a single performance claim there. There is one on the front page! Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd.
E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 06:13:21 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 03:13:21 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Trent Piepho wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > > > as expensive as IDE disks, not to mention the little matter of a > > > > > > 100GB LTO tapes can be had for $36, that's less than half the price of the > > > cheapest 200 GB drives. > > > > we/i like to build systems that backup 1TB or 2TB per 1U server ... > > - tapes doesn't come close ... different ballpark > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four IDE bays, > and the largest IDE drive I've seen is 250 GB. 8 drives ... 250GB or 300GB each .. > I've got two 4U rackmount systems sitting side by side on the same shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server is brand > new, while the ADIC is around a year old. If the tape library were bought > today, it would have a LTO-2 drive with double the capacity and could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a bit more > practical changes tapes with the library than to be swapping hard drives nobody swaps disks around ... unless one is using those 5.25" drive bay thingies in which case ... thats a different ball game i/we claim that if the drives fail, something is wrong ... its not necessary for the disks to be removable > around. The libraries built in barcode reader keeps track of them all for me. > I can type a command and have it spit out all the tapes that a certain set of > backups are on. They fit nicely in a box in their plastic cases and if I drop > one it will be ok. I can stick them on a shelf for five years and still i prefer hands off backups and restore .... esp if the machine is not within your hands reach ... > expect to read them. And the tapes don't take up any rackspace or power or > need any cooling. I've never had a tape go bad on me either, even though I've > been though a lot more of them than IDE drives. > > Of course the tape library was expensive. A new LTO-2 model can be had for > around $11,600 on pricewatch. The 16 bay IDE case, CPUs/MB/memory and 3ware for $11.6K ... i can build two 2TB servers or more ... 8 * $400 --> $3200 in drives ... for 2.4TB each ... + $700 for misc cpu/mem/1u case and it'd be 2 live backups of the primary 2TB system or about 2-3 months of weekly full backups depending ondata > controllers were much less. But the cost of the media is a lot less for tapes > than for SATA hard drives. Especially if you get models with 3 year > warranties. Once you buy enough drives/tapes you'll break even on a $/GB > comparison. i dont want to be baby sitting tapes ... 
on a daily basis and cleaning its heads or assume that someone else did c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From seth at hogg.org Thu Oct 9 06:38:54 2003 From: seth at hogg.org (Simon Hogg) Date: Thu, 09 Oct 2003 11:38:54 +0100 Subject: Intel compilers and libraries In-Reply-To: References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> At 10:04 09/10/03 +0000, C J Kenneth Tan -- Heuchera Technologies wrote: >Our white papers are not on the Web they contain performance data, and >particularly, performance data comparing against our competitors. It >may expose us to libel legal issues. Putting legitimacy of any legal >issues aside, it is not good for any business to be engulf in legal >squabbles. We are in the process of clearing this with our legal >department at the moment. > >As I have noted in my previous e-mail, anyone who wants to get a hold >of the white papers are welcome to please send me an e-mail. I would just like to comment that if you are releasing the white papers by email, what difference is there between that and putting them on the web? They are both still publishing. Although IANAL, I would doubt that these figures expose you legally, as long as you are correct and truthful in the figures you claim (and probably the methodology would be pretty handy, too). Simon Hogg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 06:31:00 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 06:31:00 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F8538E4.9020400@lmco.com> > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > IDE bays, > > and the largest IDE drive I've seen is 250 GB. > > 8 drives ... 250GB or 300GB each .. > Cool. Do you have pictures? How do you get the other 4 drives out? I assume they're not accessible from the front so do you have to pull the unit out, pop the cover and replace the drive? > > I've got two 4U rackmount systems sitting side by side on the same > shelf. One > is a ADIC Scalar 24, which holds 24 100 GB LTO tapes. The other is > a 16 drive > server with 200GB SATA drives and two 8 port 3ware cards. The tape > library > has 2.4 TB and the IDE server is 3.2 TB. To be fair, the IDE server > is brand > new, while the ADIC is around a year old. If the tape library were > bought > today, it would have a LTO-2 drive with double the capacity and > could store > 4.8 TB. So tapes seem to come pretty close to me. It also quite a > bit more > practical changes tapes with the library than to be swapping hard > drives > > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game > > i/we claim that if the drives fail, something is wrong ... its not > necessary for the disks to be removable > Are you saying that it's not necessary to have hot-swappable drives? (I'm just trying to understand your point).
Does everyone remember this: http://www.tomshardware.com/storage/20030425/index.html My only problem with this approach is off-site storage of backups. Do you pull a huge number of drives and move them off-site? (I still love the idea of using inexpensive drives for backup instead of tape though). Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 07:07:26 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 11:07:26 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> References: <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <4.3.2.7.2.20031009113601.00b74e50@pop.clara.net> Message-ID: Simon, > I would just like to comment that if you are releasing the white papers by > email, what difference is that to putting it on the web? They are both > still publishing. I am not a lawyer, so I cannot comment on the legal aspects of things. What if an e-mail and its attachments have a confidentiality clause attached? Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 07:26:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 07:26:56 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - it supposedly can sustain 320MB/sec transfers > > > > that's silly, of course. outer tracks of current disks run at > > between 50 and 100 MB/s, so that's the max sustained. you can even > > argue that's not really 'sustained', since you'll eventually get > > to slower inner tracks. > > yup ... those are just marketing numbers... all averages ... It probably refers to burst delivery out of its 8 MB cache. The actual sustained bit rate is purely a matter of N*2*pi*R*f/S, where N = number of heads, 2*pi*R*f is the linear/tangential speed of the platter at the read radius R, and S is the linear length per bit. This is an upper bound. Similarly, the average rotational latency is something like 1/(2f), the time the platter requires to move half a rotation. > and bigg differences between inner tracks and outer tracks Well, proportional to R, at any rate. Given the physical geometry of the platters (which I get to look at when I rip open old drives to salvage their magnets) about a factor of two. > > > independent of which raid system is built, you wil need 2 or 3 > > > more backup systems to backup your Terabyte sized raid systems > > > > backup is hard. you can get 160 or 200G tapes, but they're almost > > to me ... backup of terabyte sized systems is trivial ...
> - just give me lots of software raid subsystems > ( 2 backups for each "main" system ) > > - lot cheaper than tape drives and 1000x faster than tapes > for live backups > > - will never touch a tape backup again ... too sloow > and too unreliable no matter how clean the tape heads are > ( too slow being the key problem for restoring ) C'mon, Alvin. Sometimes this is a workable solution, sometimes it just plain is not. What about archival storage? What about offsite storage? What about just plain moving certain data around (where networks of ANY sort might be held to be untrustworthy). What about due diligence if you were are corporate IT exec held responsible for protecting client data against loss where the data was worth real money (as in millions to billions) compared to the cost of archival media and mechanism? "never touch a tape backup again" is romantic and passionate, but not necessarily sane or good advice for the vast range of humans out there. To backup a terabyte scale system, one needs a good automated tape changer and a pile of tapes. These days, this will (as Mark noted) cost more than your original RAID, in all probability, although this depends on how gold-plated your RAID is and whether or not you install two of them and use one to backup the other. I certainly don't have a tape changer in my house as it would cost more than my server by a factor of two or three to set up. I backup key data by spreading it around on some of the massive amounts of leftover disk that accumulates in any LAN of systems in days where the smallest drives one can purchase are 40-60 GB but install images take at most a generous allotment of 5 GB including swap. In the physics department, though, we are in the midst of a perpetual backup crisis, because it IS so much more expensive than storage and our budget is limited. Our primary department servers are all RAID and total (IIRC) over a TB and growing. We do actually back up to disk several times a day so that most file restores for dropped files take at most a few seconds to retrieve (well, more honestly a few minutes of FTE labor between finding the file and putting it back in a user's home directory). However, we ALSO very definitely make tape backups using a couple of changers, keep offsite copies and long term archives, and give users tapes of special areas or data on request. The tape system is expensive, but a tiny fraction of the cost of the loss of data due to (say) a server room fire, or a monopole storm, or a lightning strike on the primary room feed that fries all the servers to toast. I should also point out that since we've been using the RAIDs we have experienced multidisk failures that required restoring from backup on more than one occasion. The book value probability for even one occasion is ludicrously low, but the book value assumes event independence and lies. Disks are often bought in batches, and batches of disk often fail (if they fail at all) en masse. Failures are often due to e.g. overheating or electrical problems, and these are often common to either all the disks in an enclosure or all the enclosures in a server room. I don't think a sysadmin is ever properly paranoid about data loss until they screw up and drop somebody's data for which they were responsible because of inadequate backups. 
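To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python; the failure rate, array size and rebuild window are made-up illustrative values, not measurements from any real array:

# Illustrative only: chance of losing a second drive during a RAID-5
# rebuild, with independent failures vs. crudely correlated ones
# (same batch of disks, same enclosure heat or power event).
afr = 0.03              # assumed annual failure rate per drive (3%)
n_drives = 8            # surviving drives in the degraded array
rebuild_hours = 10.0    # assumed time to rebuild onto the spare
p_window = afr * rebuild_hours / (24 * 365)      # per-drive chance during the rebuild
p_indep = 1 - (1 - p_window) ** n_drives         # the "book value", independent drives
p_batch = 1 - (1 - 50 * p_window) ** n_drives    # assume a 50x hazard once a sibling has died
print("independent: %.1e   correlated batch: %.1e" % (p_indep, p_batch))

With these toy numbers the naive figure is a few in ten thousand, while the correlated case is already over a percent -- which is the point: the quoted probability is only as good as the independence assumption behind it.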
Our campus OIT just dropped a big chunk of courseware developed for active courses this fall because they changed the storage system for the courseware without verifying their backup, experienced a crash during the copy over, and discovered that the backup was corrupt. That's real money, people's effort, down the drain. Pants AND suspenders. Superglue around the waistband, actually. Who wants to be caught with their pants down in this way? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:16:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > 4) Put your performance whitepapers on your website, or it looks > > fishy. > > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Putting legitimacy of any legal Expose you to libel suits? Say what? Only if you lie about your competitor's numbers (or "cook" them so that they aren't an accurate reflection of their capabilities, as is often done in the industry) does it expose you to libel charges or more likely to the ridicule of the potential consumers (who tend to be quite knowledgeable, like Greg). One essential element to win those crafty consumers over is to compare apples to apples, not apples to apples that have been picked green, bruised, left on the ground for a while in the company of some handy worms, and then picked up so you can say "look how big and shiny and red and worm-free our apple is and how green and tiny and worm-ridden our competitor's apple is". A wise consumer is going to eschew BOTH of your "display apples" (as your competitor will often have an equally shiny and red apple to parade about and curiously bruised and sour apples from YOUR orchard) and instead insist on wandering into the various orchards to pick REAL apples from your trees for their OWN comparison. What exactly prevents you from putting your own raw numbers up, without any listing of your competitor's numbers? You can claim anything you like for your own product and it isn't libel. False advertising, possibly, but not libel. Or put the numbers up with your competitor's numbers up "anonymized" as A, B, C. And nobody will sue you for beating ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody "owns" them to sue you or cares in the slightest if you beat them. The most that might happen is that if you manipulate(d) ATLAS numbers so they aren't what real humans get on real systems, people might laugh at you or more likely just ignore you thereafter. What makes you any LESS liable to libel if you distribute the white papers to (potential) customers individually? Libel is against the law no matter how, and to who, you distribute libelous material; it is against the law even if shrouded in NDA. It is against the law if you whisper it in your somebody's ears -- it is just harder to prove. 
Benchmark comparisons, by the way, are such a common marketing tool (and so easily bent to your own needs) that I honestly think that there is a tacit agreement among vendors not to challenge competitors' claims in court unless they are openly egregious, only to put up their own competing claims. After all, no sane company WOULD actually lie, right -- they would have a testbed system on which they could run the comparisons listed right there in court and everybody knows it. Whether the parameters, the compiler, the system architecture, the tests run etc. were carefully selected so your product wins is moot -- if it ain't a lie it ain't libel, and it is caveat emptor for the rest (and the rest is near universal practice -- show your best side, compare to their worst). > issues aside, it is not good for any business to be engulf in legal > squabbles. We are in the process of clearing this with our legal > department at the moment. > > As I have noted in my previous e-mail, anyone who wants to get a hold > of the white papers are welcome to please send me an e-mail. As if your distributing them on a person by person basis is somehow less libelous? Or so that you can ask me to sign an NDA so that your competitors never learn that you are libelling them? I rather think that an NDA that was written to protect illegal activity be it libel or drug dealing or IP theft would not stand up in court. Finally, product comparisons via publically available benchmarks of products that are openly for sale don't sound like trade secrets to me as I could easily duplicate the results at home (or not) and freely publish them. Your company's apparent desire to conceal this comes across remarkably poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a watch? Come right down this alley so I can show you my watches where none of the bulls can see" compared to an open storefront with your watches on display to anyone, consumer or competitor. This is simply my own viewpoint, of course. I've simply never heard of a company shrinking away from making the statement "we are better than our competitors and here's why" as early and often as they possibly could. AMD routinely claims to be faster than Intel and vice versa, each has numbers that "prove" it -- for certain tests that just happen to be the tests that they tout in their claims, which they can easily back up. For all the rest of us humans, our mileage may vary and we know it, and so we mistrust BOTH claims and test the performance of our OWN programs on both platforms to see who wins. I'm certain that the same will prove true for your own product. I don't care about your benchmarks except as a hook to "interest" me. Perhaps they will convince me to get you to loan me access to your libraries etc to link them into my own code to see if MY code speeds up relative to the way I have it linked now, or relative to linking with a variety of libraries and compilers. Then I can do a real price/performance comparison and decide if I'm better off buying your product (and buying fewer nodes) or using an open source solution that is free (and buying more nodes). Which depends on the scaling properties of MY application, costs, and so forth, and cannot be predicted on the basis of ANY paper benchmark. Finally, don't assume that this audience is naive about benchmarking or algorithms, or at all gullible about performance numbers and vendor claims. 
A lot of people on the list (such as Greg) almost certainly have far more experience with benchmarks than your development staff; some are likely involved in WRITING benchmarks. If you want to be taken seriously, put up a full suite of benchmarks, by all means, and also carefully indicate how those benchmarks were run as people will be interested in duplicating them and irritated if they are unable to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jsims at csiopen.com Thu Oct 9 09:02:11 2003 From: jsims at csiopen.com (Joey Sims) Date: Thu, 9 Oct 2003 09:02:11 -0400 Subject: building a RAID system - 8 drives Message-ID: <812B16724C38EE45A802B03DD01FD5472A3BF4@exchange.concen.com> 300GB Maxtor ATA133 5400RPM drives are the largest currently available. 250GB is the largest SATA currently. You can achieve 2TB in a 1U by using a drive sled that will hold two drives. The drives are mounted opposing each other and share a backplane. This is a proprietary solution. Or, if you have a chassis with 4 external trays and a few internal 3.5" bays it could be done. I personally don't believe cramming this many drives in a 1U is a good idea. Increased heat due to lack of airflow would have to decrease the lifespan of the drives. ---------------------------------------------------- |Joey P. Sims 800.995.4274 x 242 |Sales Manager 770.442.5896 - Fax |HPC/Storage Division jsims at csiopen.com |Concentric Systems,Inc. www.csilabs.net ---------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 07:02:57 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 07:02:57 -0400 Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <3F854061.3040208@lmco.com> Alvin Oga wrote: > > On Thu, 9 Oct 2003, Jeff Layton wrote: > > > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > > IDE bays, > > > > and the largest IDE drive I've seen is 250 GB. > > > > > > 8 drives ... 250GB or 300GB each .. > > > > > > > Cool. Do you have pictures? How do you get the other 4 drives > > out? I assume they're not accessible from the front so do you > > have to pull the unit out, pop the cover and replace the drive? > > yup.. pull the cover off and pop out the drive the hard way vs > "hot swap ide tray" > > autocad generated *.jpg file > http://linux-1u.net/Dwg/jpg.sm/c2500.jpg > > ( newer version has the mb and ps swapped for better cpu cooling) > http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > > > i/we claim that if the drives fail, something is wrong ... its not > > > necessary for the disks to be removable > > > > > > > Are you saying that it's not necessary to have hot-swappable > > drives? (I'm just trying to undertand your point). > > if the drive is dying .... 
> - find out which brand/model# it is and avoid it > - find out if others are having similar problems > - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps > > i'm not convinced that hotswap ide works w/o special ide controllers > - pull the ide disk out while its powered up > - pull the ide disk out while you're writing a 2GB file to it > > - or insert the disk while the rest of the systme is up and > running > > if you have to power down to take the ide disk out, you might as > well do a clean shutdown and replace the disk the hard way with > a screw driver instead of nice ($50 expensive) drive bay handle > $ 50 can be an extra 80GB of disk space when a good sale > is occuring at the local fries stores > We've got several NAS boxes with hot-swappable IDE drives and without it we'd be toast. Granted the controller is specialized, coming from one vendor, but it allows us to have a fail-over drive with auto-rebuild in the background. Then we just pull the bad drive, put in a new one, and designate it as the new hot spare. Works great! It's saved our bacon a few times. I've wanted to test hot-swap with 3ware controllers, but have never done it. Has anyone tested the hotswap capability of the 3ware controllers/cases? Another comment. If you have to pull the node to replace the drive, then you have to bring down the filesystem which might not be the best thing to do. Hot-swapping allows the filesystem to keep functioning, albeit at a lower performance level. > > Does everyone remember this: > > > > http://www.tomshardware.com/storage/20030425/index.html > > > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it > We can't do backups across the wire to an offsite storage facility. So we have to do backups, pull the tapes, and store them off-site. I'm just not sure how this would work with disks instead of tapes. Oh, you can full and incremental backups to disk - most backup software doesn't care what the media is anyway - but I'm just not sure if you pull a set of disks and store them. How does off-site backup recovery work? Do you pop them in, mount them as read-only, and copy them to a live filesystem? However, despite all of these questions, at some point soon, disk will be the only way to get backups of LARGE filesystems in a reasonable amount of time. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:36:40 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:36:40 -0700 (PDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F8538E4.9020400@lmco.com> Message-ID: On Thu, 9 Oct 2003, Jeff Layton wrote: > > > How do you stick 2TB in a 1 U server? I've seen 1U cases with four > > IDE bays, > > > and the largest IDE drive I've seen is 250 GB. > > > > 8 drives ... 250GB or 300GB each .. > > > > Cool. Do you have pictures? How do you get the other 4 drives > out? 
I assume they're not accessible from the front so do you > have to pull the unit out, pop the cover and replace the drive? yup.. pull the cover off and pop out the drive the hard way vs "hot swap ide tray" autocad generated *.jpg file http://linux-1u.net/Dwg/jpg.sm/c2500.jpg ( newer version has the mb and ps swapped for better cpu cooling) http://linux-1u.net/Dwg/jpg.sm/c2610.jpg ( also holds 8 drives ) > > i/we claim that if the drives fail, something is wrong ... its not > > necessary for the disks to be removable > > > > Are you saying that it's not necessary to have hot-swappable > drives? (I'm just trying to undertand your point). if the drive is dying .... - find out which brand/model# it is and avoid it - find out if others are having similar problems - put a 40x40x20mm fan on the (7200rpm) disks and see if it helps i'm not convinced that hotswap ide works w/o special ide controllers - pull the ide disk out while its powered up - pull the ide disk out while you're writing a 2GB file to it - or insert the disk while the rest of the systme is up and running if you have to power down to take the ide disk out, you might as well do a clean shutdown and replace the disk the hard way with a screw driver instead of nice ($50 expensive) drive bay handle $ 50 can be an extra 80GB of disk space when a good sale is occuring at the local fries stores > Does everyone remember this: > > http://www.tomshardware.com/storage/20030425/index.html > > My only problem with this approach is off-site storage of > backups. Do you pull a huge number of drives and move them > off-site? (I still love the idea of using inexpensive drives for > backup instead of tape though). i suppose you can do "incremental" backups across the wire ... and "inode" based backups too ... - it'd be crazy to xfer the entire 1MB file if only 1 line changed in it c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Oct 9 08:24:20 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 9 Oct 2003 14:24:20 +0200 (CEST) Subject: building a RAID system In-Reply-To: Message-ID: On Wed, 8 Oct 2003, Mark Hahn wrote: > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. Yes, but the kernel might be dumb at times, like when splitting large requests into small pieces to be fed to the block subsystem just to be reassembled again before being sent to the disk :-) Another issue is how this memory is used by the drive firmware. I've seen tests that show some Fujitsu SCSI disks (MAN or MAP series, IIRC) perform much better than competitors in multi-user situations (lots of different files accessed by different users, supposedly scattered on the disk) while the competitors were better at streaming media (one big file used by a single user, supposedly contiguously placed on disk). > unless your workload is dominated by tiny, random seeks, Or your file-system becomes full and thus fragmented. Been there, done that! I've had a big storage device changed from ext3 to XFS because ext3 at about 50% fragmentation was horribly slow; XFS allows live (without unmounting or mounting "ro") defragmentation. 
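For anyone curious how scattered a given file has actually become, a rough fragment counter can be put together around the FIBMAP ioctl. This is only a sketch: it needs root, assumes the filesystem supports FIBMAP and that statvfs() reports the filesystem block size, and it will be slow on very large files; the filefrag utility, where available, does the same job properly.

import fcntl, os, struct

FIBMAP = 1  # from <linux/fs.h>: map a logical file block to a physical block

def fragments(path):
    # Count runs of physically contiguous blocks; 1 means perfectly contiguous.
    f = open(path, 'rb')
    try:
        blksz = os.statvfs(path).f_bsize
        nblocks = (os.fstat(f.fileno()).st_size + blksz - 1) // blksz
        runs, last = 0, None
        for logical in range(nblocks):
            buf = struct.pack('I', logical)
            phys = struct.unpack('I', fcntl.ioctl(f.fileno(), FIBMAP, buf))[0]
            if phys and (last is None or phys != last + 1):
                runs += 1
            if phys:
                last = phys
        return runs
    finally:
        f.close()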
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 09:42:46 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 09:42:46 -0400 (EDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > Pants AND suspenders. Superglue around the waistband, actually. Who > > wants to be caught with their pants down in this way? > > always got bit by tapes... somebody didnt change the tape on the 13th > a couple months ago ... and critical data is now found to be missing > - people do forget to change tapes ... or clean heads... > ( thats the part i dont like about tapes .. and is the most > ( common failure mode for tapes ... easily/trivially avoided by > ( disks-to-disk backups > > - people get sick .. people go on vacations .. people forget > > > - no (similar) problems since doing disk-to-disk backups > - and i usually have 3-6 months of full backups floating around > in compressed form All agreed. And tapes aren't that permanent a medium either -- they deteriorate on a timescale of years to decades, with data bleeding through the film, dropped bits due to cosmic ray strikes, depolymerization of the underlying tape itself. Even before the tape itself is unreadable, you are absolutely certain to be unable to find a working drive to read it with. I have a small pile of obsolete tapes in my office -- tapes made with drives that no longer "exist", and that is after dumping the most egregiously useless of them. Still, I'd argue that the best system for many environments is to use all three: RAID, real backup to (separate) disk, possibly a RAID as well, and tape for offsite and archival purposes. The first two layers protect you against the TIME required to handle users accidentally deleting files (the most common reason to access a backup) as retrieval is usually nearly instantaneous and not at all labor intensive. It also protects you agains the most common single-server failures that get past the protection of RAID itself (multidisk failures, blown controllers). The tape (with periodic offsite storage) protects you against server room fire, brownouts or spikes that cause immediate data corruption or disk loss on both original and backup servers, and tapes can be saved for years -- far longer than one typically can go back on a disk backup mechanism. Users not infrequently want to get at a file version they had LAST YEAR, especially if they don't use CVS. Finally, some research groups generate data that exceeds even TB-scale disk resources -- they constantly move data in and out of their space in GB-sized chunks. They often like to create their own tape library as a virtual extension of the active space. Tapes aren't only about backup. So you engineer according to what you can afford and what you need, making the usual compromises brought about by finite resources. BTW, one point that hasn't been made in the soft vs hard RAID argument is that with hard RAID you are subject to (proprietary) HARDWARE obsolescence, which typically is more difficult to control than software. 
You build a RAID, populate it, use it. After a few years, the RAID controller itself dies (but the disks are still good). Can you get another? One that can actually retrieve the data on your disks? There are no guarantees. Maybe the company that made your controller is still in business (or rather, still in the RAID business). Maybe they either still carry old models, or can do depot repair, or maybe new models can still handle the raid encoding they implemented with the old model. Maybe you can AFFORD a new model, or maybe it has all sorts of new features and costs 3x as much as the first one did (which may not have been cheap). Maybe it takes you weeks to find a replacement and restore access to your data. Soft RAID can have problems of its own (if the software for example evolves to where it is no longer backwards compatible) but it is a whole lot easier to cope with these problems and they are strictly under your control. You are very unlikely to have any "event" like the death of the RAID server that prevents you from retrieving what is on the disks (at a cost likely to be quite controllable and in a timely way) as long as the disks themselves are not corrupted. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.worsham at mci.com Thu Oct 9 09:07:25 2003 From: michael.worsham at mci.com (Michael.Worsham) Date: Thu, 09 Oct 2003 09:07:25 -0400 Subject: CAD Message-ID: <000201c38e66$49b2aa40$94022aa6@Wcomnet.com> My wife works for a construction/architecture firm and handles AutoCad files like this all the time (some even larger at times, depending on the client). One thing we looked at first was what platform they were running the AutoCad on. Windows XP or 95/98 can't really handle Autocad as it is a highly intensive CPU application. We had a similar 'old' layout where CAD machines were more based as a word processing workstation than as a CAD station. Given the amount of work this firm produced in a single day, we went for a Dual Xeon P4 setup w/ 4 GB ram and 36 GB SCSI hard drives loaded with Windows 2000 Pro Workstation. When deciding the P4 hardware platform, look for boards that have PCI-X slots... esp for giganet NIC cards and if needed, Hardware RAID SCSI adapters. Refrain from using ATA, esp since CAD likes to really utilize the hard drives and ATA would most likely wear out faster. (Though some might look that using Xeon is overkill, lets just say there are many times it has come in handy when the customer shows up on-site unexpectedly and wants to see a progress report or has changes to be added. Pulling up the program and the data file in a couple of seconds rather than several minutes makes a beliver out of you in an instant.) If the file is being downloaded from a file server, using standard 10/100 via a cheap hub isn't going to cut it. Best to utilize something of a 10/100/1000 switch (ie. copper giganet) and 10/100/1000 NICs in each of the machines. Make sure the card is set for FULL-DUPLEX to fully utilize the bandwidth needed esp for downloading large files from the file server. Based on the file server specs, its is similar to that of the workstations however it is running Windows 2000 Advanced Server w/ Veritas Backup... 
can't be too careful for DR measures, esp with CAD files of this caliber. -- M Michael Worsham MCI/Intermedia Communications System Administrator & Applications Engineer Phone: 813-829-6845 Vnet: 838-6845 E-mail: michael.worsham at mci.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 07:47:55 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 04:47:55 -0700 (PDT) Subject: building a RAID system - yup - superglue In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > On Wed, 8 Oct 2003, Alvin Oga wrote: > > > - will never touch a tape backup again ... too sloow > > and too unreliable no matter how clean the tape heads are > > ( too slow being the key problem for restoring ) > > C'mon, Alvin. Sometimes this is a workable solution, sometimes it just > plain is not. What about archival storage? What about offsite storage? > What about just plain moving certain data around (where networks of ANY > sort might be held to be untrustworthy). What about due diligence if > you were are corporate IT exec held responsible for protecting client > data against loss where the data was worth real money (as in millions to > billions) compared to the cost of archival media and mechanism? "never > touch a tape backup again" is romantic and passionate, but not > necessarily sane or good advice for the vast range of humans out there. yup .. maybe an oversimplied statement ... tapes are my (distant) 2nd choice for backups of xx-Terabyte sized servers.. disk-to-disk being my first choice ( preferrably to 2 other similar sized machines ) ( it's obviously not across a network :-) i randomly restore from backups and do a diff w/ the current servers before it dies .. > Pants AND suspenders. Superglue around the waistband, actually. Who > wants to be caught with their pants down in this way? always got bit by tapes... somebody didnt change the tape on the 13th a couple months ago ... and critical data is now found to be missing - people do forget to change tapes ... or clean heads... ( thats the part i dont like about tapes .. and is the most ( common failure mode for tapes ... easily/trivially avoided by ( disks-to-disk backups - people get sick .. people go on vacations .. people forget - no (similar) problems since doing disk-to-disk backups - and i usually have 3-6 months of full backups floating around in compressed form c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Thu Oct 9 09:26:45 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Oct 2003 09:26:45 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <3F854061.3040208@lmco.com> Message-ID: On Thu, 9 Oct 2003 at 7:02am, Jeff Layton wrote > spare. Works great! It's saved our bacon a few times. I've > wanted to test hot-swap with 3ware controllers, but have > never done it. Has anyone tested the hotswap capability of > the 3ware controllers/cases? Yes, and it works just as advertised. To add my $.05 to the discussion, I'm a pretty big fan of the 3wares -- I currently have 5TB of formatted space (with about 2TB of data) on them. 
I have two servers with 2 cards and 16 drives in them, and one with 1 card and 8 drives. On the two board servers, I run the 3wares in hardware RAID mode (R5 with a hot spare), and then do a software stripe across the two hardware arrays. With the boards on separate PCI busses, this lets the stripe go faster than the 266MB/s that the boards are limited to (these are 7500 boards, which are 64/33). 3ware's 3DM also lets you monitor the status of your arrays (it's almost too verbose, actually), and do all sorts of online maintenance. Not having used mdadm much, I can't really compare the functionality of the two. A couple of nice features of 3DM is that it lets you schedule array verification and background disk scanning, which can find problems before they affect the array. I'm not sure what cases or backplane these systems use (I bought 'em from Silicon Mechanics, who I highly recommend), but the hot swap has always just worked. If anyone's interested, I have benchmarks (bonnie++ and tiobench) of one of the 2 board systems using pure software RAID as well as the setup above. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:09:56 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:09:56 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > > My only problem with this approach is off-site storage of > > backups. Do you pull a huge number of drives and move them > > off-site? (I still love the idea of using inexpensive drives for > > backup instead of tape though). > > i suppose you can do "incremental" backups across the wire ... > and "inode" based backups too ... > > - it'd be crazy to xfer the entire 1MB file if > only 1 line changed in it http://rdiff-backup.stanford.edu/ The name says it all. I believe it is built on top of rsync -- at any rate it is distributed in an rpm named librsync. Awesome tool -- creates a mirror, then saves incremental compressed diffs. It is the way we can restore so quickly and yet maintain a decent archival/historical backup where a user CAN request file X from last friday (or even the version between the hours of midnight and noon on last friday). Efficient enough to run several times a day on the most active part of your space and not eat a hell of a lot of either disk or network BW. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 09:48:10 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 09:48:10 -0400 Subject: building a RAID system - yup In-Reply-To: References: Message-ID: <1065707290.4708.28.camel@protein.scalableinformatics.com> On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > users tapes of special areas or data on request. 
The tape system is > expensive, but a tiny fraction of the cost of the loss of data due to > (say) a server room fire, or a monopole storm, or a lightning strike on > the primary room feed that fries all the servers to toast. Monopole storm... (smile) I seem to remember (old bad and likely wrong memory) that Max Dresden had predicted one monopole per universe as a consequence of the standard model. Not my area of (former) expertise, so reality may vary from my memory ... [...] > I don't think a sysadmin is ever properly paranoid about data loss until > they screw up and drop somebody's data for which they were responsible > because of inadequate backups. Our campus OIT just dropped a big chunk I always ask my customers a simple question: What is the cost to you to recreate all the data you lost when your disk/tape dies? That is I tend to recommend multiple redundant systems for backup. I also like to point out that you can build a single point of failure into any system, and the cost of recovering from that failure needs to be considered when designing systems to back up the possibly failing systems. If you backup all your systems over the network, and your network dies, are you in a bad way when you need to restore? What about, if you back up everything to a single tape drive, and the drive dies (and you need your backup). Single points of failure are critical to identify. They are also critical to estimate impact from. Most folks have a backup solution of some sort. Some of them are even reasonable, though few of them are about to withstand a single failure in a critical component. My old research group has a tape changer robot and drive from a well known manufacturer. Said well known manufacturer recently told them that since the unit was EOLed about 2 years ago, there would be no more fixes available for it. They (the research group) told me that they were having trouble with it... One tape drive, one point of failure. Tape drive company is happy because you now have to drop a chunk of change on their new units, or scour eBay for old ones. -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 08:30:45 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 12:30:45 GMT Subject: building a RAID system - 8 drives In-Reply-To: References: Message-ID: <20031009123045.7582.qmail@houston.wolf.com> Alvin Oga writes: > > On Thu, 9 Oct 2003, Trent Piepho wrote: > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > nobody swaps disks around ... unless one is using those 5.25" drive bay > thingies in which case ... thats a different ball game No quite true. We use Rare drives (one box) to move up to a TB of data around w/o having to take the time to create tapes and then download them. That takes a lot of time, even w/ LTOs. 
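Just to put time scales on that, a trivial sketch (streaming rates only, ignoring tape changes, compression, and the copy back off the removable drives; the LTO figures are assumed nominal native speeds, the disk figure is the 50 MB/s ballpark quoted earlier in the thread):

# Rough hours to move 1 TB at assumed sustained streaming rates.
rates_mb_per_s = {
    "LTO-1 tape (~15 MB/s native, assumed)": 15.0,
    "LTO-2 tape (~35 MB/s native, assumed)": 35.0,
    "single IDE/SATA disk (~50 MB/s, per earlier post)": 50.0,
}
tb_in_mb = 1000.0 * 1000.0
for name, rate in rates_mb_per_s.items():
    print("%-52s %5.1f hours" % (name, tb_in_mb / rate / 3600.0))

Even at nominal speed a terabyte is the better part of a day on LTO-1, so carting a box of drives across town is hard to beat.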
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rtomek at cis.com.pl Thu Oct 9 10:22:53 2003 From: rtomek at cis.com.pl (Tomasz Rola) Date: Thu, 9 Oct 2003 16:22:53 +0200 (CEST) Subject: PocketPC Cluster In-Reply-To: <200310090947.33601.csamuel@vpac.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 9 Oct 2003, Chris Samuel wrote: > Not strictly a Beowulf as there's no Linux, but interesting nonetheless. :-) > > IrDA for the networking, 11 compute + 1 management, slower than "a mainstream > Pentium II-class desktop PC" (they don't specify what spec). > > http://www.spbsoftwarehouse.com/dev/articles/pocketcluster/index.html Yes, it's nice of course. One can also build such cluster with Linux-based devices: http://www.handhelds.org/ I myself would like to see if the performance changes after switching to Linux. One thing that should be considered is cooling. On my iPAQ, when cpu load gets too high for too long, the joy button warms itself. This means, cpu is even more heated. The other issue is power consumption. If I understand what SBP did, they run the cluster on electricity from the wall, not from the battery. My own observetion suggests, that running high load on battery consumes about 2-3 times more power than things like reading html files. - From the performance side, I wonder how this compares to the following page: http://www.applieddata.net/design_Benchmark.asp which suggests StrongARM SA 1100 @200 is 3x faster than Pentium @166? I was interested myself, so I ran the quick test on my own iPAQ 3630 (SA 1110 @206) and on AMD-k6-2 @475. On iPAQ: - -bash-2.05b# `which time` -p python /tmp/erasieve.py --limit 1000 --quiet real 0.94 user 0.91 sys 0.04 On K6: => (1020 29): /usr/bin/time -p erasieve.py --limit 1000 --quiet real 0.51 user 0.49 sys 0.02 So, how can 12 PocketPCs be slower than 1 p2 (with no clock given at all, but if I remember they were about 500MHz at best)? If I haven't misunderstood something, they probably didn't tuned their experiment too well. BTW, most PDA cpus lack fpu. So, while such claster may be nice to ad-hoc password breaking, with nanoscale simulation it will be rather the opposite, I think. bye T. - -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did "rm -rif" on the programmer's home ** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_rola at bigfoot.com ** -----BEGIN PGP SIGNATURE----- Version: PGPfreeware 5.0i for non-commercial use Charset: noconv iQA/AwUBP4VvRBETUsyL9vbiEQJfvwCeLU3/270BajC74e+r2HEKs27QoXgAn0fP C8FHl6mDchvmMBr04oWioqg0 =wFOr -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:32:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:32:12 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > Robert, > > You covered some of the issues that we are addressing with our lawyers > right now. It's a process which, as knowledgeable as you are, I am > sure you can understand we have to go through. The comparison, sure, go through the process. 
Putting your own numbers up, no, I cannot see why you need lawyers to tell you you can do this. How can somebody sue you for putting up the results of your own good-faith tests of your own product? There wouldn't be a manufacturer in existence not bogged down in court if you could (successfully) sue Tide for claiming that it gets clothes cleaner and removes stains when the first time you wash a shirt with it the shirt remains dirty and stains don't come out, for example. Why, I myself would quit work and live on the proceeds of my many suits, if every product out there had to strictly live up to its claims. The most recourse the consumer has is to not buy Tide (or whatever other detergent offendeth thee, nothing against Tide but there are plenty of stains NO detergent removes except maybe xylene or fuming nitric acid based ones:-). Or, if they are really irritated -- it is a GRASS stain and the Tide ad on TV last night shows Tide succeeding against GRASS stains in particular -- they can take the box back to the store and likely get their money back. But sue Tide? Only in Ralph Nader's dreams... Caveat emptor is more than a latin phrase, it is a principle of law. You have to look at the horse's teeth yourself, or don't blame the vendor for claiming that the old nag they sold you was really a young and vibrant horse. To them perhaps it was -- it is a question of just what an old nag is (opinion) vs the age of the horse as indicated by its teeth (fact). Only if the claims are egregious (this here snake oil will cause hair to grow on your head, cure erectile dysfunction, and make you smell nice all for the reasonable price of a dollar a bottle) is there any likelihood of grievance that might be addressed. Surely your claims aren't egregious. Your product doesn't slice, dice, and even eat your meatloaf for you...does it?;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Oct 9 11:48:02 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: Thu, 09 Oct 2003 11:48:02 -0400 Subject: [Fwd: [Bioclusters] 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA] Message-ID: <1065714482.4713.73.camel@protein.scalableinformatics.com> -----Forwarded Message----- > ======================================================================= > MEETING ANNOUNCEMENT / CALL FOR PRESENTERS > ======================================================================= > BIOCLUSTERS 2004 Workshop > March 30, 2004 > Hynes Convention Center, Boston MA USA > ======================================================================= > > * Speakers Wanted - Please Distribute Where Appropriate * > > Organized by several members of the bioclusters at bioinformatics.org > mailing list, the Bioclusters 2004 Workshop is a networking and > educational forum for people involved in all aspects of cluster and > grid computing within the life sciences. 
> > The motivation for organizers of this event was the cancellation of the > O'Reilly Bioinformatics Technology Conference series and the general > lack of forums for researchers and professionals involved with the > applied use of high performance IT and distributed computing techniques > in the life sciences. > > The primary focus of the workshop will be technical presentations from > experienced IT professionals and scientific researchers discussing real > world systems, solutions, use-cases and best practices. > > This event is being held onsite at the Hynes Convention Center on the > first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World > Magazine is generously providing space and logistical support for the > meeting and workshop attendees will have access to the expo floor and > keynote addresses. Registration & fees will be finalized in short > order. > > Presentations will be broken down among a few general content areas: > > 1. Researcher, Application & End user Issues > 2. Builder, Scaling & Integration Issues > 3. Future Directions > > The organizing committee is actively soliciting presentation proposals > from members of the life science and technical computing communities. > Interested parties should contact the committee at bioclusters04 at open- > bio.org. > > > Bioclusters 2004 Workshop Committee Members > > J.W Bizzaro - Bioinformatics Organization Inc. > James Cuff - MIT/Harvard Broad Institute > Chris Dwan - The University of Minnesota > Chris Dagdigian - Open Bioinformatics Foundation & BioTeam Inc. > Joe Landman - Scalable Informatics LLC > > The committee can be reached at: bioclusters04 at open-bio.org > > > About the Bioclusters Mailing List Community > > The bioclusters at bioinformatics.org mailing list is a 600+ member forum > for users, builders and programmers of distributed systems used in life > science research and bioinformatics. For more information about the > list including the public archives and subscription information please > visit http://bioinformatics.org/mailman/listinfo/bioclusters > -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Thu Oct 9 10:35:16 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Thu, 09 Oct 2003 10:35:16 -0400 Subject: building a RAID system - yup - superglue In-Reply-To: References: Message-ID: <3F857224.9040801@lmco.com> Robert G. Brown wrote: > On Thu, 9 Oct 2003, Alvin Oga wrote: > > > > Pants AND suspenders. Superglue around the waistband, actually. Who > > > wants to be caught with their pants down in this way? > > > > always got bit by tapes... somebody didnt change the tape on the 13th > > a couple months ago ... and critical data is now found to be missing > > - people do forget to change tapes ... or clean heads... > > ( thats the part i dont like about tapes .. and is the most > > ( common failure mode for tapes ... easily/trivially avoided by > > ( disks-to-disk backups > > > > - people get sick .. people go on vacations .. people forget > > > > > > - no (similar) problems since doing disk-to-disk backups > > - and i usually have 3-6 months of full backups floating around > > in compressed form > > All agreed.
And tapes aren't that permanent a medium either -- they > deteriorate on a timescale of years to decades, with data bleeding > through the film, dropped bits due to cosmic ray strikes, > depolymerization of the underlying tape itself. Even before the tape > itself is unreadable, you are absolutely certain to be unable to find a > working drive to read it with. I have a small pile of obsolete tapes in > my office -- tapes made with drives that no longer "exist", and that is > after dumping the most egregiously useless of them. > > Still, I'd argue that the best system for many environments is to use > all three: RAID, real backup to (separate) disk, possibly a RAID as > well, and tape for offsite and archival purposes. > I can say with some authority that this is what we at Lockheed Aeronautics do. And rather than extend this email by quoting Bob below, we also have an HSM system that we use for data we may need in the next couple of years. Jeff -- Dr. Jeff Layton Chart Monkey - Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 07:59:54 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 07:59:54 -0400 (EDT) Subject: building a RAID system In-Reply-To: Message-ID: I would also echo most of Mark's points aside from the 8 MB cache issue. I have seen some noticeable speed improvements using 2 MB vs 8 MB drives. I would also offer one other point. No matter whether you use SCSI or IDE drives, be absolutely certain that you keep the drives cool. The "internal" 3.5 bays in most cases are normally useless because they place several drives in almost direct contact. The drive(s) sandwiched in the middle have only their edges exposed to air and have to dissipate the bulk of their heat through the neighboring drives. I like mount the drives in 5.25 bays. This at least provides an air gap for some cooling. For large raid servers, I like to use the cheap fan coolers. They can be had for $5 - $8 each and include 2 or 3 small fans that fill in the 5.25 opening and the 5.25-to-3.5 mounting brackets. Of course, that makes for a lot of fan noise. We typically build 2 identical raid servers connected by a dedicated gigabit link to do nightly backups, both to protect from raid failure and user error. I would like to ask if anyone has investigated Benjamin LaHaise netmd application yet? http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf I think there was some discussion of it a few months ago, but I haven't seen anything lately. Thanks, Mike Prinkey Aeolus Research, Inc. On Wed, 8 Oct 2003, Mark Hahn wrote: > > - get those drives w/ 8MB buffer disk cache > > what reason do you have to regard 8M as other than a useless > marketing feature? I mean, the kenel has a cache that's 100x > bigger, and a lot faster. > > > - slower rpm disks ... usually it tops out at 7200rpm > > unless your workload is dominated by tiny, random seeks, > the RPM of the disk isn't going to be noticable. > > > - it supposedly can sustain 133MB/sec transfers > > it's not hard to saturate a 133 MBps PCI with 2-3 normal IDE > disks in raid0. 
interestingly, the chipset controller is normally > not competing for the same bandwidth as the PCI, so even with > entry-level hardware, it's not hard to break 133. > > > - if you use software raid, you can monitor the raid status > > this is the main and VERY GOOD reason to use sw raid. > > > - some say scsi disks are faster ... > > usually lower-latency, often not higher bandwidth. interestingly, > ide disks usually fall off to about half peak bandwidth on inner > tracks. scsi disks fall off too, but usually less so - they > don't push capacity quite as hard. > > > - it supposedly can sustain 320MB/sec transfers > > that's silly, of course. outer tracks of current disks run at > between 50 and 100 MB/s, so that's the max sustained. you can even > argue that's not really 'sustained', since you'll eventually get > to slower inner tracks. > > > independent of which raid system is built, you wil need 2 or 3 > > more backup systems to backup your Terabyte sized raid systems > > backup is hard. you can get 160 or 200G tapes, but they're almost > as expensive as IDE disks, not to mention the little matter of a > tape drive that costs as much as a server. raid5 makes backup > less about robustness than about archiving or rogue-rm-protection. > I think the next step is primarily a software one - > some means of managing storage, versioning, archiving, etc... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Thu Oct 9 09:34:56 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Thu, 9 Oct 2003 13:34:56 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: References: Message-ID: Robert, You covered some of the issues that we are addressing with our lawyers right now. It's a process which, as knowledgeable as you are, I am sure you can understand we have to go through. Ken ----------------------------------------------------------------------- C. J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- On Thu, 9 Oct 2003, Robert G. Brown wrote: > Date: Thu, 9 Oct 2003 09:16:43 -0400 (EDT) > From: Robert G. Brown > To: C J Kenneth Tan -- Heuchera Technologies > Cc: Greg Lindahl , beowulf at beowulf.org > Subject: Re: Intel compilers and libraries > > On Thu, 9 Oct 2003, C J Kenneth Tan -- Heuchera Technologies wrote: > > > > 4) Put your performance whitepapers on your website, or it looks > > > fishy. > > > > Our white papers are not on the Web they contain performance data, and > > particularly, performance data comparing against our competitors. It > > may expose us to libel legal issues. Putting legitimacy of any legal > > Expose you to libel suits? Say what? 
> > Only if you lie about your competitor's numbers (or "cook" them so that > they aren't an accurate reflection of their capabilities, as is often > done in the industry) does it expose you to libel charges or more likely > to the ridicule of the potential consumers (who tend to be quite > knowledgeable, like Greg). > > One essential element to win those crafty consumers over is to compare > apples to apples, not apples to apples that have been picked green, > bruised, left on the ground for a while in the company of some handy > worms, and then picked up so you can say "look how big and shiny and red > and worm-free our apple is and how green and tiny and worm-ridden our > competitor's apple is". A wise consumer is going to eschew BOTH of your > "display apples" (as your competitor will often have an equally shiny > and red apple to parade about and curiously bruised and sour apples from > YOUR orchard) and instead insist on wandering into the various orchards > to pick REAL apples from your trees for their OWN comparison. > > What exactly prevents you from putting your own raw numbers up, without > any listing of your competitor's numbers? You can claim anything you > like for your own product and it isn't libel. False advertising, > possibly, but not libel. Or put the numbers up with your competitor's > numbers up "anonymized" as A, B, C. And nobody will sue you for beating > ATLAS/GCC/GSL numbers -- ATLAS etc are open source tools and nobody > "owns" them to sue you or cares in the slightest if you beat them. The > most that might happen is that if you manipulate(d) ATLAS numbers so > they aren't what real humans get on real systems, people might laugh at > you or more likely just ignore you thereafter. > > What makes you any LESS liable to libel if you distribute the white > papers to (potential) customers individually? Libel is against the law > no matter how, and to who, you distribute libelous material; it is > against the law even if shrouded in NDA. It is against the law if you > whisper it in your somebody's ears -- it is just harder to prove. > Benchmark comparisons, by the way, are such a common marketing tool (and > so easily bent to your own needs) that I honestly think that there is a > tacit agreement among vendors not to challenge competitors' claims in > court unless they are openly egregious, only to put up their own > competing claims. After all, no sane company WOULD actually lie, right > -- they would have a testbed system on which they could run the > comparisons listed right there in court and everybody knows it. Whether > the parameters, the compiler, the system architecture, the tests run > etc. were carefully selected so your product wins is moot -- if it ain't > a lie it ain't libel, and it is caveat emptor for the rest (and the rest > is near universal practice -- show your best side, compare to their > worst). > > > issues aside, it is not good for any business to be engulf in legal > > squabbles. We are in the process of clearing this with our legal > > department at the moment. > > > > As I have noted in my previous e-mail, anyone who wants to get a hold > > of the white papers are welcome to please send me an e-mail. > > As if your distributing them on a person by person basis is somehow less > libelous? Or so that you can ask me to sign an NDA so that your > competitors never learn that you are libelling them? I rather think > that an NDA that was written to protect illegal activity be it libel or > drug dealing or IP theft would not stand up in court. 
Finally, product > comparisons via publically available benchmarks of products that are > openly for sale don't sound like trade secrets to me as I could easily > duplicate the results at home (or not) and freely publish them. > > Your company's apparent desire to conceal this comes across remarkably > poorly to the consumer. It has the feel of "Hey, buddy, wanna buy a > watch? Come right down this alley so I can show you my watches where > none of the bulls can see" compared to an open storefront with your > watches on display to anyone, consumer or competitor. This is simply my > own viewpoint, of course. I've simply never heard of a company > shrinking away from making the statement "we are better than our > competitors and here's why" as early and often as they possibly could. > AMD routinely claims to be faster than Intel and vice versa, each has > numbers that "prove" it -- for certain tests that just happen to be the > tests that they tout in their claims, which they can easily back up. > For all the rest of us humans, our mileage may vary and we know it, and > so we mistrust BOTH claims and test the performance of our OWN programs > on both platforms to see who wins. > > I'm certain that the same will prove true for your own product. I don't > care about your benchmarks except as a hook to "interest" me. Perhaps > they will convince me to get you to loan me access to your libraries etc > to link them into my own code to see if MY code speeds up relative to > the way I have it linked now, or relative to linking with a variety of > libraries and compilers. Then I can do a real price/performance > comparison and decide if I'm better off buying your product (and buying > fewer nodes) or using an open source solution that is free (and buying > more nodes). Which depends on the scaling properties of MY application, > costs, and so forth, and cannot be predicted on the basis of ANY paper > benchmark. > > Finally, don't assume that this audience is naive about benchmarking or > algorithms, or at all gullible about performance numbers and vendor > claims. A lot of people on the list (such as Greg) almost certainly > have far more experience with benchmarks than your development staff; > some are likely involved in WRITING benchmarks. If you want to be taken > seriously, put up a full suite of benchmarks, by all means, and also > carefully indicate how those benchmarks were run as people will be > interested in duplicating them and irritated if they are unable to. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Oct 9 10:57:21 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 09 Oct 2003 09:57:21 -0500 Subject: building a RAID system In-Reply-To: <1065642419.9483.55.camel@qeldroma.cttc.org> References: <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <3F857751.4090009@tamu.edu> I've recently built a 2TB (well, a little less really) ATA RAID using a pair of HighPoint 374 controlers and 10 250-GB Maxtor 8 MB cache drives (plus a 60 GB drive for the system). It's running as 2 1TB arrays, because of disparate applications, right now. 
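A quick sketch of where the "little less than 2 TB" goes with ten 250 GB drives split into two five-drive arrays; the post doesn't say which RAID level each array runs, so both striping and parity are shown as assumptions:

DRIVE_BYTES = 250 * 10**9      # a vendor "250 GB" drive
DRIVES_PER_ARRAY = 5
TIB = 1024**4

def usable_bytes(level):
    raw = DRIVES_PER_ARRAY * DRIVE_BYTES
    if level == "raid0":
        return raw                                                # pure striping
    if level == "raid5":
        return raw * (DRIVES_PER_ARRAY - 1) // DRIVES_PER_ARRAY   # one drive's worth of parity
    raise ValueError(level)

for level in ("raid0", "raid5"):
    b = usable_bytes(level)
    print("%s per 5-drive array: %4d vendor GB = %.2f TiB as the OS counts it"
          % (level, b // 10**9, b / TIB))

Even with no parity at all, the decimal-vs-binary gap alone turns the nominal 2.5 TB of raw disk into about 2.27 TiB.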
Initially, the drivers for RH9 were not available so we started with RH7.3 and all the updates; they're there now and and allow cross-card arrays. Down the pike we might re-install and span the controllers. I've also recently done a 2-drive striped array supporting a meteorology data application with a lot of data acquisition and database work. It's mounted to a number of other systems via NFS. Uses a Promise Technologies TX2000 and a pair of 80 GB Maxtors. Both RAID systems have worked very well. I suspect the next one I build will incorporate Serial ATA instead of parallel. I doubt I'll build another SCSI RAID for my applications. Gerry Creager Texas Mesonet Texas A&M University Daniel Fernandez wrote: > Hi, > > I would like to know some advice about what kind of technology apply > into a RAID file server ( through NFS ) . We started choosing hardware > RAID to reduce cpu usage. > > We have two options , SCSI RAID and ATA RAID. The first would give the > best results but on the other hand becomes really expensive so we have > in mind two ATA RAID controllers: > > Adaptec 2400A > 3Ware 6000/7000 series controllers > > Any one of these has its strong and weak points, after seeing various > benchmarks/comparisons/reviews these are the only candidates that > deserve our attention. > > The server has a dozen of client workstations connected through a > switched 100Mbit LAN , all of these equipped with it's own OS and > harddisk, all home directories will be stored under the main server, > main workload (compilation and edition) would be done on the local > machines tough, server only takes care of file sharing. > > Also parallel MPI executions will be done between the clients. > > Considering that not all the workstantions would be working full time > and with cost in mind ? it's worth an ATA RAID solution ? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 10:39:48 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 10:39:48 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: <1065707290.4708.28.camel@protein.scalableinformatics.com> Message-ID: On Thu, 9 Oct 2003, Joseph Landman wrote: > On Thu, 2003-10-09 at 07:26, Robert G. Brown wrote: > > > users tapes of special areas or data on request. The tape system is > > expensive, but a tiny fraction of the cost of the loss of data due to > > (say) a server room fire, or a monopole storm, or a lightning strike on > > the primary room feed that fries all the servers to toast. > > Monopole storm... (smile) I seem to remember (old bad and likely wrong > memory) that Max Dresden had predicted one monopole per universe as a > consequence of the standard model. Not my area of (former) expertise, > so reality may vary from my memory ... Hell, there are more than that in California alone. So far monopoles have been discovered there at least twice; once on superconducting niobium balls in a Milliken experiement (but they went away when the balls were washed and never returned, go figure) and once in a superconduction flux trap although the events MIGHT have been caused by somebody flicking a light switch down the hall...:-) Seriously, this is theory vs experiment, and as a theorist I firmly defer to experiment. 
Until we find an (isolated) monopole, they are just a very attractive, compelling even, extension of Maxwell's equations and related field theories that (as a "defect") help us understand why certain quanties are quantized, or add a certain symmetry to the theory that is otherwise broken. However, it does amuse me to think of hard disks as being "experiments" like the flux loop experiment to measure the existence of monopoles. It would be interesting to determine a "signature" of disk penetration by a cosmic ray monopole and scan a small mountain of crashed disks for the signature, if such a signature is in any way unique. Such a mountain represents a lot more event phase space than a single loop or set of loops in a California laboratory. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at yahoo.com Thu Oct 9 12:08:20 2003 From: lathama at yahoo.com (Andrew Latham) Date: Thu, 9 Oct 2003 09:08:20 -0700 (PDT) Subject: Raid Deffinitions Message-ID: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Discussing a client setup the other day a cohort and I came to a different opinion on what each raid level does. Is there a guide/standard to define how it should work. Also do any vendors stray from the beaten path and add there own levels? ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Oct 9 14:24:22 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 9 Oct 2003 11:24:22 -0700 Subject: Intel compilers and libraries In-Reply-To: References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031009182422.GB1865@greglaptop.internal.keyresearch.com> On Thu, Oct 09, 2003 at 10:04:20AM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Our white papers are not on the Web they contain performance data, and > particularly, performance data comparing against our competitors. It > may expose us to libel legal issues. Welcome to the Internet. In the US, that's not an issue, so we're used to being able to get our performance data without having to ask a human. BTW, in the US, your lawyers would recommend that your "Up to 32X faster" claim would need a "results not typical" disclaimer. > > I looked and didn't see a single performance claim there. > > There is one on the front page! Sorry, I should have said "didn't see a single credible performance claim there". Bogus-looking claims do not help you sell to the HPC market, either in the US or Europe. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dag at sonsorol.org Thu Oct 9 11:42:53 2003 From: dag at sonsorol.org (chris dagdigian) Date: Thu, 09 Oct 2003 11:42:53 -0400 Subject: 2004 Bioclusters Workshop 1st Announcement -- March 2004, Boston MA USA Message-ID: <3F8581FD.3080404@sonsorol.org> ======================================================================= MEETING ANNOUNCEMENT / CALL FOR PRESENTERS ======================================================================= BIOCLUSTERS 2004 Workshop March 30, 2004 Hynes Convention Center, Boston MA USA ======================================================================= * Speakers Wanted - Please Distribute Where Appropriate * Organized by several members of the bioclusters at bioinformatics.org mailing list, the Bioclusters 2004 Workshop is a networking and educational forum for people involved in all aspects of cluster and grid computing within the life sciences. The motivation for organizers of this event was the cancellation of the O'Reilly Bioinformatics Technology Conference series and the general lack of forums for researchers and professionals involved with the applied use of high performance IT and distributed computing techniques in the life sciences. The primary focus of the workshop will be technical presentations from experienced IT professionals and scientific researchers discussing real world systems, solutions, use-cases and best practices. This event is being held onsite at the Hynes Convention Center on the first day of the larger 2004 Bio-IT World Conference+Expo. BioIT-World Magazine is generously providing space and logistical support for the meeting and workshop attendees will have access to the expo floor and keynote addresses. Registration & fees will be finalized in short order. Presentations will be broken down among a few general content areas: 1. Researcher, Application & End user Issues 2. Builder, Scaling & Integration Issues 3. Future Directions The organizing committee is actively soliciting presentation proposals from members of the life science and technical computing communities. Interested parties should contact the committee at bioclusters04 at open- bio.org. Bioclusters 2004 Workshop Committee Members J.W Bizzaro ? Bioinformatics Organization Inc. James Cuff - MIT/Harvard Broad Institute Chris Dwan - The University of Minnesota Chris Dagdigian ? Open Bioinformatics Foundation & BioTeam Inc. Joe Landman ? Scalable Informatics LLC The committee can be reached at: bioclusters04 at open-bio.org About the Bioclusters Mailing List Community The bioclusters at bioinformatics.org mailing list is a 600+ member forum for users, builders and programmers of distributed systems used in life science research and bioinformatics. For more information about the list including the public archives and subscription information please visit http://bioinformatics.org/mailman/listinfo/bioclusters _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 14:40:02 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 11:40:02 -0700 (PDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. 
Brown wrote: > It probably refers to burst delivery out of its 8 MB cache. The actual > sustained bps speed is a pure matter of N*2*\pi*\R*f/S, where N = number A hard drive only reads from one head at a time. It's not possible to align every head with each other to such a degree that every track in a cylinder is readable at once. If you look at a given drive family of drives, each different sized drive is the same basic hardware with more discs/heads. For instance Seagate's Cheetah 15K.3 family (http://www.seagate.com/docs/pdf/datasheet/disc/ds_cheetah15k.3.pdf) has the exact same internal transfer rate (609-891 megabits/sec) for the 18 GB model with 2 heads, the 36GB with 4 heads, and the 73GB with 8. > read radius, and S is the linear length per bit. This is an upper > bound. Similarly average latency (seek time) is something like 1/2f, > the time the platter requires to move half a rotation. The average latency is indeed 1/2 the rotational period. For a 7200 RPM drive it is 4.16 ms, for a 15k RPM drive it's 2 ms. Seek time is something completely different. It's how long it takes the head to move from one track to another. It does not included the latency. You might see track-to-track, full stroke, and average seek times in a datasheet. > I should also point out that since we've been using the RAIDs we have > experienced multidisk failures that required restoring from backup on > more than one occasion. The book value probability for even one I've had one multidisk failure in a RAID5 system. It was after moving into a new building, one array had three out of six disks fail to spin up. Of course I had anticipated this, and made a backup, to tape, just before the move. None of the tapes were damaged in transit. I've had several single drive failures. I've never seen anyone with significant number of drive-years of experience say they've never seen a drive fail. And no manufacture has a failure rate anywhere near 0%. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Oct 9 13:47:43 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 9 Oct 2003 13:47:43 -0400 (EDT) Subject: building a RAID system - 8 drives In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > > > On Thu, 9 Oct 2003, Trent Piepho wrote: > > > >> On Thu, 9 Oct 2003, Alvin Oga wrote: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. Jim Grey just recommends moving the whole computer: http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 JG It's a very convenient way of distributing data. DP Are you sending them a whole PC? JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks--all for about $3,000. 
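Two of the figures in this exchange are easy to sanity-check; a short sketch (the 24-hour courier time and the zero-overhead wire speeds are assumptions for illustration, the spindle speeds and disk count come from the posts):

# Average rotational latency is half a revolution.
for rpm in (7200, 10000, 15000):
    print("%5d rpm -> %.2f ms average rotational latency"
          % (rpm, 0.5 * 60.0 / rpm * 1000.0))

# Jim Gray's shipped box: seven 300 GB drives vs pulling the data over a
# wire, assuming the box spends 24 hours in transit.
payload_bits = 7 * 300e9 * 8
transit_s = 24 * 3600.0
print("shipped box: ~%.0f Mbit/s effective" % (payload_bits / transit_s / 1e6))
for name, bps in (("100 Mbit Ethernet", 100e6), ("gigabit Ethernet", 1e9)):
    print("%-18s at wire speed: %.1f hours for the same 2.1 TB"
          % (name, payload_bits / bps / 3600.0))

So the $3,000 box in the mail is roughly a 200 Mbit/s link, and it gets faster every time drives get bigger.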
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From canon at nersc.gov Thu Oct 9 14:30:46 2003 From: canon at nersc.gov (canon at nersc.gov) Date: Thu, 09 Oct 2003 11:30:46 -0700 Subject: building a RAID system In-Reply-To: Message from Daniel Fernandez of "Wed, 08 Oct 2003 21:46:59 +0200." <1065642419.9483.55.camel@qeldroma.cttc.org> Message-ID: <200310091830.h99IUkNr014912@pookie.nersc.gov> Daniel, We have around 50 3ware boxes with a total formated space of around 50 TB. We run all of these in HW raid mode. I would avoid using software raid if you plan to have more than a dozen or so clients. Our experience is that while software raid works great, it scales poorly. This was very noticeable when the server processors were PIII class. It may be less of an issue with newer processors, but I would still recommend HW raid if the card supports it. Also, we like the 3ware cards because they have been supported by linux for ages now. Some of the other cards have been a little dicey. With our newest systems we've seen aggregate performance for a single server of around 70 MB/s and they appear to scale quite well (handle over 50 clients). This last batch of systems have 12 250 GB drives, a 12 port 3ware card, dual Xeon, on-board gigE and cost less than $7k. Also, the 3ware systems hot swap very well. We make use of it all the time. --Shane ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 PSDF Project Lead fax: 510-486-7520 National Energy Research Scientific Computing Center 1 Cyclotron Road Mailstop 943-256 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:20:25 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:20:25 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009160820.2217.qmail@web60304.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > Discussing a client setup the other day a cohort and I came to a different > opinion on what each raid level does. Is there a guide/standard to define how > it should work. Also do any vendors stray from the beaten path and add there > own levels? http://www.1U-Raid5.net/Differences - definitions, and pretty pictures too c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gsheppar at gpc.edu Thu Oct 9 15:43:57 2003 From: gsheppar at gpc.edu (Gene Sheppard) Date: Thu, 09 Oct 2003 15:43:57 -0400 Subject: Inquiry small system S/W In-Reply-To: Message-ID: We here are Georgia Perimeter College are planning on putting together a 5 or 6 node Beowulf system. My question: Is there any software for a system like this? What applications have been tested on a small system? If there are none, what is the smallest system out there? Thank you for your help. 
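As a concrete illustration of the sort of thing that runs on a handful of nodes, a minimal MPI check; mpi4py is just one binding picked for the sketch, and any MPI implementation shipped with the cluster kits mentioned elsewhere in this thread would serve the same purpose:

"""Minimal MPI hello-world, shown with the mpi4py bindings purely as an
illustration (any MPI implementation/binding would do)."""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                 # this process's id, 0..size-1
size = comm.Get_size()                 # total number of processes started
node = MPI.Get_processor_name()

print("hello from rank %d of %d on %s" % (rank, size, node))

# A trivial collective, to prove the nodes really talk to each other.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print("sum of all ranks:", total)

Launched with something like "mpirun -np 6 python hello.py", one process per node, it exercises the same plumbing a real MPI application will use.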
GEne ============================================== Gene Sheppard Georgia Perimeter College Computer Science 1000 University Center Lane Lawrenceville, GA 30043 678-407-5243 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 9 17:04:30 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 9 Oct 2003 14:04:30 -0700 Subject: building a RAID system - 8 drives In-Reply-To: References: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: <20031009210430.GD11051@maybe.org> On Thu, Oct 09, 2003 at 01:47:43PM -0400, Michael T. Prinkey elucidated: > > > > No quite true. We use Rare drives (one box) to move up to a TB of data > > around w/o having to take the time to create tapes and then download them. > > That takes a lot of time, even w/ LTOs. > > Jim Grey just recommends moving the whole computer: > > http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=43 > > > JG It's a very convenient way of distributing data. > > DP Are you sending them a whole PC? > > JG Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, > and seven 300-GB disks--all for about $3,000. > Kind of reminds me of a favorite fortune cookie quotes: "Never underestimate the bandwidth of a station wagon full of tapes hurling down the highway" -- Andrew S. Tannenbaum Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From daniel at labtie.mmt.upc.es Thu Oct 9 14:50:17 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Thu, 09 Oct 2003 20:50:17 +0200 Subject: building a RAID system In-Reply-To: References: Message-ID: <1065725416.1136.59.camel@qeldroma.cttc.org> Hi again, Thanks for the advice, also it has started an interesting thread. On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > I would like to know some advice about what kind of technology apply > > into a RAID file server ( through NFS ) . We started choosing hardware > > RAID to reduce cpu usage. > > that's unfortunate, since the main way HW raid saves CPU usage is > by running slower ;) > I cannot get the point here, the dedicated processor should take all transfer commands and offload the CPU why it would run slower ? In some tests a raid system for a single workstation ( no networking ) it's a bit useless (slower) unless you want to transfer really big files. In a networked environment there could be a massive number of I/O commands so should be critical. > seriously, CPU usage is NOT a problem with any normal HW raid, > simply because a modern CPU and memory system is *so* much better > suited to performing raid5 opterations than the piddly little > controller in a HW raid card. the master/fileserver for my > cluster is fairly mundane (dual-xeon, i7500, dual PC1600), and > it can *easily* saturate its gigabit connection. after all, ram > runs at around 2 GB/s sustained, and the CPU can checksum at 3 GB/s! > Agreed, our server would not be doing anything more than managing NFS so, there is power to spare, where talking about an Athlon XP2600+ processor. But, a really good Parallel ATA 100/133 controller is needed, and 4 channels at least... 4 HDs in 2 master/slave channels reduces drastically performance ? any controller recommended ? 
But must be noted that HW RAID offers better response time. HW raid offers hotswap capability and offload our work instead of maintaining a SW raid solution ...we'll see ;) > concern for PCI congestion is a much more serious issue. > We're limited at 32 bit PCI, we cannot get around this unless spend on a highly priced PCI 64 mainboard. > finally, why do you care at all? are you fileserving through > a fast (>300 MB/s) network like quadrics/myrinet/IB? most people > limp along at a measly gigabit, which even a two-ide-disk raid0 > can saturate... > > > The server has a dozen of client workstations connected through a > > switched 100Mbit LAN , all of these equipped with it's own OS and > > jeez, since your limited to 10 MB/s, you could do raid5 on a 486 > and still saturate the net. seriously, CPU consumption is NOT an issue > at 10 MB/s. There would not be noticeable difference between SW/HW mode here. The clients would be doing write bursts of 2-5Mb per second so there must not be any problem. > > machines tough, server only takes care of file sharing. > > so excess cycles on the fileserver will be wasted unless used. > > > Considering that not all the workstantions would be working full time > > and with cost in mind ? it's worth an ATA RAID solution ? > > you should buy a single promise sata150 tx4 and four big sata disks > (7200 RPM 3-year models, please). > > regards, mark hahn. > In fact we have two choices: - Use an spare existing ( relatively obsolete ) computer and couple it with a HW RAID card. - Spend on a fast CPU computer and a good but cheap Parallel ATA controller. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 17:17:34 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 17:17:34 -0400 (EDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some > tests a raid system for a single workstation ( no networking ) it's a > bit useless (slower) unless you want to transfer really big files. In a > networked environment there could be a massive number of I/O commands so > should be critical. Key word: "should" Benchmark results: "often does not" Your best bet is to try both and run your own benchmarks and do your own cost/benefit analysis. When you say things like "better response time" one is fairly naturally driven to ask "does the difference matter", for example. Given that we run over 100 workstations from a SW RAID with nearly instantaneous (entirely satisfactory) performance, you'd have to really be hammering it to perceive a difference. > In fact we have two choices: > > - Use an spare existing ( relatively obsolete ) computer and couple it > with a HW RAID card. > > - Spend on a fast CPU computer and a good but cheap Parallel ATA > controller. Or a cheap computer + PATA or SATA controller. 
Even a cheap computer has 2+ GHz CPUs and hundreds of MB of RAM these days. Spend more on what you put the disks in, power, cooling. If it is an old/obsolete computer, will it have enough power, enough cooling? Regardless, the disk cost itself will dominate your costs. rgb > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 17:06:44 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 14:06:44 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net In-Reply-To: <20031009123045.7582.qmail@houston.wolf.com> Message-ID: hi ya angel On Thu, 9 Oct 2003, Angel Rivera wrote: > Alvin Oga writes: > > > nobody swaps disks around ... unless one is using those 5.25" drive bay > > thingies in which case ... thats a different ball game > > No quite true. We use Rare drives (one box) to move up to a TB of data > around w/o having to take the time to create tapes and then download them. > That takes a lot of time, even w/ LTOs. yes.. guess it makes sense to move disks around for moving tb of data like floppy-net or sneaker-net - done that ( moving disks around ) myself once in a while for a quickie fix c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Thu Oct 9 16:35:04 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Thu, 9 Oct 2003 13:35:04 -0700 (PDT) Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: On Thu, 9 Oct 2003, Daniel Fernandez wrote: > On Thu, 2003-10-09 at 01:39, Mark Hahn wrote: > > > I would like to know some advice about what kind of technology apply > > > into a RAID file server ( through NFS ) . We started choosing hardware > > > RAID to reduce cpu usage. > > > > that's unfortunate, since the main way HW raid saves CPU usage is > > by running slower ;) > > > I cannot get the point here, the dedicated processor should take all > transfer commands and offload the CPU why it would run slower ? In some Easy, said dedicated processor and memory is quite a bit slower than the main CPU and memory. If you look at thoughput in MB/sec, the latest linux software RAID is usually much faster than hardware raid implimentations. Usually CPU usage is (stupidly) reported as just a % used during a benchmark. If you transfer fewer megabytes in second, obviously the number of CPU cycles used in that second go down as well. If CPU usage is correctly reported in units of % per MB/sec, then you get a real measure of hardware efficiency. > needed, and 4 channels at least... 
4 HDs in 2 master/slave channels > reduces drastically performance > ? any controller recommended ? It seems that most good 4-12 channel (NOT drive, channel!) IDE cards ARE hardware raid controllers. Lots of people use the 3ware RAID cards in JBOD mode with software raid, because their isn't a cheaper non-hardware raid card comparable to something like the 3ware 7508-8 or 7508-12. I know about cheaper 2 and 4 channel non-raid cards, but they're 32/33 PCI and not comparable to the 3ware. > > concern for PCI congestion is a much more serious issue. > > > We're limited at 32 bit PCI, we cannot get around this unless spend on a > highly priced PCI 64 mainboard. AMD 760MPX and Intel E7501 motherboards have high speed 64/66 PCI and PCI-X for the E7501. They're not that expensive really. An additional $100-$200 at most over a single PCI 32/33 motherboard. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rokrau at yahoo.com Thu Oct 9 18:02:34 2003 From: rokrau at yahoo.com (Roland Krause) Date: Thu, 9 Oct 2003 15:02:34 -0700 (PDT) Subject: Experience with Omni anyone? Message-ID: <20031009220234.64852.qmail@web40010.mail.yahoo.com> Folks, I came across the Omni OpenMP compiler lately and I was wondering whether anyone here has used it and what the experience was. I.o.w., is it "industrial strength"? I know of and use Portland and Intel compilers but I am also curious. Roland __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lathama at lathama.com Thu Oct 9 17:52:22 2003 From: lathama at lathama.com (Andrew Latham) Date: Thu, 9 Oct 2003 14:52:22 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: Message-ID: <20031009215222.68022.qmail@web60307.mail.yahoo.com> thanks. I know that all the different raid levels are here for a reason and raid5 is great but what are the benefits of the rest? --- Mark Hahn wrote: > > Discussing a client setup the other day a cohort and I came to a different > > opinion on what each raid level does. Is there a guide/standard to define > how > > it should work. Also do any vendors stray from the beaten path and add > there > > own levels? > > sure they do. IMO the only important levels are: > > raid0 - striping > raid1 - mirroring > raid5 - rotating parity-based array > > vendors who make a big deal of obvious extensions like raid 10 > (mirrored stripes or vice versa) are immediately hung up on by me... > ===== Andrew Latham Penguin loving, moralist agnostic. LathamA.com - (lay-th-ham-eh) lathama at lathama.com - lathama at yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Oct 9 16:24:36 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 9 Oct 2003 15:24:36 -0500 (CDT) Subject: Inquiry small system S/W In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? 
> What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne System or application software? For system software, any of the beowulf kits will work. http://warewulf-cluster.org/ http://www.scyld.com/ http://oscar.sourceforge.net/ http://rocks.npaci.edu/ http://clic.mandrakesoft.com/index-en.html and others. Most applications will run just fine on 5 or 6 nodes. To start with, i'd get HPL and PMB running to ensure everything is working fine. Then you can look at other applications to see what you might actually be able to benefit from. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 9 16:44:28 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 9 Oct 2003 16:44:28 -0400 (EDT) Subject: building a RAID system - yup In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Robert G. Brown wrote: > Hell, there are more than that in California alone. So far monopoles Forgot to mention the California "megapoll" which just occurred on Tuesday. Sorry, I could not help myself. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Oct 9 18:08:51 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 9 Oct 2003 15:08:51 -0700 (PDT) Subject: Raid Deffinitions In-Reply-To: <20031009215222.68022.qmail@web60307.mail.yahoo.com> Message-ID: On Thu, 9 Oct 2003, Andrew Latham wrote: > thanks. > > I know that all the different raid levels are here for a reason and raid5 is > great but what are the benefits of the rest? 0 is fast (interleaved chunks) but provides no redundancy. 1 is a a 1 + 1 mirror... can be faster on reads but is generally slower on writes depending on your controller/implementation... 0 + 1 or 1 + 0 striped mirror or mirrored stripe. less space efficient than raid 5 but faster in general. can survive multiple disk failures so long as both disks containing the same information don't fail at once. > --- Mark Hahn wrote: > > > Discussing a client setup the other day a cohort and I came to a different > > > opinion on what each raid level does. Is there a guide/standard to define > > how > > > it should work. Also do any vendors stray from the beaten path and add > > there > > > own levels? > > > > sure they do. IMO the only important levels are: > > > > raid0 - striping > > raid1 - mirroring > > raid5 - rotating parity-based array > > > > vendors who make a big deal of obvious extensions like raid 10 > > (mirrored stripes or vice versa) are immediately hung up on by me... > > > > > ===== > Andrew Latham > > Penguin loving, moralist agnostic. 
> > LathamA.com - (lay-th-ham-eh) > lathama at lathama.com - lathama at yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:19:16 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:19:16 GMT Subject: building a RAID system - 8 drives - drive-net In-Reply-To: References: Message-ID: <20031009231916.21008.qmail@houston.wolf.com> Alvin Oga writes: > > hi ya angel > > On Thu, 9 Oct 2003, Angel Rivera wrote: > >> Alvin Oga writes: >> >> > nobody swaps disks around ... unless one is using those 5.25" drive bay >> > thingies in which case ... thats a different ball game >> >> No quite true. We use Rare drives (one box) to move up to a TB of data >> around w/o having to take the time to create tapes and then download them. >> That takes a lot of time, even w/ LTOs. > > yes.. guess it makes sense to move disks around for moving tb of data > like floppy-net or sneaker-net > - done that ( moving disks around ) myself once in a while > for a quickie fix When you have that much data, it is easier and faster to load 8 drives into a box than tons of tapes. take out the old drives and place the new ones in, mount it, export it and voila-it is on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Oct 9 19:36:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 9 Oct 2003 16:36:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031009231916.21008.qmail@houston.wolf.com> Message-ID: hi ya On Thu, 9 Oct 2003, Angel Rivera wrote: .. > > yes.. guess it makes sense to move disks around for moving tb of data > > like floppy-net or sneaker-net > > - done that ( moving disks around ) myself once in a while > > for a quickie fix > > When you have that much data, it is easier and faster to load 8 drives into > a box than tons of tapes. take out the old drives and place the new ones > in, mount it, export it and voila-it is on-line. yes and a "bunch of disks" (raid5) survives the loss of one dropped disk and is relatively secure from prying eyes .... - ceo gets one disk - cfo gets one disk - hr gets one disk - eng gets one disk - sys admin gets one disk ( combine all[-1] disks together to recreate the (raid5) TB data ) - a single (raid5) disk by itself is basically worthless tape backups are insecure ... - lose a tape ( bad tape, lost tape ) and and all its data is lost - anybody can read the entire contents of the full backup ( one could tar up one disk per tape, instead of tar'ing the ( whole raid5 subsystem, to provide the ( same functionality as a raid5 offsite disk backup c ya alvin and hopefully .. the old disks are not MFM drives.. 
or ata-133 in a new sata system :-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Thu Oct 9 19:51:05 2003 From: angel at wolf.com (Angel Rivera) Date: Thu, 09 Oct 2003 23:51:05 GMT Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031009235105.23420.qmail@houston.wolf.com> Alvin Oga writes: >> yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... Well, let's see. We can backup the data to tapes or to disks-disks are faster. From the time the data is on the disk, 1/2-1.0 hours to get to us, a few minutes to install them and voila you are on-line. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 9 21:31:13 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 9 Oct 2003 21:31:13 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: Message-ID: On Thu, 9 Oct 2003, Alvin Oga wrote: > yes and a "bunch of disks" (raid5) survives the loss of one dropped disk > and is relatively secure from prying eyes .... > - ceo gets one disk > - cfo gets one disk > - hr gets one disk > - eng gets one disk > - sys admin gets one disk > ( combine all[-1] disks together to recreate the (raid5) TB data ) > > - a single (raid5) disk by itself is basically worthless Secure from prying eyes, maybe (as in casually secure). "Secure" as in your secret plans for world domination or the details of your flourishing cocaine business are safe from the feds, not at all, unless the information is encrypted. Each disk has about one fourth of the information. English is about 3:1 compressible (really more; this is using simple symbolic compression). A good cryptanalyst could probably recover "most" of what is on the disks from any one disk, depending on what kind of data is there. Numbers, possibly not, but written communications, quite possibly. Especially if it falls in the hands of somebody who really wants it and has LOTS of good cryptanalysts. > tape backups are insecure ... > - lose a tape ( bad tape, lost tape ) and and all its data is lost > - anybody can read the entire contents of the full backup Unless it is encrypted. Without strong encryption there is no data-level security. With it there is. Maybe. Depending on what is "strong" to you and what is strong to, say, the NSA, whether your systems and network is secure, depending on whether you have dual isolation power inside a faraday cage with dobermans at the door. However, there can be as much or as little physical security for the tape as you care to put there. Tape in a locked safe, tape in an armored car. Disks are far more fragile than tapes -- drop a disk one meter onto the ground and chances are quite good that it is toast and will at best cost hundreds of dollars and a trip to specialized facilities to remount and mostly recover. Drop a tape one meter onto the ground and chance are quite good that it is perfectly fine, and even if it isn't (because e.g. the case cracked) ordinary humans can generally remount the tape in a new case without needing a clean room and special tools. 
Tapes are cheap -- you can afford to send almost three tapes compared to one disk. I get the feeling that you just don't like tapes, Alvin...;-) rgb > > ( one could tar up one disk per tape, instead of tar'ing the > ( whole raid5 subsystem, to provide the > ( same functionality as a raid5 offsite disk backup > > c ya > alvin > > and hopefully .. the old disks are not MFM drives.. > or ata-133 in a new sata system :-) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Fri Oct 10 01:20:06 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Fri, 10 Oct 2003 07:20:06 +0200 Subject: Inquiry small system S/W In-Reply-To: References: Message-ID: <20031010072006.54dfd8a4.smuelas@mecanica.upm.es> I have put together a 8 node beowulf cluster to my greatest satisfaction and results. You don't need nothing special; if it is beowulf it must be Linux. If you use, for example, RedHat 9, what I do, you have everything needed in the standard 3 CD's distribution, that you can download at no cost. Apart from that, and in my particular case, I use fortran90 and the compiler from Intel, also free for non-comercial use. Perhaps, the only special hardware to buy is a simple, 8 nodes switch for your ethernet connections. Then, what is really important is to learn to make your software really able to use the cluster. So, some time to study MPI or similar, and work, work, work... :-) Before being an 8 nodes cluster, mine has been 4-nodes, then 6-nodes and at 8 I stopped. But there is no difference in the work to do. Just the possibilities and speed increase. Good luck!! On Thu, 09 Oct 2003 15:43:57 -0400 Gene Sheppard wrote: > We here are Georgia Perimeter College are planning on putting together a 5 > or 6 node Beowulf system. > > My question: > Is there any software for a system like this? > What applications have been tested on a small system? > > If there are none, what is the smallest system out there? > > Thank you for your help. > > GEne > > ============================================== > Gene Sheppard > Georgia Perimeter College > Computer Science > 1000 University Center Lane > Lawrenceville, GA 30043 > 678-407-5243 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bill at math.ucdavis.edu Fri Oct 10 01:43:57 2003 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 9 Oct 2003 22:43:57 -0700 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010054357.GB13480@sphere.math.ucdavis.edu> On the hardware vs software RAID thread. A friend needed a few TB and bought a high end raid card (several $k), multiple channels, enclosure, and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. He needed the capacity and a minumum of 50MB/sec sequential write performance (on large sequential writes). He didn't get it. Call #1 to dell resulted in well it's your fault, it's our top of the line, it should be plenty fast, bleah, bleah, bleah. Call #2 lead to an escalation to someone with more of a clue, tune paramater X, tune Y, try a different raid setup, swap out X, etc. After more testing without helping call #3 was escalated again someone fairly clued answered. The conversation went along the lines of what, yeah, it's dead slow. Yeah most people only care about the reliability. Oh performance? We use linux + software raid on all the similar hardware we use internally at Dell. So the expensive controller was returned, and 39160's were used in it's place (dual channel U160) and performance went up by a factor of 4 or so. In my personal benchmarking on a 2 year old machine with 15 drives I managed 200-320 MB/sec sustained (large sequential read or write), depending on filesystem and strip size. I've not witnessed any "scaling problems", I've been quite impressed with linux software raid under all conditions and have had it run significantly faster then several expensive raid cards I've tried over the years. Surviving hotswap, over 500 day uptimes, and substantial performance advantages seem to be common. Anyone have numbers comparing hardware and software raid using bonnie++ for random access or maybe postmark (netapp's diskbenchmark) Failures so far: * 3ware 6800 (awful, evil, slow, unreliable, terrible tech support) * quad channel scsi card from Digital/storage works, rather slow, then started crashing * More recently (last 6 months) the top of the line dell raid card (PERSC?) * A few random others One alternative solution I figured I'd mention is the Apple 2.5 TB array for $10-$11k isnt' a bad solution for a mostly turnkey, hotswap, redundant powersupply setup with a warranty. Dual 2 Gigabit Fiber channels does make it easier to scale to 10's of TB's then some other solutions. I managed 70 MB/sec read/write to a 1/2 Xraid (on a single FC). Of course there are cheaper solutions. Oh, I also wanted to mention one gotcha for the DIY methods. I've had I think 4 machines now with 8-15 disks, and dual 400 watt powersupplies or 3x225 watt (n+1) boot just fine for 6 months, but start complaining at boot due to to high power consumption. This is of course especially bad with EIDEs since they all spin up at boot (SCSI can usually be spun up one at a time). I suspect a slight decrease in lubrication and or degradation in the powersupplies which were possibly running above 100% to be the cause. 
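Part of why the host CPU keeps up so easily in these software-RAID numbers is that RAID-5 parity is nothing more than a byte-wise XOR across the data chunks, so a lost disk is rebuilt by XOR-ing the survivors. A toy sketch (four 16-byte "disks"; nothing here reflects md's real chunk sizes or layout):

import os
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data_chunks = [os.urandom(16) for _ in range(3)]   # three data "disks"
parity = reduce(xor, data_chunks)                  # the parity "disk"

# Pretend the second data disk died; rebuild it from the survivors.
survivors = [data_chunks[0], data_chunks[2], parity]
rebuilt = reduce(xor, survivors)
print("rebuilt chunk matches the lost one:", rebuilt == data_chunks[1])

The real md driver adds parity-chunk rotation and a lot of bookkeeping, but the per-byte arithmetic is exactly this.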
In any case great thread, I've yet to see a performance or functionality benefit from hardware raid. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 03:24:15 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 09:24:15 +0200 Subject: building a RAID system In-Reply-To: <1065725416.1136.59.camel@qeldroma.cttc.org> References: <1065725416.1136.59.camel@qeldroma.cttc.org> Message-ID: <20031010072415.GI17432@unthought.net> On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > Hi again, ... Others have already answered your other questions, I'll try to take one that went unanswered (as far as I can see). ... > > But must be noted that HW RAID offers better response time. In a HW RAID setup you *add* an extra layer: the dedicated CPU on the RAID card. Remember, this CPU also runs software - calling it 'hardware RAID' in itself is misleading, it could just as well be called 'offloaded SW RAID'. The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor. However, you added a layer with your offloaded-RAID. You added one extra CPU in the 'chain of command' - and an inferior CPU at that. That layer means latency even in the most expensive cards you can imagine (and bottleneck in cheap cards). No matter how you look at it, as long as the RAID code in the kernel is fairly simple and efficient (which it was, last I looked), then the extra layers needed to run the PCI commands thru the CPU and then to the actual IDE/SCSI controller *will* incur latency. And unless you pick a good controller, it may even be your bottleneck. Honestly I don't know how much latency is added - it's been years since I toyed with offload-RAID last ;) I don't mean to be handwaving and spreading FUD - I'm just trying to say that the people who advocate SW RAID here are not necessarily smoking crack - there are very good reasons why SW RAID will outperform HW RAID in many scenarios. > > HW raid offers hotswap capability and offload our work instead of > maintaining a SW raid solution ...we'll see ;) That, is probably the best reason I know of for choosing hardware RAID. And depending on who you will have administering your system, it can be a very important difference. There are certainly scenarios where you will be willing to trade a lot of performance for a blinking LED marking the failed disk - I am not kidding. Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 02:58:37 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 08:58:37 +0200 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: References: Message-ID: <20031010065837.GH17432@unthought.net> On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: ... > Each disk has about one fourth of the information. English is about 3:1 > compressible (really more; this is using simple symbolic compression). > A good cryptanalyst could probably recover "most" of what is on the > disks from any one disk, depending on what kind of data is there. You overlook the fact that data on a RAID-5 is distributed in 'chunks' of sizes around 4k-128k (depending...) So you would get the entire first 'Introduction to evil empire plans', but the entire 'Subverting existing banana government' chapter may be on one of the disks that you are missing. > Numbers, possibly not, but written communications, quite possibly. > Especially if it falls in the hands of somebody who really wants it and > has LOTS of good cryptanalysts. You'd probably need historians and psychologists rather than cryptographers - but of course the point remains the same. Just nit-picking here. > > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. I'm just thinking of distributing two tapes for each disk - one with 200G of random numbers, the other with 200G of data XOR'ed with the data from the first tape. Enter the one-time pad - unbreakable encryption (unless you get a hold of both tapes of course). You'd need to make sure you have good random numbers - as an extra measure of safety one should probably wear a tinfoil hat while working with the tapes, just in case... ;) Of course, if any tape is lost, everything is lost. But one bad KB on either tape will only result in one bad KB total. > > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. No no no no no! Think big! Think: cobalt bomb in own backyard - threaten anyone who steals your data, that you'll make the planet inhabitable for a few hundred decades unless they hand back your tapes. ;) (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) ... > I get the feeling that you just don't like tapes, Alvin...;-) Where did you get that idea? ;) Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Fri Oct 10 13:34:41 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Fri, 10 Oct 2003 10:34:41 -0700 Subject: building a RAID system References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> Message-ID: <3F86EDB1.6264A405@attglobal.net> You write: "The problem with offloading is, that while it made great sense in the days of 1 MHz CPUs, it really doesn't make a noticable difference in the load on your typical N GHz processor." Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the practical limit of SW RAID? Paul Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > Hi again, > ... > > Others have already answered your other questions, I'll try to take one > that went unanswered (as far as I can see). > > ... > > > > But must be noted that HW RAID offers better response time. > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > RAID card. Remember, this CPU also runs software - calling it > 'hardware RAID' in itself is misleading, it could just as well be called > 'offloaded SW RAID'. > > The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor. > > However, you added a layer with your offloaded-RAID. You added one extra > CPU in the 'chain of command' - and an inferior CPU at that. That layer > means latency even in the most expensive cards you can imagine (and > bottleneck in cheap cards). No matter how you look at it, as long as > the RAID code in the kernel is fairly simple and efficient (which it > was, last I looked), then the extra layers needed to run the PCI > commands thru the CPU and then to the actual IDE/SCSI controller *will* > incur latency. And unless you pick a good controller, it may even be > your bottleneck. > > Honestly I don't know how much latency is added - it's been years since > I toyed with offload-RAID last ;) > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > that the people who advocate SW RAID here are not necessarily smoking > crack - there are very good reasons why SW RAID will outperform HW RAID > in many scenarios. > > > > > HW raid offers hotswap capability and offload our work instead of > > maintaining a SW raid solution ...we'll see ;) > > That, is probably the best reason I know of for choosing hardware RAID. > And depending on who you will have administering your system, it can be > a very important difference. > > There are certainly scenarios where you will be willing to trade a lot > of performance for a blinking LED marking the failed disk - I am not > kidding. > > Cheers, > > -- > ................................................................ > : jakob at unthought.net : And I see the elder races, : > :.........................: putrid forms of man : > : Jakob ?stergaard : See him rise and claim the earth, : > : OZ9ABN : his downfall is at hand. 
: > :.........................:............{Konkhra}...............: > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Oct 10 07:12:48 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 10 Oct 2003 04:12:48 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Thu, 9 Oct 2003, Robert G. Brown wrote: > > tape backups are insecure ... > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > - anybody can read the entire contents of the full backup > > Unless it is encrypted. Without strong encryption there is no > data-level security. With it there is. Maybe. Depending on what is > "strong" to you and what is strong to, say, the NSA, whether your > systems and network is secure, depending on whether you have dual > isolation power inside a faraday cage with dobermans at the door. just trying to protect the tapes ( backups ) against the casual "oops look what i found" and they go and look at the HR records or the salary records or employee reviews etc..etc.. not trying to protect the tapes against the [cr/h]ackers ( different ball game ) and even not protecting against the spies of nsa/kgb etc either ( whole new ballgame for those types of backup issues ) > However, there can be as much or as little physical security for the > tape as you care to put there. Tape in a locked safe, tape in an > armored car. dont forget to lock the car/safe too :-) and log who goes in and out of the "safe" area :-) > I get the feeling that you just don't like tapes, Alvin...;-) not my first choice for backups .. even offsite backups... but if "management" takes out the $$$ to do tape backups... so it shall be done ... ideally, everything works ... but unfortunately, tapes are highly prone to people's "oops i forgot to change it yesterday" or the weekly catridge have fun alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Fri Oct 10 07:56:39 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Fri, 10 Oct 2003 13:56:39 +0200 Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> References: <1065725416.1136.59.camel@qeldroma.cttc.org> <20031010072415.GI17432@unthought.net> <3F86EDB1.6264A405@attglobal.net> Message-ID: <20031010115639.GN17432@unthought.net> On Fri, Oct 10, 2003 at 10:34:41AM -0700, pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? In this forum, I run small storage only. Around 150G for the most busy server that I have. 
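For concreteness -- device names below are purely an example, and raidtools' raidtab does the same job -- a box like that is nothing more exotic than:

mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdg1 /dev/hdi1
mkfs -t ext3 /dev/md0
cat /proc/mdstat                                  # array state and resync progress
echo 10000 > /proc/sys/dev/raid/speed_limit_max   # cap the resync rate (KB/s) so the box stays usable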
Linux has problems with >2TB devices as far as I know, so that sort of puts an upper limit to whatever you can do with SW/HW RAID there. In between, it's just one order of magnitude :) More seriously - the SW RAID code is extremely simple, and it performs two different tasks: *) Reconstruction - which has time complexity T(n) for n bytes of data *) Read/write - which has time complexity T(1) for n bytes of data In other words - the more data you have, the longer a resync is going to take - HW or SW makes no difference (except for a factor, which tends to be rediculously large on cheap HW RAID cards but acceptable on more expensive ones). Reads and writes are not affected by the amount of data, in the SW RAID layer (and hopefully not in the HW RAID layer either). The scalability limits you will run into are: *) Number of disks you can attach to your box (HW RAID may hide this from you and may thus buy you some scalability there) *) Filesystem limits/performance problems. HW/SW RAID makes no difference *) Device size limits. HW/SW RAID makes no difference *) Reconstruction time after unclean shutdown - SW performs much better than crap/cheap HW solutions, but I don't know about the expensive ones. There are others on this list with much larger servers and less antique hardware - guys, speak up - where does it begin to hurt? :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 10 07:59:22 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 10 Oct 2003 04:59:22 -0700 (PDT) Subject: building a RAID system In-Reply-To: <3F86EDB1.6264A405@attglobal.net> Message-ID: On Fri, 10 Oct 2003 pesch at attglobal.net wrote: > You write: > > "The problem with offloading is, that while it made great sense in the > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > load on your typical N GHz processor." > > Did you have a maximum data storage size in mind? - or to put it differently: at what data size do you see the > practical limit of SW RAID? size-wise software raid (I'm talking specifically about linux here) scales far better than most hardware raid controllers (san subsystems are another kettle of fish entirely), among other reasons because you can spread the disks out between multiple controllers. > Paul > > Jakob Oestergaard wrote: > > > On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote: > > > Hi again, > > ... > > > > Others have already answered your other questions, I'll try to take one > > that went unanswered (as far as I can see). > > > > ... > > > > > > But must be noted that HW RAID offers better response time. > > > > In a HW RAID setup you *add* an extra layer: the dedicated CPU on the > > RAID card. Remember, this CPU also runs software - calling it > > 'hardware RAID' in itself is misleading, it could just as well be called > > 'offloaded SW RAID'. 
> > > > The problem with offloading is, that while it made great sense in the > > days of 1 MHz CPUs, it really doesn't make a noticable difference in the > > load on your typical N GHz processor. > > > > However, you added a layer with your offloaded-RAID. You added one extra > > CPU in the 'chain of command' - and an inferior CPU at that. That layer > > means latency even in the most expensive cards you can imagine (and > > bottleneck in cheap cards). No matter how you look at it, as long as > > the RAID code in the kernel is fairly simple and efficient (which it > > was, last I looked), then the extra layers needed to run the PCI > > commands thru the CPU and then to the actual IDE/SCSI controller *will* > > incur latency. And unless you pick a good controller, it may even be > > your bottleneck. > > > > Honestly I don't know how much latency is added - it's been years since > > I toyed with offload-RAID last ;) > > > > I don't mean to be handwaving and spreading FUD - I'm just trying to say > > that the people who advocate SW RAID here are not necessarily smoking > > crack - there are very good reasons why SW RAID will outperform HW RAID > > in many scenarios. > > > > > > > > HW raid offers hotswap capability and offload our work instead of > > > maintaining a SW raid solution ...we'll see ;) > > > > That, is probably the best reason I know of for choosing hardware RAID. > > And depending on who you will have administering your system, it can be > > a very important difference. > > > > There are certainly scenarios where you will be willing to trade a lot > > of performance for a blinking LED marking the failed disk - I am not > > kidding. > > > > Cheers, > > > > -- > > ................................................................ > > : jakob at unthought.net : And I see the elder races, : > > :.........................: putrid forms of man : > > : Jakob ?stergaard : See him rise and claim the earth, : > > : OZ9ABN : his downfall is at hand. : > > :.........................:............{Konkhra}...............: > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:35:35 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:35:35 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: On Fri, 10 Oct 2003, Alvin Oga wrote: > > hi ya robert > > On Thu, 9 Oct 2003, Robert G. Brown wrote: > > > > tape backups are insecure ... > > > - lose a tape ( bad tape, lost tape ) and and all its data is lost > > > - anybody can read the entire contents of the full backup > > > > Unless it is encrypted. Without strong encryption there is no > > data-level security. With it there is. Maybe. 
Depending on what is > > "strong" to you and what is strong to, say, the NSA, whether your > > systems and network is secure, depending on whether you have dual > > isolation power inside a faraday cage with dobermans at the door. > > just trying to protect the tapes ( backups ) against the casual > "oops look what i found" and they go and look at the HR records > or the salary records or employee reviews etc..etc.. > > not trying to protect the tapes against the [cr/h]ackers ( different > ball game ) and even not protecting against the spies of nsa/kgb etc > either ( whole new ballgame for those types of backup issues ) Hmmm, this is morphing offtopic, but data security is a sufficiently universal problem that I'll chance one more round. Pardon me while I light up my crack pipe here...:-7... so I can babble properly. Ah, ya, NOW I'm awake...:-) The point is that you cannot do this with precisely HR records or salary records or employee reviews. If anybody gets hold of the data (casually or not) be it on tape or disk or the network while it is in transit then you are liable to the extent that you failed to take adequate measures to ensure the data's security. By the numbers: 1) Tapes even more than disk are most unlikely to be viewed casually. Disks have street value, tapes (really) don't. There is a high entry investment required before one can even view the contents of an e.g. LTO tape, plus a fair degree of expertise. A disk can be pulled from a box and remounted in any system by a pimple-faced kid with a screwdriver and an attitude. The net can be snooped by anyone, with a surprisingly low entry level of expertise (or rather a high level expertise encapsulated in openly distributed rootkits and exploits so anybody can do it). 2) All three are clearly vulnerable to someone (e.g. a private investigator, an insurance company, a competitor, an identity thief, the government) seeking to snoop and violate the privacy of the individuals who have entrusted their data to you. HR records contain SSNs, bank numbers (to facilitate direct deposit), names addresses, health records, employment records, CVs and/or transcripts, disciplinary records: they are basically everything you never wanted the world to know in one compact and efficient package. Federal and state laws regulate the handling of this data in quite rigorous ways. 3) An IT officer who was responsible for holding sensitive data secure according to law and who failed to employ reasonable measures for maintaining it secure and who subsequently had it stolen (violating his trust) would be publically eviscerated. Career ruined, bankrupted by suits, tormented by guilt, possibly even put in jail, driven to suicide kind of stuff in the worst case. The company that employed that officer would be right behind -- suits, clean sweep firings of the entire management team in the chain of responsibility, plunging stock prices, public recriminations and humiliation. EVEN IF reasonable measures were employed there would likely be trouble and recrimination, but careers might survive, damages would be limited, jail might be avoided, and one wouldn't feel so irresponsibly guilty. 4) Strong encryption of the data to protect it in transit is an obvious, inexpensive, knee-jerk sort of reasonable measure (again, independent of the means of transport presuming only that the data passes out of your fortress keep where you keep the cobalt bomb and dobermans and make all of your staff wear tinfoil caps while looking at the data). 
It might even be mandated by law for certain forms of data -- the federal government just passed a sweeping right to privacy measure for health data, for example, that may well have highly explicit provisions for data transport and security. 5) Therefore... only someone with a death wish would send sensitive, valuable data for which they are responsible for security, through any transport layer not under their direct control and deemed secure of its own right, between secure sites, without encrypting it first (and otherwise complying with relevant federal and state laws, if any apply to the case at hand). Properly paranoid ITvolken would likely consider ALL transport layers including their own internal LAN not to be secure and would use ssh/ssl/vpn bidirectional encryption of all network traffic period. If it weren't for the fact that there is less motivation to encrypt the data on the physically secured actual server disks (so the only means of access are through dobermans and locked doors or by cracking the servers from outside, in which case you've already lost the game) one would extend the data encryption to the database itself, and I'm sure that there are sites that don't trust even their own staff or the moral character of their dobermans that so do. I don't want to THINK about what one has to endure to obtain access to e.g. NSA or certain military datasites -- probably body cavity searches in and out, shaved heads and paper suits, and metal detectors, that sort of thing...:-) > > However, there can be as much or as little physical security for the > > tape as you care to put there. Tape in a locked safe, tape in an > > armored car. > > dont forget to lock the car/safe too :-) > and log who goes in and out of the "safe" area :-) Ya, precisely. It is only partly a joke, you see. If my Duke HR or my medical records turn up on the street, with somebody purporting to be me cleaning out my bank account and maxing my visa, with my applications for health insurance denied because they've learned about my heavy drinking problem and all the crack that I smoke (I don't know where Jakob got the idea that I don't sit here fuming away all day:-) and the consequent liver failure and bouts of wild-eyed babbling (like this one, strangely enough:-), my plans for a fusion generator that you can build in your garage turning up being patented by Exxon and so forth Duke had DAMN WELL better be able to show my attorney and a court logs of who had access to this data, proofs that it was never left lying around in cars (locked or unlocked), proofs that it was transmitted in encrypted form, etc. Otherwise I'm detonating the cobalt bomb in my backyard and Duke will be a radioactive wasteland for a few kiloyears...(it is only a couple of miles away). This is the kind of thing that gives IT security officers ulcers. Duke's current SO is actually a former engineering school beowulfer (and good friend of mine) whose voice is scattered through the list archives (Chris Cramer). As a former 'wulfer (and EE), he is damn smart and computer-expert (and handsome and witty, just like everybody else on this list:-). However, he sweats bullets because Duke is a huge organization with lots of architectures scattered all over campus -- Windows here (any flavor), Macs there, Suns, Linux boxen, there are likely godforsaken nooks on campus that still have IBM mainframes and VAXes. Sensitive data is routinely served across the campus backbone and beyond (e.g. 
I can see my advisees' current transcripts where I sit at this very moment). Even with SSL, this data is vulnerable in fragments to any successful exploit on any client that belongs to any partially privileged person and that runs a vulnerable operating system. Hmmm, you say -- wasn't there recently an RPC exploit on a certain very common OS that permitted crackers to put anything they wanted including snoops on all cracked clients (not to mention a steady stream of lesser but equally troublesome invasions of the viral sort)? Didn't this cost institutions all over the world thousands of FTE hours to put right before somebody actually used it to steal access to valuable data? Why yes, I believe that there was! I believe it did! However, as one who got slammed (blush) a year ago on an unfortunately unpatched linux box and who has seen countless exploits succeed against all flavors of networked OS over many years, I avoid feeling too cocky about it. Nevertheless, Chris just keeps suckin' down the prozac and phillips cocktails dealing with crap like this and knowing that it is his butt on line should a malevolent attack succeed in compromising Duke's mountains of sensitive data (gulp) being served by minions whose primary systems expertise was developed back when knowing cobol was a part of the job description (gulp) running on servers with, um "interesting" base architectures (gulp)... > > I get the feeling that you just don't like tapes, Alvin...;-) > > not my first choice for backups .. even offsite backups... > > but if "management" takes out the $$$ to do tape backups... so it shall be > done ... > ideally, everything works ... but unfortunately, tapes > are highly prone to people's "oops i forgot to change it > yesterday" or the weekly catridge They are indeed (as the example I gave of a recent small-scale disaster at Duke clearly shows). A site run by a wise IT human would use a pretty rigorous protocol to regulate the process so that even if you have e.g. student labor doing the tape changes there is strict accountability and people checking the people checking the people who do the job, and so that tapes are randomly pulled every month and checked to be sure that the data is actually getting on the tapes in retrievable form. You can bet that Duke has such a process in place now, if they didn't before, although Universities tend to be a loose amalgamation of quasi-independent fiefdoms that accept control and adopt security measures for the common good and hire competent systems administrators and develop shared protocols for ensuring data integrity about as often and as easily as one would expect. (Sound of Chris in the background crunching another mylantin and washing it down with P&P:-) So in place or not, the risk remains. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 10 09:34:25 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 10 Oct 2003 09:34:25 -0400 (EDT) Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010065837.GH17432@unthought.net> Message-ID: On Fri, 10 Oct 2003, Jakob Oestergaard wrote: > On Thu, Oct 09, 2003 at 09:31:13PM -0400, Robert G. Brown wrote: > ... 
> > Each disk has about one fourth of the information. English is about 3:1 > > compressible (really more; this is using simple symbolic compression). > > A good cryptanalyst could probably recover "most" of what is on the > > disks from any one disk, depending on what kind of data is there. > > You overlook the fact that data on a RAID-5 is distributed in 'chunks' > of sizes around 4k-128k (depending...) Overlook, hell. I'm using my usual strategy of feigning knowledge with the complete faith that my true state of ignorance will be publically displayed to the entire internet. This humiliation, in turn, will eventually cause such mental anguish that I'll be able to claim mental disability and retire to tending potted plants on a disability check for the rest of my life... You probably noticed that I used the same strategy quite recently regarding things like factors of N in disk read speed estimates, certain components in disk latency, and oh, too many other things to mention. Pardon me if I babble on a bit this morning, but my lawy... erm, "psychiatrist" insists that I need fairly clear evidence of disability to get away with this. I personally find that smoking crack cocaine induces a pleasant tendency to babble nonsense. And there is no place to babble for the record like the beowulf list archives, I always say...:-) > So you would get the entire first 'Introduction to evil empire plans', > but the entire 'Subverting existing banana government' chapter may be on > one of the disks that you are missing. ... > I'm just thinking of distributing two tapes for each disk - one with > 200G of random numbers, the other with 200G of data XOR'ed with the data > from the first tape. Or just one tape, xor'd with 200G worth of random numbers generated from a cryptographically strong generator via a relatively short key that you can (as you note) send or carry separately and which is smaller, easier to secure, and less susceptible to degradation or loss than a second tape. It's cheaper that way, and even if you use two tapes people are going to try cracking the master tape by trying to guess the key+algorithm you almost certainly used to generate it (see below), so the xor is no stronger than the key+algorithm combination.;-) > Enter the one-time pad - unbreakable encryption (unless you get a hold > of both tapes of course). Or determine the method and key you used for (oxymoronically) generating 200 Gigarands (which is NOT going to be a hardware generator, I don't think, unless you are a very patient person or build/buy a quantum generator or the like -- entropy based things like /dev/random are too slow, and even quantum generators I've looked into are barely fast enough:-). > You'd need to make sure you have good random numbers - as an extra Ah, that's the rub. "Good random numbers" isn't quite an oxymoron. Why, there is even a government standard measure for cryptographic strength in the US (which many/most generators fail, by the way). Entropy based generators tend to be very slow -- order of 10-100 kbps depending on the source of entropy, last I looked. Quantum generators IIRC that rely on e.g. single photon transmission events at half-silvered mirrors have to run at light intensities where single photon events are discernible (rare, that is) and STILL have to wait for an autocorrelation time or ten before opening a window for the next event because even quantum events like this have an associated correlation time due to the existence of extended correlated states in the radiating system. 
Photon emission from a single atom itself is antibunched, for example, as after an emission the system requires time for the single radiating atom to regain a degree of excitation sufficient to enable re-emission. I believe that they can achieve more like 1 mbps of randomness or at least unpredictability. As you'd need 1.6x10^12 bits to encode your tape, you'd have to wait around 1.6x10^6 seconds to generate the key. That is, hmmm, between two and three week, twenty to thirty weeks with an entropy generator, unless you used a beowulf of entropy generators to shorten the time:-). Not exactly in the category of "generate a one-time pad while I go have a cup of coffee". Using a truly oxymoronic but much faster (and cryptographically strong) random number generator, e.g. the mt19937 from the GSL one can generate a respectable ballpark of 16 MBps (note B, not b) of random bytes and be done in a mere four hours. Alas, mt19937 is seeded from a long int and the seed probably doesn't have enough bits to be secure against a brute force attack, so one would likely have to fall back on one of the actual algorithms that permit the use of long keys (1024 bits or even more). > No no no no no! Think big! > > Think: cobalt bomb in own backyard - threaten anyone who steals your > data, that you'll make the planet inhabitable for a few hundred > decades unless they hand back your tapes. ;) > > (I'm drafting up 'Introduction to evil empire plans' soon by the way ;) Hmm, I'll have to mail you some of my lithium pills, Jakob. Your own prescription obviously ran out...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From msnitzer at lnxi.com Fri Oct 10 11:28:03 2003 From: msnitzer at lnxi.com (Mike Snitzer) Date: Fri, 10 Oct 2003 09:28:03 -0600 Subject: Intel compilers and libraries In-Reply-To: ; from cjtan@optimanumerics.com on Thu, Oct 09, 2003 at 10:04:20AM +0000 References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> Message-ID: <20031010092803.A5136@lnxi.com> On Thu, Oct 09 2003 at 04:04, C J Kenneth Tan -- Heuchera Technologies wrote: > Greg, > > > Is it a 100x100 matrix LU decomposition? Well, no, because Intel's > > MKL and the free ATLAS library run at a respectable % of peak. > > Our benchmarks concentrate on xGEQRF, xGESVD, xGETRF, xGETRS, xGESV, > xPOTRF, xPOSV, xPPTRF, xGEEV, extending to xGETRI, and xTRTRI. > > Have you tried DPPSV or DPOSV on Itanium, for example? I would be > interested in the percentage of peak that you achieve with MKL and > ATLAS, for up to 10000x10000 matrices. > > ATLAS does not have full LAPACK implementation. 
This gets ATLAS to provide its faster LAPACK routines to a full LAPACK library: http://math-atlas.sourceforge.net/errata.html#completelp Mike -- Mike Snitzer msnitzer at lnxi.com Linux Networx http://www.lnxi.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Fri Oct 10 13:55:43 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Fri, 10 Oct 2003 19:55:43 +0200 Subject: PVM errors at startup Message-ID: <3F86F29F.8A37AC5B@hmg.inpg.fr> Hi I'm new on this list so, just 2 lines about me: a small Linux Beowulf cluster (10 nodes) for computational fluid dynamics in the south-east of France (National Polytechnic Institute of Grenoble). I've just updated my cluster (from AMD 1500+ / 100BT Ethernet to P4 2.8G + Gigabit Ethernet) and I've updated my system to Red Hat 7.3, kernel 2.4.20-20-7. The current version of PVM is pvm-3.4.4-2 from Red Hat 7.3. The previous system was RH 7.1. Since this update I'm unable to start PVM from one node on another (with the add command). The console hangs for several tens of seconds and then says OK. The pvmd3 is started on the remote node, but the conf command does not show the additional node and I get these errors in the /tmp/pvml.xx file: [t80040000] 10/10 15:58:31 craya.hmg.inpg.fr (xxx.xxx.xxx.xxx:32772) LINUX 3.4.4 [t80040000] 10/10 15:58:31 ready Fri Oct 10 15:58:31 2003 [t80040000] 10/10 16:01:46 netoutput() timed out sending to craya02 after 14, 190.000000 [t80040000] 10/10 16:01:46 hd_dump() ref 1 t 0x80000 n "craya02" a "" ar "LINUX" dsig 0x408841 [t80040000] 10/10 16:01:46 lo "" so "" dx "" ep "" bx "" wd "" sp 1000 [t80040000] 10/10 16:01:46 sa 192.168.81.2:32770 mtu 4080 f 0x0 e 0 txq 1 [t80040000] 10/10 16:01:46 tx 2 rx 1 rtt 1.000000 id "(null)" rsh and rexec are working (from master to nodes, from nodes to master and from nodes to nodes). The transfer speed is near 600 Mbit/s on the network (binary ftp to /dev/null). The variables are set: PVM_ARCH=LINUX PVM_RSH=/usr/bin/rsh PVM_DPATH=/usr/local/pvm3/lib/LINUX/pvmd3 PVM_ROOT=/usr/local/pvm3 I've tried so many things during these last 3 days: - trying to compile and install pvm3.4.4.tgz from the source file - uninstalling iptables, ipchains and iplock - removing /etc/security (to test this with root authority) - adding .rhosts and hosts.equiv files - on the master, eth0 is 100 Mbit towards the internet and eth1 is GB towards the nodes; I've tried the opposite config: eth0 becomes GB and eth1 100BT. Always the same problem! The cluster is down and I do not know where to look for a solution now.... If someone could help me solve this problem... Thanks for your help Patrick -- =============================================================== | Equipe M.O.S.T.
| http://most.hmg.inpg.fr | | Patrick BEGOU | ------------ | | LEGI | mailto:Patrick.Begou at hmg.inpg.fr | | BP 53 X | Tel 04 76 82 51 35 | | 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71 | =============================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Oct 10 21:53:34 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 10 Oct 2003 21:53:34 -0400 Subject: building a RAID system - 8 drives - drive-net - tapes In-Reply-To: <20031010054357.GB13480@sphere.math.ucdavis.edu> References: <20031010054357.GB13480@sphere.math.ucdavis.edu> Message-ID: <1065837212.18644.0.camel@QUIGLEY.LINIAC.UPENN.EDU> On Fri, 2003-10-10 at 01:43, Bill Broadley wrote: > On the hardware vs software RAID thread. A friend needed a few TB and > bought a high end raid card (several $k), multiple channels, enclosure, > and some 10's of 73GB drives for somewhere in the $50k-$100k neighborhood. > > He needed the capacity and a minumum of 50MB/sec sequential write > performance (on large sequential writes). He didn't get it. Call #1 to > dell resulted in well it's your fault, it's our top of the line, it should > be plenty fast, bleah, bleah, bleah. Call #2 lead to an escalation to > someone with more of a clue, tune paramater X, tune Y, try a different > raid setup, swap out X, etc. After more testing without helping call #3 > was escalated again someone fairly clued answered. The conversation went > along the lines of what, yeah, it's dead slow. Yeah most people only > care about the reliability. Oh performance? We use linux + software > raid on all the similar hardware we use internally at Dell. > > So the expensive controller was returned, and 39160's were used in it's > place (dual channel U160) and performance went up by a factor of 4 or > so. Can you give more concrete pointers to the hardware that they ended up using ? -- specifically the enclosure. Thanks! Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cjtan at optimanumerics.com Fri Oct 10 13:55:14 2003 From: cjtan at optimanumerics.com (C J Kenneth Tan -- Heuchera Technologies) Date: Fri, 10 Oct 2003 17:55:14 +0000 (UTC) Subject: Intel compilers and libraries In-Reply-To: <20031010092803.A5136@lnxi.com> References: <20031006112102.GC15837@sgirmn.pluri.ucm.es> <20031006211806.GB2091@greglaptop.internal.keyresearch.com> <20031010092803.A5136@lnxi.com> Message-ID: Mike, > > Have you tried DPPSV or DPOSV on Itanium, for example? I would be > > interested in the percentage of peak that you achieve with MKL and > > ATLAS, for up to 10000x10000 matrices. > > > > ATLAS does not have full LAPACK implementation. > > This gets ATLAS to provide its faster LAPACK routines to a full LAPACK > library: > http://math-atlas.sourceforge.net/errata.html#completelp Inserting the LU factorization code from ATLAS to publicly available LAPACK will only get you faster LU code in the rest of the publicly available LAPACK library. You will not gain from QR factorization code, Cholesky factorization code, etc.. Ken ----------------------------------------------------------------------- C. 
J. Kenneth Tan, Ph.D. Heuchera Technologies Ltd. E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 ----------------------------------------------------------------------- This e-mail (and any attachments) is confidential and privileged. It is intended only for the addressee(s) stated above. If you are not an addressee, please accept my apologies and please do not use, disseminate, disclose, copy, publish or distribute information in this e-mail nor take any action through knowledge of its contents: to do so is strictly prohibited and may be unlawful. Please inform me that this e-mail has gone astray, and delete this e-mail from your system. Thank you for your co-operation. ----------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Sat Oct 11 13:01:17 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 11 Oct 2003 13:01:17 -0400 (EDT) Subject: Intel compilers and libraries In-Reply-To: Message-ID: > Inserting the LU factorization code from ATLAS to publicly available > LAPACK will only get you faster LU code in the rest of the publicly > available LAPACK library. You will not gain from QR factorization > code, Cholesky factorization code, etc.. oh, sure, but LU is the only important one because of top500 ;) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Oct 11 16:16:12 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 11 Oct 2003 15:16:12 -0500 Subject: Help in rsh In-Reply-To: ; from diego_naruto@hotmail.com on Sat, Oct 11, 2003 at 07:14:13PM +0000 References: Message-ID: <20031011151612.A22568@mikee.ath.cx> On Sat, 11 Oct 2003, diego lisboa wrote: > Hi, > I'm having problems with a cluster that I've set up here; it's a small > cluster with 3 machines. I already have NIS and NFS installed and they are > working very well, with Red Hat 9.0. When I use the LAM 6.5.8 RPM it works > beautifully, but I need to install XMPI, which needs LAM 6.5.8.tar.gz (compiled) > and with trilliun. When I install it on the master it works, but on the slaves I have a > problem with rsh, and hboot doesn't find the "LAM schema" or something like > that. Can somebody help me? > Thanks Try something simpler first. What happens when you do $ rsh -l USER HOST uptime does that work?
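If it does, the next things I would check are along these lines ('lamhosts' and the host names are only examples, and 'hello' stands for whatever MPI test program you have compiled with mpicc):

$ rsh slave1 'echo $PATH'      # the LAM binaries must be on the PATH of non-interactive shells
$ rsh slave1 true              # must print nothing at all; chatty shell startup files break LAM
$ recon -v lamhosts            # LAM's own check that it can start daemons on every host
$ lamboot -v lamhosts
$ mpirun -np 3 hello

recon and lamboot ship with LAM 6.5.8, so if recon already fails the problem is rsh or PATH on the slaves, not XMPI.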
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From diego_naruto at hotmail.com Sat Oct 11 15:14:13 2003 From: diego_naruto at hotmail.com (diego lisboa) Date: Sat, 11 Oct 2003 19:14:13 +0000 Subject: Help in rsh Message-ID: Hi, I'm having problems with a cluster that I've set up here; it's a small cluster with 3 machines. I already have NIS and NFS installed and they are working very well, with Red Hat 9.0. When I use the LAM 6.5.8 RPM it works beautifully, but I need to install XMPI, which needs LAM 6.5.8.tar.gz (compiled) and with trilliun. When I install it on the master it works, but on the slaves I have a problem with rsh, and hboot doesn't find the "LAM schema" or something like that. Can somebody help me? Thanks _________________________________________________________________ MSN Hotmail, the largest webmail in Brazil. http://www.hotmail.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Sat Oct 11 19:10:29 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Sat, 11 Oct 2003 16:10:29 -0700 (PDT) Subject: building a RAID system - 8 drives - drive-net - tapes - preferences In-Reply-To: Message-ID: hi ya robert On Fri, 10 Oct 2003, Robert G. Brown wrote: > > not trying to protect the tapes against the [cr/h]ackers ( different > > ball game ) and even not protecting against the spies of nsa/kgb etc > > either ( whole new ballgame for those types of backup issues ) > > Hmmm, this is morphing offtopic, but data security is a sufficiently > universal problem that I'll chance one more round. Pardon me while I > light up my crack pipe here...:-7... so I can babble properly. Ah, ya, > NOW I'm awake...:-) humm .. gimme some of that :-) > The point is that you cannot do this with precisely HR records or salary > records or employee reviews. If anybody gets hold of the data (casually > or not) be it on tape or disk or the network while it is in transit then > you are liable to the extent that you failed to take adequate measures > to ensure the data's security. By the numbers: security of clusters vs security of normal compute environments and normal users from home and/or w/ laptops requires varying degrees of security policies - from looking at the various incoming Swen virus traffic (the fake MS update stuff) - about 75% of the incoming junk is coming from (mis-managed) clusters 80% of the security issues will be due to internal folks and not the outsiders.. and i'd hate to be the one responsible for security on a university network where there are tons of bright young and ambitious kids looking for a "trophy" my security rule: assume the hacker is already sitting inside the firewall .. w/ root passwds .. now protect your data - that is my model ... - if they have a keyboard sniffer installed .. game over .. ( there'd be no need to guess what the pass phrase was ) c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From victor_ms at brturbo.com Sun Oct 12 11:27:23 2003 From: victor_ms at brturbo.com (Victor Lima) Date: Sun, 12 Oct 2003 12:27:23 -0300 Subject: Benchmarks Message-ID: <3F8972DB.6080802@brturbo.com> Hi All. I'm new on the list. Well, I have a small Linux cluster with 18 P4 2.8 GHz nodes on Fast Ethernet (100 Mbit/s). I need some benchmark software for latency, throughput on Ethernet, etc. See you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Sun Oct 12 19:10:07 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Sun, 12 Oct 2003 19:10:07 -0400 Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> References: <3F8972DB.6080802@brturbo.com> Message-ID: <3F89DF4F.1070500@bellsouth.net> I'm surprised no one has jumped on this yet. There are several packages for testing basic network performance from one node to another.
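Both of the ones I use are easy to drive by hand between a pair of nodes; roughly (host names are made up, see each package's docs for the exact options):

node01$ NPtcp                      # NetPIPE receiver
node02$ NPtcp -h node01            # NetPIPE transmitter; prints latency/throughput vs message size

node01$ netserver                  # netperf daemon
node02$ netperf -H node01 -t TCP_STREAM   # bulk throughput
node02$ netperf -H node01 -t TCP_RR       # request/response, a decent latency proxy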
My personal favorite is netpipe: http://www.scl.ameslab.gov/netpipe/ The other one is netperf: http://www.netperf.org/netperf/NetperfPage.html The web pages are pretty good about explaining things. Good Luck! Jeff > Hi All. > I'm new on list. > Well I have a small linux clusters with 18 P4 2.8 Ghz with > FastEthernet 100Mbits > I need some benchmarks softwares for Latency, Thoughtput on Ethernet, > etc. > Ate. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Mon Oct 13 03:34:33 2003 From: john.hearns at clustervision.com (John Hearns) Date: Mon, 13 Oct 2003 09:34:33 +0200 (CEST) Subject: Benchmarks In-Reply-To: <3F8972DB.6080802@brturbo.com> Message-ID: On Sun, 12 Oct 2003, Victor Lima wrote: > Hi All. > I'm new on list. > Well I have a small linux clusters with 18 P4 2.8 Ghz with FastEthernet > 100Mbits > I need some benchmarks softwares for Latency, Thoughtput on Ethernet, etc. Have a look at Pallas http://www.pallas.com/e/products/pmb/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Mon Oct 13 09:38:36 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Mon, 13 Oct 2003 15:38:36 +0200 Subject: Intel and GNU C++ compilers Message-ID: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Hello: I just wanna thank everybody for the responses to my last question about Intel compiler, I tried both 'gcc' and 'icc', and got the following results for one of our work files containing 10^6 steps of calculation: ************************** *** gcc version 2.95.4 *** ************************** flags bin-size elapsed-time ----- -------- ------------ none 9.5 KB 311 sec "-O3" 8.7 KB 192 sec "-O3 -ffast-math" 8.7 KB 165 sec ******************************************** *********************** *** icc version 7.1 *** *********************** flags bin-size elapsed-time ----- -------- ------------ none 597 KB 100 sec "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec **************************************************************** the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' and SSE2 extensions would improve 'gcc' results. I am running on a Dual Xeon 2.4 Ghz machine, with 2Gb of RAM. I use Debian Woody with a 2.4.22 kernel compiled by myself. HyperThreading is disabled at the BIOS level. The test were run on one processor only. Thanks, Jose M. Perez. Madrid, Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Mon Oct 13 12:04:47 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Mon, 13 Oct 2003 12:04:47 -0400 (EDT) Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: > *** gcc version 2.95.4 *** that's god-aweful ancient. 
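for calibration it's worth at least pinning down what you're actually running, and grabbing something newer if your distro carries it (package names below are Debian's, from memory):

gcc --version ; g++ --version
apt-get install gcc-3.3 g++-3.3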
> none 9.5 KB 311 sec > "-O3" 8.7 KB 192 sec > "-O3 -ffast-math" 8.7 KB 165 sec -fomit-frame-pointer usually helps, sometimes noticeably, since x86 is so short of registers. -O3 is often not better than -O2 or -Os, mainly because of interactions between unrolling, Intel's microscopic L1's, and the difficulty of scheduling onto a tiny reg set... I'd be surprised if 3.3 or 3.4 (pre-release) didn't perform noticeably better. > flags bin-size elapsed-time > ----- -------- ------------ > none 597 KB 100 sec > "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec isn't -tpp7 redundant if you have -xW? > the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions > respectively, I guess that using a newer 'gcc', capable of '-march=pentium4' > and SSE2 extensions would improve 'gcc' results. yes. '-march=pentium4 -mfpmath=sse' seems to do it. gcc doesn't have an auto-vectorizer yet, unfortunately. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From indigoneptune at yahoo.com Mon Oct 13 13:37:47 2003 From: indigoneptune at yahoo.com (stanley george) Date: Mon, 13 Oct 2003 10:37:47 -0700 (PDT) Subject: benchmarks for performance Message-ID: <20031013173747.37343.qmail@web14912.mail.yahoo.com> Hi, I have a cluster of 8 P-III machines running Red Hat 8. I am trying to measure combined performance in MFLOPS. I have tried using linpackd and 1000d. It gives me an error with the 'Make.inc' file while compiling. How do I get rid of this? Which other benchmarking software could I use? Thank you very much Stanley George __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Oct 13 12:22:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 13 Oct 2003 18:22:17 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <20031013133836.GA1083@sgirmn.pluri.ucm.es> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <200310131822.17717.joachim@ccrl-nece.de> José M. Pérez Sánchez: > I just wanna thank everybody for the responses to my last question about > Intel compiler, I tried both 'gcc' and 'icc', and got the following results > for one of our work files containing 10^6 steps of calculation: José, thanks for the information, but you really should (also) use the latest gcc (3.3x) for such a comparison. It will be interesting to see how it performs relative to the latest icc on the one hand, and to the old gcc on the other hand. And some information on the application (or libraries used) would be helpful, too. Like: is it memory-bound or compute-bound, etc.
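Just so the numbers are comparable, something along these lines would do (the file name is only an example; the flags are the ones already discussed in this thread):

g++-3.3 -O2 -ffast-math -march=pentium4 -mfpmath=sse -o sim-gcc33 sim.cc
icc -O2 -tpp7 -xW -o sim-icc sim.cc
/usr/bin/time ./sim-gcc33
/usr/bin/time ./sim-icc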
Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Oct 13 15:26:55 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 13 Oct 2003 12:26:55 -0700 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031013192655.GC16033@greglaptop.internal.keyresearch.com> On Mon, Oct 13, 2003 at 12:04:47PM -0400, Mark Hahn wrote: > -fomit-frame-pointer usually helps, sometimes noticably, > since x86 is so short of registers. Actually it's a lot more of a tossup than it used to be: having a frame pointer means you have another 256 bytes accessible via a single-byte offset, and the SSE registers help relieve the register pressure problem. On the Opteron, which has more of both general purpose and SSE registers, the frame pointer is often a win. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Mon Oct 13 21:21:34 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Mon, 13 Oct 2003 18:21:34 -0700 (PDT) Subject: The Canadian Internetworked Scientific Supercomputer Message-ID: <20031014012134.21517.qmail@web11403.mail.yahoo.com> Just found an interesting paper written by Paul Lu (the auther of PBSWeb): http://hpcs2003.ccs.usherbrooke.ca/papers/Lu.pdf CISS homepage: http://www.cs.ualberta.ca/~ciss/ Rayson __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:19:11 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 16:19:11 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: <200310131822.17717.joachim@ccrl-nece.de> References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> <200310131822.17717.joachim@ccrl-nece.de> Message-ID: <20031014141911.GA995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 06:22:17PM +0200, Joachim Worringen wrote: > thanks for the information, but you really should (also) use the latest gcc > (3.3x) for such a comparision. It will be interesting to see how it performs > relative to the latest icc on the one hand, and to the old gcc on the other > hand. > > And some information on the application (or libraries used) would be helpful, > too. Like: is it memory-bound or compute-bound, etc.. 
> > Joachim I installed gcc-3.3.2 from the debian testing distribution, here it is the full report including gcc-3.3.2: ************************** *** gcc version 2.95.4 *** ************************** flags bin-size elapsed-time ----- -------- ------------ none 9.5 KB 311 sec "-O3" 8.7 KB 192 sec "-O3 -ffast-math" 8.7 KB 165 sec ******************************************** ************************* *** gcc version 3.3.2 *** ************************* flags bin-size elapsed-time ----- -------- ------------ none 9.1 KB 245 sec "-O3" 8.8 KB 161 sec "-O2" 8.7 KB 157 sec "-O2 -ffast-math -fomit-frame-pointer" 8.5 KB 127 sec "-O2 -ffast-math" 8.5 KB 125 sec "-O2 -ffast-math -march=pentium4" 8.5 KB 120 sec "-O2 -ffast-math -march=pentium4 -msse2" 8.5 KB 120 sec "-O3 -ffast-math -march=pentium4 -msse2" 8.5 KB 120 sec ******************************************** *********************** *** icc version 7.1 *** *********************** flags bin-size elapsed-time ----- -------- ------------ none 597 KB 100 sec "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec **************************************************************** For this test, we actually wrote a version of the program with many parameters hardcoded, so that we make it as compute bound as posible, we aimed at evaluating how the different compilers took advantage of the Xeon processors. I will repeat the tests with the full version, which includes more memory usage, maybe about 80Mb each process, but it will finally depend on how big we make the files we use to split the calculations. The main calculation is the phase of a particle, we use an implementation of the MersenneTwister algorithm: http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html and have to compute sqrt(-2*log(x)/x) and sin(C*x/y) (x and y are not position, they correspond to other variables in the program), C is a constant hardcoded in the code like sin(9.7438473847*x/y). I measured how much it it took to compute sqrt(-2*log(x)/x), and it was about 412 processor cycles (I used rdtscll() ). I will submit other results as soon as I get them, probably using another computing algorithm which runs quite faster. Regards, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 10:32:19 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 16:32:19 +0200 Subject: Intel and GNU C++ compilers In-Reply-To: References: <20031013133836.GA1083@sgirmn.pluri.ucm.es> Message-ID: <20031014143219.GB995@sgirmn.pluri.ucm.es> On Mon, Oct 13, 2003 at 04:40:31PM +0000, C J Kenneth Tan -- Heuchera Technologies wrote: > Jose, > > Can we benchmark our OptimaNumerics Linear Algebra Library with you on > the same machine? > > Thank you very much! > > > Best wishes, > Kenneth Tan > ----------------------------------------------------------------------- > C. J. Kenneth Tan, Ph.D. > Heuchera Technologies Ltd. > E-mail: cjtan at OptimaNumerics.com Telephone: +44 798 941 7838 > Web: http://www.OptimaNumerics.com Facsimile: +44 289 066 3015 > ----------------------------------------------------------------------- Hi Kenneth: Thank you very much for your message, unfortunately we have a pretty tied schedule here, and lot's of different things to do. 
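For reference, a user-space version of the cycle-count measurement described above is easy to put together: the rdtscll() macro is the kernel's wrapper around the same rdtsc instruction, which can be read directly with inline assembly. The loop below is a sketch, not the original code; only the sqrt(-2*log(x)/x) expression and the 10^6-step count are taken from the description, and the compile lines in the comment are simply the flag sets compared in this thread.

/*
 * User-space sketch of timing the phase expression quoted above.
 * Link with -lm, e.g.
 *   gcc -O2 -ffast-math -march=pentium4 -msse2 phase.c -lm    (gcc 3.3)
 *   icc -O2 -tpp7 -xW phase.c                                 (icc 7.1)
 * The file name and constants are invented for illustration.
 */
#include <math.h>
#include <stdio.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    const int n = 1000000;                /* 10^6 steps, as in the tests above */
    double x = 0.5, acc = 0.0;
    unsigned long long t0, t1;
    int i;

    t0 = rdtsc();
    for (i = 0; i < n; i++) {
        acc += sqrt(-2.0 * log(x) / x);   /* the expression quoted above */
        x += 1.0e-7;                      /* keep the argument varying   */
    }
    t1 = rdtsc();

    printf("%.1f cycles per evaluation (acc = %g)\n",
           (double)(t1 - t0) / n, acc);
    return 0;
}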
Right now I cannot spend time benchmarking your library on my system, and we cannot provide access to anyone from outside. On the other hand I don't know if the calculations I am running at this moment can exploit your libraries. Thanks again and best regards, Jose M. Perez Madrid. Spain. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From michael.fitzmaurice at ngc.com Mon Oct 13 10:52:18 2003 From: michael.fitzmaurice at ngc.com (Fitzmaurice, Michael) Date: Mon, 13 Oct 2003 07:52:18 -0700 Subject: Beowulf Users Group meeting Message-ID: <03E95480F0B2D042A7598115FB3F5D9D49F3E4@XCGVA009> Please join us at the Baltimore-Washington Beowulf Users Group meeting this Tuesday the 14th at 2:45 at the Northrop Grumman building on 7575 Colshire Drive; McLean, VA 22102. For more details please go to Who should attend? Sales, marketing and Business Development people Pre sales engineers High Performance Computer professionals IT generalist Data Center Managers Program and Project Managers Beowulf Clusters installations are one of the fastest growing areas with in the IT market. Beowulf Clusters are replacing old slower SMP systems for half the cost and with twice the performance. Beowulf Clusters will grow even faster with the introduction of easier to use parallel programming tools. Engineered Intelligence is leading the revolution in break through parallel programming tools for the HPC market. So now application on older SMP machines can be easily moved to COTS cost effective Intel or AMD based servers, which have been clustered to improve performance and reduce costs. Come hear the folks from Engineered Intelligence how your projects can use C x C to make your applications ready to use Beowulf Clusters today. This will be one of our best topics regarding the Beowulf Cluster market. There is no cost for the briefing and you do not need to be a BWBUG member. As always there will be great door prizes and free parking. If you can not make it to the meeting pass the word to a colleague or business associate. T. Michael Fitzmaurice, Jr. Coordinator of the BWBUG 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iosephus at sgirmn.pluri.ucm.es Tue Oct 14 12:08:50 2003 From: iosephus at sgirmn.pluri.ucm.es (=?iso-8859-1?Q?Jos=E9_M=2E_P=E9rez_S=E1nchez?=) Date: Tue, 14 Oct 2003 18:08:50 +0200 Subject: Pentium4 vs Xeon Message-ID: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Hi: We are going to buy a second machine! :-) It will be a diskless dual processor node. We are thinking about buying the same configuration: Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting them are so expensive, we have been thinking about dual normal Pentium4 instead. We don't have now any P4 comparable processor to run some tests, and after looking at the Intel docs, the only difference we see between Xeon and P4 is Xeon having more cache. Does anyone has any idea about the relative performance of these processors, what about the price/performance ratio? Is it worth paying for more Xeon? 
The other point I wanna ask about is the "host bus speed" reported by the kernel at boot time, it reports 133Mhz, and our memories are supposed to run at 266Mhz, is it normal, is it just the double rate thing? Thanks in advance, Jose M. Perez. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Patrick.Begou at hmg.inpg.fr Tue Oct 14 12:53:54 2003 From: Patrick.Begou at hmg.inpg.fr (Patrick Begou) Date: Tue, 14 Oct 2003 18:53:54 +0200 Subject: PVM errors at startup References: <3F86F29F.8A37AC5B@hmg.inpg.fr> Message-ID: <3F8C2A22.83F751A3@hmg.inpg.fr> This email just to close the thread with the solution. The problem was not related to any PVM misconfiguration but to the ethernet driver. Looking at the ethernet communications between 2 nodes with tcpdump has shown that pvmd was started using tcp communications BUT that pvmd were trying to talk each other with UDP protocol (it is also detailed in the PVM doc) and this was the problem. The UDP communications was unsuccessfull between the nodes. Details: The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 (gigabit) controler. I was using the 3c2000 driver (from the cdrom). Kernel is 2.4.20-20.7bigmem from RedHat 7.3. rsh, rexec and rcp are working fine but this driver seems not to work with UDP protocol??? The solution was to download the sk68lin driver (v6.18) and run the shell script to patch the kernel sources for the current kernel. Then correct the module.conf file and set up the gigabit interface. Now PVM is working fine between the two first nodes and the measured throughput is the same as with 3c2000 asustek driver. I should now setup the other nodes! I would like to thanks Pr. Kenneth R. Koehler and Dr James Arthur Kohl for their great help in checking the full PVM configuration and leading me towards a network driver problem. Patrick -- =============================================================== | Equipe M.O.S.T. | http://most.hmg.inpg.fr | | Patrick BEGOU | ------------ | | LEGI | mailto:Patrick.Begou at hmg.inpg.fr | | BP 53 X | Tel 04 76 82 51 35 | | 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71 | =============================================================== _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Tue Oct 14 13:38:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Tue, 14 Oct 2003 11:38:35 -0600 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <3F8C349B.5040302@lanl.gov> Jos? M. P?rez S?nchez wrote: > [...] we have been thinking about dual normal Pentium4 [...] SMP operation and larger caches appear to be threshold features in Xeons. Old Pentium III could be used in duals, but Intel's marketing has changed. Normal Pentium4 is *not* dual processor enabled: http://www.intel.com/products/desktop/processors/pentium4/index.htm?iid=ipp_browse+dsktopprocess_p4p& http://www.intel.com/products/server/processors/server/xeon/index.htm?iid=ipp_browse+srvrprocess_xeon512& If you really want a fast dual CPU machine from Intel, you'll probably have to pay for a Xeon... 
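As an aside on the host-bus question asked earlier in this batch (the kernel reporting 133 MHz while the memory is sold as 266 MHz): both figures come from the same base clock, with the P4/Xeon front-side bus moving four transfers per clock and DDR memory two. The sketch below just does the arithmetic; the results are peak datasheet figures for a 533 MHz FSB and a single DDR266 (PC2100) channel, not measurements.

/*
 * Back-of-the-envelope bandwidth arithmetic for a quad-pumped FSB and
 * double-pumped DDR memory, both 8 bytes wide and both driven by the
 * same 133 MHz base clock the kernel reports.
 */
#include <stdio.h>

int main(void)
{
    double clock_hz = 133.3e6;             /* base clock reported at boot      */
    double fsb = clock_hz * 4 * 8;         /* "533 MHz" FSB: 4 transfers/clock */
    double ddr = clock_hz * 2 * 8;         /* DDR266 channel: 2 transfers/clock */

    printf("FSB peak:    %.0f MB/s\n", fsb / 1e6);   /* about 4266 MB/s */
    printf("DDR266 peak: %.0f MB/s\n", ddr / 1e6);   /* about 2133 MB/s */
    return 0;
}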
Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Tue Oct 14 13:40:46 2003 From: djholm at fnal.gov (Don Holmgren) Date: Tue, 14 Oct 2003 12:40:46 -0500 Subject: Pentium4 vs Xeon In-Reply-To: <20031014160850.GA1163@sgirmn.pluri.ucm.es> References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: On Tue, 14 Oct 2003, [iso-8859-1] Jos? M. P?rez S?nchez wrote: > Hi: > > We are going to buy a second machine! :-) It will be a diskless dual > processor node. We are thinking about buying the same configuration: > Xeon 2.4Ghz 533Mhz FSB, but since Xeon and the motherboards supporting > them are so expensive, we have been thinking about dual normal Pentium4 > instead. We don't have now any P4 comparable processor to run some > tests, and after looking at the Intel docs, the only difference we see > between Xeon and P4 is Xeon having more cache. Does anyone has any > idea about the relative performance of these processors, what about the > price/performance ratio? Is it worth paying for more Xeon? > > The other point I wanna ask about is the "host bus speed" reported by > the kernel at boot time, it reports 133Mhz, and our memories are > supposed to run at 266Mhz, is it normal, is it just the double rate > thing? > > Thanks in advance, > > Jose M. Perez. The major difference between P4 and Xeon is that P4's are available with up to 800 MHz FSB, and Xeon's with up to 533 MHz FSB. If your code is sensitive to memory bandwidth, a P4 can be a big win. Otherwise they are essentially equivalent. P4 and standard Xeon both have 512K L2 caches. Xeon's with larger L2 caches are available, but if I'm not mistaken there's a big price difference. Pricewise (YMMV), cheap desktop P4's can be had very roughly for half the price of a comparable dual Xeon. You may very well prefer to admin half the number of boxes and so would prefer the Xeon. If you are using an expensive interconnect, you may also come out ahead with the dual processor boxes, buying only half of the PCI adapters and half the switch ports. Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit PCI. That can be a big bottleneck if your cluster application is sensitive to I/O bandwidth. Early in 2004, if the rumours are true, there will be a P4 chipset supporting 66MHz/64bit PCI-X. And in late 2004, PCI Express should be available on both P4 and Xeon motherboards, providing a big increase in I/O bandwidth if one has a network which can take advantage. Xeon's and P4's do four transfers per clock - so, a 533MHz FSB is really a 133MHz clock doing 4 transfers per cycle. The kernel on my 800 MHz FSB P4 reports a 200 MHz host bus speed. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Tue Oct 14 17:14:26 2003 From: rodmur at maybe.org (Dale Harris) Date: Tue, 14 Oct 2003 14:14:26 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <200310011613.46297.lepalom@upc.es> Message-ID: <20031014211426.GI8116@maybe.org> On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > 54.2 You know... 
one problem I see with this, assuming this information is going to pass across the net (or did I miss something). Is that instead of passing something like four bytes (ie "54.2"), you are going to be passing 56 bytes (just counting the cpu_temp line). So the XML blows up a little bit of data 14 times. I can't see this being particularly efficient way of using a network. Sure, it looks pretty, but seems like a waste of bandwidth. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Tue Oct 14 18:13:53 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue, 14 Oct 2003 18:13:53 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: > > > > > > 54.2 > > You know... one problem I see with this, assuming this information is > going to pass across the net (or did I miss something). Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. I'm sure some would claim that 56 bytes is not measurable overhead, especially considering the size of tcp/eth/etc headers. but it's damn ugly, to be sure. this sort of thing has been discussed several times on the linux-kernel list as well - formatting of /proc entries. it's clear that some form of human-readability is a good thing. what's not clear is that it has to be so exceptionally verbose. think of it this way: lmsensors output for a machine is a record whose type will not change (very fast, if you insist!). so why should all the metadata about the record format, units, etc be sent each time? suppose you could fetch the fully verbose record once, and then on subsequent queries, just get '54.2 56.7 40.1 3650 4150 5.0 3.3 12.0 -12.0'. the only think you've lost is same-packet-self-description (and, incidentally, insensitivity to reordering of elements...) there *is* actually a very mind-bending binarification procedure for xml. it seems totally cracked to me, though, since afaikt, it completely tosses the self-description aspect, which is almost the main point of xml... of course, the whole xml thing is a massive fraud, since it does nothing at all towards actual interoperability - there must already be thousands of different xml schemas for "SKU", each better than the last, and therefore mutually incompatible... does ASN.1 improve on this situation at all? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 14 20:45:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 14 Oct 2003 20:45:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031014211426.GI8116@maybe.org> Message-ID: On Tue, 14 Oct 2003, Dale Harris wrote: > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > 54.2 > > > > You know... one problem I see with this, assuming this information is > going to pass across the net (or did I miss something). 
Is that instead > of passing something like four bytes (ie "54.2"), you are going to be > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > a little bit of data 14 times. I can't see this being particularly > efficient way of using a network. Sure, it looks pretty, but seems like > a waste of bandwidth. Ah, an open invitation to waste a little more:-) Permit me to rant (the following can be freely skipped by the rant-averse:-). Note that this is not a flame, merely an impassioned assertion of an admittedly personal religious viewpoint. Like similar rants concerning the virtues of C vs C++ vs Fortran vs Java or Python vs Perl, it is intended to amuse or possible educate, but doubtless won't change many human minds. This is an interesting question and one I kicked around a long time when designing xmlsysd. Of course it is also a very longstanding issue -- as old as computers or just about. Binary formats (with need for endian etc translation) are obviously the most efficient but are impossible to read casually and difficult to maintain or modify. Compressed binary (or binary that only uses e.g. one bit where one bit will do) the most impossible and most difficult. Back in the old days, memory and bandwidth on all computers was a precious and rare thing. ALL programs tended to use one bit where one bit was enough. Entire formats with headers and metadata and all were created where every bit was parsimoniously allocated out of a limited pool. Naturally, those allocations proved to be inadequate in the long run so that only a few years ago lilo would complain if the boot partition had more than 1023 divisions because once upon a time somebody decided that 10 bits was all this particular field was ever going to get. In order to parse such a binary stream, it is almost essential to use a single library to both format and write the stream and to read and parse it, and to maintain both ends at the same time. Accessing the data ONLY occurs through the library calls. This is a PITA. Cosmically. Seriously. Yes, there are many computer subsystems that do just this, but they are nightmarish to use even via the library (which from a practical point of view becomes an API, a language definition of its own, with its own objects and tools for creating them and extracting them, and the need to be FULLY DOCUMENTED at each step as one goes along) and require someone with a high level of devotion and skill to keep them roughly bugfree. For example, if you write your code for single CPU systems, it becomes a major problem to add support for duals, and then becomes a major problem again to add support for N-CPU SMPs. Debugging becomes a multistep problem -- is the problem in the unit that assembles and provides the data, the encoding library, the decoding library (both of which are one-offs, written/maintained just for the base application) or is it in the client application seeking access to the data? Fortunately, in the old days, nearly all programming was done by professional programmers working for a wage for giant (or not so giant) companies. Binary interfaces were ideal -- they became Intellectual Property >>because<< they were opaque and required a special library whose source was hidden to access the actual binary, which might be entirely undocumented (except via its API library calls). BECAUSE they were so bloomin' hidden an difficult/expensive to modify, software evolved very, very slowly, breaking like all hell every time e.g. MS Word went from revision 1 to 2 to 3 to... 
because of broken binary incompatibility. ASCII, OTOH, has the advantage of being (in principle) easy to read. However, it is easy to make it as obscure and difficult to read as binary. Examples abound, but let's pull one from /proc, since the entire /proc interface is designed around the premise that ascii is good relative to binary (although that seems to be the sole thing that the many designers of different subsystems agree on). When parsing the basic status data of an application, one can work through: rgb at lilith|T:105>cat /proc/1214/stat 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 (which, as you can see, contains the information on the pine application within which I am currently working on my laptop). What? You find that hard to read? Surely it is obvious that the first field is the PID, the second the application name (inside parens, introducing a second, fairly arbitrary delimiter to parse), the runtime status (which is actually NOT a single character, it can vary) and then... ooo, my. Time to check out man proc, kernel source (/usr/src/linux/fs/proc/array.c) and maybe the procps sources. One does better with: rgb at lilith|T:106>cat /proc/1214/status Name: pine State: S (sleeping) Tgid: 1214 Pid: 1214 PPid: 1205 TracerPid: 0 Uid: 1337 1337 1337 1337 Gid: 1337 1337 1337 1337 FDSize: 32 Groups: 1337 0 VmSize: 11752 kB VmLck: 0 kB VmRSS: 5652 kB VmData: 2496 kB VmStk: 52 kB VmExe: 2804 kB VmLib: 3708 kB SigPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 8000000008001003 SigCgt: 0000000040016c5c CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 This is an almost human readable view of MUCH of the same data that is in /proc/stat. Of course there is the little ASCII encoded hexadecimal garbage at the bottom that could make strong coders weep (again, without a fairly explicit guide into what every byte or even BIT in this array does, as one sort of expects that there are binary masked values stuck in here). In this case man proc doesn't help -- because this is supposedly "human readable" they don't provide a reference there. Still, some of the stuff that is output by ps aux is clearly there in a fairly easily parseable form. Mind you, there are still mysteries. What are the four UID entries? What is the resolution on the memory, and are kB x1000 or x1024? What about the rest of the data in /proc/stat (as there are a lot more fields there). What about the contents of /proc/PID/statm? (Or heavens preserve us, /proc/PID/maps)? Finally, what about other things in /proc, e.g.: rgb at lilith|T:119>cat /proc/stat cpu 3498 0 2122 239197 cpu0 3498 0 2122 239197 page 128909 55007 swap 1 0 intr 279199 244817 13604 0 3427 6 0 4 4 1 3 2 2 1436 0 15893 0 disk_io: (3,0):(15946,11130,257194,4816,109992) ctxt 335774 btime 1066170139 processes 1261 Again, ASCII yes, but now (count them) there are whitespace, :, (, and ',' separators, and one piece of data (the CPU's index) is a part of a field value (cpu0) so that the entire string "cpu" becomes a sort of separator (but only in one of the lines). An impressive ratio of separators used to field labels. I won't even begin to address the LIVE VILE EVIL of overloading nested data structures nested in sequential, arbitrary separators inside the "values" for a single field, disk_io (or is that disk_io:?) 
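To make the parsing pain above concrete, here is a minimal sketch (not taken from procps or xmlsysd) of the defensive code /proc/<pid>/stat forces on anyone who reads it: since the comm field in parentheses can itself contain spaces and parentheses, the only safe move is to locate the last ')' and parse the fixed-format fields after it.

/*
 * Defensive /proc/<pid>/stat reader, fields 1-4 only.  Run with a pid as
 * the argument, or with none to read /proc/self/stat.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], buf[1024], comm[256], state;
    int pid, ppid;
    char *open_paren, *close_paren;
    size_t len;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%s/stat", argc > 1 ? argv[1] : "self");
    fp = fopen(path, "r");
    if (!fp || !fgets(buf, sizeof(buf), fp)) {
        perror(path);
        return 1;
    }
    fclose(fp);

    sscanf(buf, "%d", &pid);                 /* field 1: pid */
    open_paren  = strchr(buf, '(');
    close_paren = strrchr(buf, ')');         /* the LAST ')' is the safe one */
    if (!open_paren || !close_paren || close_paren < open_paren)
        return 1;

    len = close_paren - open_paren - 1;      /* field 2: comm, parens stripped */
    if (len >= sizeof(comm))
        len = sizeof(comm) - 1;
    memcpy(comm, open_paren + 1, len);
    comm[len] = '\0';

    sscanf(close_paren + 1, " %c %d", &state, &ppid);  /* fields 3-4 */
    printf("pid=%d comm=\"%s\" state=%c ppid=%d\n", pid, comm, state, ppid);
    return 0;
}

Every consumer of this file ends up reimplementing exactly this little dance, which is rather the point being made here.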
If this isn't enough for you, consider /proc/net/dev, which has two separators (: and ws) but is in COLUMNS, /proc/bus/pci/devices (which I still haven't figured out) and yes, the aforementioned sensors interface in /proc. I offer all of the above as evidence of a fairly evil (did you ever notice how evil, live, vile, veil and elvi are all anagrams of one another he asks in a mindless parenthetical insertion to see if you're still awake:-) middle ground between a true binary interface accessible only through library calls (which can actually be fairly clean, if one creates objects/structs with enough mojo to hold the requisite data types so that one can then create a relatively simple set of methods for accessing them) and xml. XML is the opposite end of the binary spectrum. It asserts as its primary design principle that the objects/structs with the right kind of mojo share certain features -- precisely those that constitute the rigorous design requirements of XML (nesting, attributes, values, etc). There is a fairly obvious mapping between a C struct, a C++ object, and an XMLified table. It also asserts implicitly that whether or not the object tags are chosen to be human readable (nobody insists that the tags encapsulating CPU temperature readings be named -- they could have been just ) there MUST be some sort of dictionary created at the same time as the XML implementation. If (very) human readable tags are chosen they are nearly self-documenting, but whole layers of DTD and CSS and so forth treatment of XML compliant markup are predicated upon a clear definition of the tag rules and hierarchy. Oh, and by its very design XML is highly scalable and extensible. Just as one can easily enough add fields into a struct without breaking code that uses existing fields, one can often add tags into an XML document description without breaking existing tags or tag processing code (compare with adding a field anywhere into /proc/stat -- ooo, disaster). This isn't always the case in either case -- sometimes one converts a field in a struct into a struct in its own right, for example, which can do violence to both the struct and an XML realization of it. Still, often one can and when one can't it is usually because you've had a serious insight into the "right" way to structure your data and before the encoding was just plain wrong in some deep way. This happens, but generally only fairly early in the design and implementation process. Note that XML need not be inefficient in transit. BECAUSE it is so highly structured, it compresses very efficiently. Library calls exist to squeeze out insignificant whitespace, for example (ignored by the parser anyway). I haven't checked recently to see whether compression is making its way into the library, but either way one can certainly compress/decompress and/or encrypt/decrypt the assembled XML messages before/after transmission, if CPU is cheaper to you than network or security is an issue. I think that it then comes down to the following. XML may or may not be perfect, but it does form the basis for a highly consistent representation of data structures that is NOT OPAQUE and is EASILY CREATED AND EASILY PARSED with STANDARD TOOLS AND LIBRARIES. When designing an XMLish "language" for your data, you can make the same kind of choices that you face in any program. Do you document your code or not? Do you use lots of variable names like egrp1 or do you write out something roughly human readable like extra_group_1? 
Do you write your loops so that they correspond to the actual formulae or basic algorithm (and let the compiler do as well as it can with them) or do you block them out to be cache-friendly, insert inline assembler, and so forth to make them much faster but impossible to read or remember even yourself six months after you write them? Some choices make the code run fast and short but hard to maintain. Other choices make it run slower but be more readable and easier to maintain. In the long run, I think most programmers eventually come to a sort of state of natural economy in most of these decisions; one that expresses their personal style, the requirements of their job, the requirements of the task, and a reflection of their experience(s) coding. It is a cost/benefit problem, after all (as is so much in computing). You have to ask how much it costs you to do something X way instead of Y way, and what the payoff/benefits are, in the long run. For myself only, years of experience have convinced me that as far as things like /proc or task/hardware monitoring are concerned, the bandwidth vs ease of development and maintenance question comes down solidly in favor of ease of development and maintenance. Huge amounts of human time are wasted writing parsers and extracting de facto data dictionaries from raw source (the only place where they apparently reside). Tools that are built to collect data from a more or less arbitrary interface have to be almost completely rewritten when that interface changes signficantly (or break horribly in the meantime). So the cost is this human time (programmers'), more human time (the time and productivity lost by people who lack the many tools a better interface would doubtless spawn), and the human time and productivity lost due to the bugs the more complex and opaque and multilayered interface generates. The benefit is that you save (as you note) anywhere from a factor of 3-4 to 10 or more in the total volume of data delivered by the interface. Data organization and human readability come at a price. But what is the REAL cost of this extra data? Data on computers is typically manipulated in pages of memory, and a page is what, 4096 bytes? Data movement (especially of contiguous data) is also very rapid on modern computers -- you are talking about saving a very tiny fraction of a second indeed when you reduce the message from 54 bytes to 4 bytes. Even on the network, on a 100BT connection one is empirically limited by LATENCY on messages less than about 1000 bytes in length. So if you ask how long it takes to send a 4 byte packet or a 54 byte packet (either one of which is TCP encapsulated inside a header that is longer than the data) the answer is that they take exactly the same amount of time (within a few tens of nanoseconds). If the data in question is truly a data stream -- a more or less continuous flow of data going through a channel that represents a true bottleneck, then one should probably use a true binary representation to send the data (as e.g. PVM or MPI generally do), handling endian translation and data integrity and all that. 
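To make the struct-versus-XML comparison above concrete, a minimal sketch using an invented sensor record: the field names, units and tag names are made up for illustration, and the binary writer deliberately skips the endian handling a real stream protocol would need, as just noted.

/*
 * One made-up record, written two ways: packed binary (the compact
 * "data stream" form) and verbose but self-describing XML-ish text.
 */
#include <stdio.h>

struct node_sensors {
    float  cpu_temp;     /* degrees C */
    float  fan_rpm;
    double load_one;     /* 1-minute load average */
};

static void write_binary(FILE *out, const struct node_sensors *s)
{
    fwrite(s, sizeof(*s), 1, out);           /* roughly 16 bytes on the wire */
}

static void write_xml(FILE *out, const struct node_sensors *s)
{
    fprintf(out,
        "<node_sensors>\n"
        "  <cpu_temp units=\"C\">%.1f</cpu_temp>\n"
        "  <fan_rpm>%.0f</fan_rpm>\n"
        "  <load_one>%.2f</load_one>\n"
        "</node_sensors>\n",
        s->cpu_temp, s->fan_rpm, s->load_one);  /* roughly ten times larger */
}

int main(void)
{
    struct node_sensors s = { 54.2f, 3650.0f, 1.03 };

    write_binary(stdout, &s);   /* in real use each form would go to a   */
    write_xml(stdout, &s);      /* socket, not interleaved on stdout     */
    return 0;
}

The binary form is a dozen or so bytes and unreadable without the matching struct definition; the text form is an order of magnitude larger but needs neither separate documentation nor a special library to decode, which is the whole trade being argued in this thread.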
If the data in question is a relatively short (no matter how it is wrapped and encoded) and intermittant source -- as most things like a sensors interface, the proc interface(s) in general, the configuration file of your choice, and most net/web services are, arguably -- then working hard to compress or minimally encapsulate the data in an opaque form is hard to justify in terms of the time (if any) that it saves, especially on networks, CPUs, memory that are ever FASTER. If it doesn't introduce any human-noticeable delay, and the overall load on the system(s) in question remain unmeasurably low (as was generally the case with e.g. the top command ten Moore's Law years or more ago) then why bother? I think (again noting that this is my own humble opinion:-) that there is no point. /proc should be completely rewritten, probably by being ghosted in e.g. /xmlproc as it is ported a little at a time, to a single, consistent, well documented xmlish format. procps should similarly be rewritten in parallel with this process, as should the other tools that extract data from /proc and process it for human or software consumption. Perhaps experimentation will determine that there are a FEW places in /proc where the extra overhead of parsing xml isn't acceptable for SOME applications -- /proc/pid/stat for example. In those few cases it may be worthwhile to make the ghosting permanent -- to provide an xmlish view AND a binary or minimal ASCII view, as is done now, badly, with /proc/pid/stat and /proc/pid/status. This is especially true, BTW, in open source software, where a major component of the labor that creates and maintains both low level/back end service software and high level/front end client software is unpaid, volunteer, part time, and of a wide range of skill and experience. Here the benefits of having a documented, rigorously organized, straightforwardly parsed API layer between tools are the greatest. Finally, to give the rotting horse one last kick, xmlified documents (deviating slightly from API's per se) are ideal for archival storage purposes. Microsoft is being scrutinized now by many agencies concerned about the risks associated from having 90% of our vital services provided by an operating system that has proven in practice to be appallingly vulnerable. Their problem has barely begun. The REAL expense associated with using Microsoft-based documents is going to prove in the long run to be the expense of de-archiving old proprietary-binary-format documents long after the tools that created them have gone away. This is a problem worthy of a rant all by itself (and I've written one or two in other venues) but it hasn't quite reached maturity as it requires enough years of document accumulation and toplevel drift in the binary "standard" before it jumps out and slaps you in the face with six and seven figure expenses. XMLish documents (especially when accompanied by a suitable DTD and/or data dictionary) simply cannot cost that much to convert because their formats are intrinsically open. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Tue Oct 14 22:02:28 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Wed, 15 Oct 2003 03:02:28 +0100 Subject: Condor Problem Message-ID: Does anyone have any condor experience? im trying to submit a job which is a Borland C++ console application.. the application writes a final output to the screen... but this is not being saved to the output file i specified in the jobs configuration. When i use a simple batch file and echo some text to the screen and submit that as a job it works fine and the echoed text is in the output file. Is there a problem with condor? or is there a problem with c++ or stdout? any help would be greatly appreciated. Thanks in advance... Chris Miles, NeuralGrid, Paisley University, Scotland _________________________________________________________________ Express yourself with cool emoticons - download MSN Messenger today! http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kohlja at ornl.gov Tue Oct 14 15:08:54 2003 From: kohlja at ornl.gov (James Kohl) Date: Tue, 14 Oct 2003 15:08:54 -0400 Subject: PVM errors at startup In-Reply-To: <3F8C2A22.83F751A3@hmg.inpg.fr> References: <3F86F29F.8A37AC5B@hmg.inpg.fr> <3F8C2A22.83F751A3@hmg.inpg.fr> Message-ID: <20031014190854.GA31004@neo.csm.ornl.gov> Hey Patrick, Glad you found the problem. This is usually manifested when the networking config is off slightly, or when internal/external networks are confused, but it sounds like you had a much more interesting problem...! :-) Yes, PVM uses rsh/ssh/TCP to start a remote PVM daemon (pvmd) but then the daemons themselves use UDP to talk and route PVM messages. FYI, any PVM tasks that use the "PvmRouteDirect" will use direct TCP sockets. Again, glad you figured it out! (And you're most welcome! :) All the Best, Jim On Tue, Oct 14, 2003 at 06:53:54PM +0200, Patrick Begou wrote: > This email just to close the thread with the solution. > The problem was not related to any PVM misconfiguration but to the > ethernet driver. Looking at the ethernet communications between 2 nodes > with tcpdump has shown that pvmd was started using tcp communications > BUT that pvmd were trying to talk each other with UDP protocol (it is > also detailed in the PVM doc) and this was the problem. The UDP > communications was unsuccessfull between the nodes. > Details: > The nodes are P4 2.8 with Asustek P4P800 motherboard and on board 3C940 > (gigabit) controler. I was using the 3c2000 driver (from the cdrom). > Kernel is 2.4.20-20.7bigmem from RedHat 7.3. > rsh, rexec and rcp are working fine but this driver seems not to work > with UDP protocol??? > The solution was to download the sk68lin driver (v6.18) and run the > shell script to patch the kernel sources for the current kernel. Then > correct the module.conf file and set up the gigabit interface. Now PVM > is working fine between the two first nodes and the measured throughput > is the same as with 3c2000 asustek driver. I should now setup the other > nodes! > I would like to thanks Pr. Kenneth R. 
Koehler and Dr James Arthur Kohl > for their great help in checking the full PVM configuration and leading > me towards a network driver problem. > Patrick (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(: James Arthur "Jeeembo" Kohl, Ph.D. "Da Blooos Brathas?! They Oak Ridge National Laboratory still owe you money, Fool!" kohlja at ornl.gov http://www.csm.ornl.gov/~kohl/ Long Live Curtis Blues!!! :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Oct 15 04:49:26 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 15 Oct 2003 10:49:26 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Robert G. Brown wrote: > On Tue, 14 Oct 2003, Dale Harris wrote: > > On Wed, Oct 01, 2003 at 10:33:29AM -0400, Robert G. Brown elucidated: > > > > > > > > > > > > 54.2 > > > > > > > > You know... one problem I see with this, assuming this information is > > going to pass across the net (or did I miss something). Is that instead > > of passing something like four bytes (ie "54.2"), you are going to be > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > a little bit of data 14 times. I can't see this being particularly > > efficient way of using a network. Sure, it looks pretty, but seems like > > a waste of bandwidth. > > Ah, an open invitation to waste a little more:-) Isn't it a bit cynical to write a 20 KByte e-mail on the topic of saving 56 Bytes? ;-) SCNR, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H16 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From djholm at fnal.gov Wed Oct 15 11:43:09 2003 From: djholm at fnal.gov (Don Holmgren) Date: Wed, 15 Oct 2003 10:43:09 -0500 Subject: Some application performance results on a dual G5 Message-ID: For those who might be interested, I've posted some lattice QCD application performance results on a 2.0 GHz dual G5 PowerMac. See http://lqcd.fnal.gov/benchmarks/G5/ As expected from the specifications, strong memory bandwidth, reasonable scaling, and good floating point performance. Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 15 09:46:45 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 15 Oct 2003 09:46:45 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Wed, 15 Oct 2003, Felix Rauch wrote: > > > passing 56 bytes (just counting the cpu_temp line). So the XML blows up > > > a little bit of data 14 times. I can't see this being particularly > > > efficient way of using a network. Sure, it looks pretty, but seems like > > > a waste of bandwidth. > > > > Ah, an open invitation to waste a little more:-) > > Isn't it a bit cynical to write a 20 KByte e-mail on the topic of > saving 56 Bytes? 
;-) Cynical? No, not really. Stupid? Probably. If only I could get SOMEBODY to pay me ten measely cents a word for my rants... Alas this is not to be. So the alternative is to see if I can extort ten cents from everybody on the list NOT to write 20K rants like this. Sort of like National Lampoon's famous "Buy this magazine or we'll shoot this dog" issue...:-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Wed Oct 15 14:16:06 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed, 15 Oct 2003 11:16:06 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chrismiles1981 at hotmail.com Wed Oct 15 21:22:01 2003 From: chrismiles1981 at hotmail.com (Chris Miles) Date: Thu, 16 Oct 2003 02:22:01 +0100 Subject: Condor Problem Message-ID: Hi, thanks for the reply Using all this instead of condor/globus? The only thing was I need to do this on windows. What i want to do is setup a Grid but also need a cluster to run jobs on Chris >From: Andrew Wang >To: Chris Miles >CC: beowulf at beowulf.org >Subject: Re: Condor Problem >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) >MIME-Version: 1.0 >Received: from mc11-f10.hotmail.com ([65.54.167.17]) by >mc11-s20.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:13:50 -0700 >Received: from web16812.mail.tpe.yahoo.com ([202.1.236.152]) by >mc11-f10.hotmail.com with Microsoft SMTPSVC(5.0.2195.5600); Wed, 15 Oct >2003 18:11:09 -0700 >Received: from [65.49.83.96] by web16812.mail.tpe.yahoo.com via HTTP; Thu, >16 Oct 2003 09:11:03 CST >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq >Message-ID: <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> >Return-Path: andrewxwang at yahoo.com.tw >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 (UTC) >FILETIME=[62388A50:01C39382] > >If all you need is a batch system, I would suggest SGE >and Scalable PBS, which have more users and better >support. > >Both of them are free and opensource, so you can try >both and see which one you like better! > >SGE: http://gridengine.sunsource.net >SPBS: http://www.supercluster.org/projects/pbs/ > >Andrew. > >----------------------------------------------------------------- >?C???? Yahoo!?_?? >?????C???B?????????B?R?A???????A???b?H?????? 
>http://tw.promo.yahoo.com/mail_premium/stationery.html _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 21:11:03 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) Subject: Condor Problem Message-ID: <20031016011103.41833.qmail@web16812.mail.tpe.yahoo.com> If all you need is a batch system, I would suggest SGE and Scalable PBS, which have more users and better support. Both of them are free and opensource, so you can try both and see which one you like better! SGE: http://gridengine.sunsource.net SPBS: http://www.supercluster.org/projects/pbs/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Oct 15 21:37:36 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed, 15 Oct 2003 18:37:36 -0700 Subject: Pentium4 vs Xeon In-Reply-To: References: <20031014160850.GA1163@sgirmn.pluri.ucm.es> <20031014160850.GA1163@sgirmn.pluri.ucm.es> Message-ID: <5.2.0.9.2.20031015183031.03c57ce8@216.82.101.6> There are single-Xeon boards using the Serverworks GC series of chipsets with 64-bit PCI, but they're just as expensive as a budget dual Xeon board (Tyan S2723 or Supermicro X5DPA-GG)... In the $280 to $310 per board price range. Seems rather silly, as the "Prestonia" Socket-604 Xeon CPUs are nothing but a P4 repackaged. There's also this board: http://www.tyan.com/products/html/trinitygcsl.html Which uses a single P4 @ 533MHz FSB, with the same Serverworks chipset. Supermicro X5-SS* series (scroll down): http://www.supermicro.com/Product_page/product-mS.htm >Currently P4 motherboards are only available (AFAIK) with 33MHz/32bit >PCI. That can be a big bottleneck if your cluster application is >sensitive to I/O bandwidth. Early in 2004, if the rumours are true, >there will be a P4 chipset supporting 66MHz/64bit PCI-X. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Wed Oct 15 22:15:56 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Thu, 16 Oct 2003 10:15:56 +0800 (CST) Subject: Condor Problem In-Reply-To: Message-ID: <20031016021556.52225.qmail@web16812.mail.tpe.yahoo.com> Unluckily, SGE has very limited Windows support. PBSPro, which supports MS-Windows (the free versions do not), does offer free licenses to .edu sites. BTW, may be there are more people with condor knowledge from the condor mailing list can answer your questions. http://www.cs.wisc.edu/~lists/archive/condor-users/ Andrew. --- Chris Miles ????> Hi, thanks for the reply > > Using all this instead of condor/globus? > > The only thing was I need to do this on windows. 
> > What i want to do is setup a Grid but also need a > cluster to run > jobs on > > Chris > > >From: Andrew Wang > >To: Chris Miles > >CC: beowulf at beowulf.org > >Subject: Re: Condor Problem > >Date: Thu, 16 Oct 2003 09:11:03 +0800 (CST) > >MIME-Version: 1.0 > >Received: from mc11-f10.hotmail.com > ([65.54.167.17]) by > >mc11-s20.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:13:50 -0700 > >Received: from web16812.mail.tpe.yahoo.com > ([202.1.236.152]) by > >mc11-f10.hotmail.com with Microsoft > SMTPSVC(5.0.2195.5600); Wed, 15 Oct > >2003 18:11:09 -0700 > >Received: from [65.49.83.96] by > web16812.mail.tpe.yahoo.com via HTTP; Thu, > >16 Oct 2003 09:11:03 CST > >X-Message-Info: JGTYoYF78jHqyjkG27RbQOhxNCLEO1Jq > >Message-ID: > <20031016011103.41833.qmail at web16812.mail.tpe.yahoo.com> > >Return-Path: andrewxwang at yahoo.com.tw > >X-OriginalArrivalTime: 16 Oct 2003 01:11:09.0941 > (UTC) > >FILETIME=[62388A50:01C39382] > > > >If all you need is a batch system, I would suggest > SGE > >and Scalable PBS, which have more users and better > >support. > > > >Both of them are free and opensource, so you can > try > >both and see which one you like better! > > > >SGE: http://gridengine.sunsource.net > >SPBS: http://www.supercluster.org/projects/pbs/ > > > >Andrew. > > > >----------------------------------------------------------------- > >??? Yahoo!?? > >?????????????????????? > >http://tw.promo.yahoo.com/mail_premium/stationery.html > > _________________________________________________________________ > Stay in touch with absent friends - get MSN > Messenger > http://www.msn.co.uk/messenger > ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From graham.mullier at syngenta.com Thu Oct 16 04:47:12 2003 From: graham.mullier at syngenta.com (graham.mullier at syngenta.com) Date: Thu, 16 Oct 2003 09:47:12 +0100 Subject: XML for formatting (Re: Environment monitoring) Message-ID: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> [Hmm, and will the rants be longer or shorter after he's bought the mental lubricant?] I'm in support of the original rant, however, having had to reverse-engineer several data formats in the past. Most recently a set of molecular-orbital output data. Very frustrating trying to count through data fields and convince myself that we have mapped it correctly. Anecdote from a different field (weather models) that's related - for a while, a weather model used calibration data a bit wrong - sea temperature and sea surface wind speed were swapped. All because someone had to look at a data dump and guess which column was which. So, sure, XML is very wordy, but the time saving (when trying to decipher the data) and potential for avoiding big mistakes more than makes up for it (IMO). Graham Graham Mullier Chemoinformatics Team Leader, Chemistry Design Group, Syngenta, Bracknell, RG42 6EY, UK. direct line: +44 (0) 1344 414163 mailto:Graham.Mullier at syngenta.com -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: 15 October 2003 19:16 Cc: beowulf at beowulf.org Subject: Re: XML for formatting (Re: Environment monitoring) On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. 
Brown wrote: > So the alternative is to see if I can extort > ten cents from everybody on the list NOT to write 20K rants like this. Do you accept pay-pal? Do you promise to spend all the money buying yourself beer? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Thu Oct 16 08:12:36 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Thu, 16 Oct 2003 14:12:36 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <20031014211426.GI8116@maybe.org> Message-ID: <20031016121236.GE8711@unthought.net> On Tue, Oct 14, 2003 at 08:45:12PM -0400, Robert G. Brown wrote: ... > rgb at lilith|T:105>cat /proc/1214/stat > 1214 (pine) S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 > 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 > 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 While this has nothing to do with your (fine as always ;) rant, I just need to add a comment (which has everything to do with /proc stupidities): > (which, as you can see, contains the information on the pine application > within which I am currently working on my laptop). > > What? You find that hard to read? Imagine I had a process with the (admittedly unlikely but entirely possible) name 'pine) S 1205 (' Your stat output would read: 1214 (pine) S 1205 () S 1205 1214 1205 34816 1214 0 767 0 872 0 22 15 0 0 15 0 0 0 14510 12034048 1413 4294967295 134512640 137380700 3221217248 3221190168 4294959106 0 0 134221827 1073835100 3222429229 0 0 17 0 0 0 22 15 0 0 Parsing the ASCII-art in /proc/mdstat is at least as fun ;) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 16 08:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 16 Oct 2003 08:08:12 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031015181606.GA1574@greglaptop.internal.keyresearch.com> Message-ID: On Wed, 15 Oct 2003, Greg Lindahl wrote: > On Wed, Oct 15, 2003 at 09:46:45AM -0400, Robert G. Brown wrote: > > > So the alternative is to see if I can extort > > ten cents from everybody on the list NOT to write 20K rants like this. > > Do you accept pay-pal? Do you promise to spend all the money buying > yourself beer? I do accept pay-pal, by strange chance and will cheerfully delete one word out of a 20Kword base for every dime received (and to make it clear to the list that I've done so, naturally I'll post the diff with the original as well as the modified rant:-). 
I can't promise to spend ALL of the money buying beer, because my liver is old and has already tolerated much abuse over many years and I want it to last a few more decades, but I'll certainly lift a glass t'alla yer health from time to time...:-) On the other hand, given my experiences with people sending me free money via pay-pal up to this point, it would probably be safe to promise to spend it "all" on beer. Even my aged liver can tolerate beer by the thimbleful...if I didn't end up a de facto teetotaller.;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 12:02:18 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 12:02:18 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: On Thu, 16 Oct 2003 graham.mullier at syngenta.com wrote: > [Hmm, and will the rants be longer or shorter after he's bought the mental > lubricant?] Buy the right amount and they will be eloquent enough that you won't mind, or too much and they will be short and slurred ;-o > I'm in support of the original rant, however, having had to reverse-engineer > several data formats in the past. Most recently a set of molecular-orbital > output data. Very frustrating trying to count through data fields and > convince myself that we have mapped it correctly. What you want is not XML, but a data format description language. When I first read about XML, that is what I believed it was. I was expecting that the file optionally described the data format as a prologue, and then had a sequence of efficiently packed data structures. But the XML designers created the evil twin of that idea. The header is a schema of parser rules, and each data element has verbose syntax that conveys little semantic information. An XML file - is difficult for humans to read, yet is even larger than human-oriented output - requires both syntax and rule checking after human editing, yet is complex for machines to parse. - is intended for large data sets, where the negative impacts are multiplied - encourages "cdata" shortcuts that bypass the few supposed advantages. > Anecdote from a different field (weather models) that's related - for a > while, a weather model used calibration data a bit wrong - sea temperature > and sea surface wind speed were swapped. All because someone had to look at > a data dump and guess which column was which. Versus looking at an XML output and guessing what "load_one" means? I see very little difference: repeating a low-content label once for each data element doesn't convey more information. The only thing XML adds here is avoiding miscounting fields for undocumented data structures. What we really want in both the weather code case and when reporting cluster statistics is a data format description language. That description includes the format of the packed fields, and should include what the fields mean and their units, which is what we are missing in both cases. With such an approach we can efficiently assemble, transmit and deconstruct packed data while having automatic tools to check its validity.
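To make the idea concrete, one toy form such a data format description language could take is an ASCII prologue naming each packed field, its type and its units, followed by raw fixed-size records. The field names and layout below are invented for illustration, and byte order and struct padding are ignored for brevity; this is a sketch of the idea, not an existing tool.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy self-describing record stream: a one-time ASCII prologue
     * describes the packed fields, then fixed-size binary records follow.
     * Field names are invented; padding and endianness are ignored here. */
    struct node_stats {
        double   load_one;      /* 1-minute load average */
        double   temp_cpu0;     /* degrees C             */
        uint32_t mem_free_kb;   /* kilobytes             */
    };

    static const char descriptor[] =
        "format node_stats version 1\n"
        "field load_one    float64 dimensionless '1-minute load average'\n"
        "field temp_cpu0   float64 celsius       'CPU0 temperature'\n"
        "field mem_free_kb uint32  kilobytes     'free memory'\n"
        "end\n";

    int main(void)
    {
        struct node_stats s = { 0.42, 38.5, 512000 };
        FILE *fp = fopen("stats.dat", "wb");
        if (!fp)
            return 1;
        fwrite(descriptor, 1, sizeof(descriptor) - 1, fp); /* prologue, once */
        fwrite(&s, sizeof(s), 1, fp);                      /* packed records */
        fclose(fp);
        return 0;
    }

A reader that has never seen the format can parse the prologue once and then consume the packed records at full speed, which is the property the tag-per-element approach gives up.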
And general-purpose tools can even combine a description and a compact data set to produce XML. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Thu Oct 16 16:12:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Thu, 16 Oct 2003 13:12:13 -0700 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> Message-ID: <20031016201213.GV8116@maybe.org> On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > What you want is not XML, but a data format description language. > I think the S-expression guys would say that they have one. And it is what supermon uses, FWIW. http://sexpr.sourceforge.net/ http://supermon.sourceforge.net/ (supermon pages are currently unavailable.) Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Oct 16 16:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Thu, 16 Oct 2003 16:52:03 -0400 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> References: <0B27450D68F1D511993E0001FA7ED2B3025E551D@UKJHMBX12> <20031016201213.GV8116@maybe.org> Message-ID: <1066337523.11093.20.camel@roughneck.liniac.upenn.edu> On Thu, 2003-10-16 at 16:12, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > > > > I think the S-expression guys would say that they have one. And it is what > supermon uses, FWIW. > > > http://sexpr.sourceforge.net/ > > http://supermon.sourceforge.net/ We use supermon as the data gathering mechanism for Clubmask, and I really like it. You can mask to get just certain values, and it is _really_ fast. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 14 23:31:55 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 14 Oct 2003 22:31:55 -0500 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <1066188715.3200.120.camel@terra> On Tue, 2003-10-14 at 19:45, Robert G. Brown wrote: ... > > > ... > > > > rgb As someone who has done programming environment tools most of his reasonably long professional life, I must say you have hit the nail on the head. I have rooted through more than my share of shitty binary formats in my day, and I can honestly say that I go home happier as a result of dealing with an XML trace file in my current project. I was happily working away dealing with only XML, but then it happened. The demons of my past rose their ugly heads when I decided that it would be a good thing to get some ELF information outta some files. Being the industrious guy I am, I went and got ELF docs from Dave Anderson's stash. Did that help?
Nope, not really, as it was mangled 64-bit focused ELF. Was it documented? Nope, not really. You could look at the elfdump code to see what that does, so in a backwards way, it was documented. The alternative was to ferret out the format by bugging enough compiler geeks until they gave up the secret handshake. The alternative that I eventually took was to go lay down until the desire to have the ELF information went away. ;-) -- -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Thu Oct 16 17:36:25 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Thu, 16 Oct 2003 16:36:25 -0500 Subject: OT: same commands to multiple servers? Message-ID: <20031016163625.C11181@mikee.ath.cx> I now have control over many AIX servers and I know there are some programs that allow you (once configured) to send the same command to multiple nodes/servers, but do these commands exist within the AIX environment? Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bryce at jfet.net Thu Oct 16 16:15:08 2003 From: bryce at jfet.net (Bryce Bockman) Date: Thu, 16 Oct 2003 16:15:08 -0400 (EDT) Subject: A Petaflop machine in 20 racks? Message-ID: Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Oct 16 17:54:31 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 16 Oct 2003 17:54:31 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <20031016201213.GV8116@maybe.org> Message-ID: On Thu, 16 Oct 2003, Dale Harris wrote: > On Thu, Oct 16, 2003 at 12:02:18PM -0400, Donald Becker elucidated: > > > > What you want is not XML, but a data format description language. > > I think the S-expression guys would say that they have one. And it is > supermon uses, FWIW. No, S-expressions are an ancient concept, developed back in the early days of computing. They were needed in Lisp to linearize tree structures so that they could be saved to, uhmm, paper tape or clay tablets. Sexprs are oriented toward "structured" data. In this context "structured" means "Lisp-like linked lists" rather than "a series of 'C' structs". More directly related concepts are XDR, part of SunRPC MPI packed data Object brokers all of which are trying to solve similar problem. But, except for a few of the "object broker" systems, they don't have the metadata language to translate between domains. 
For instance, you can't take MPI packed data and automatically convert it to (useful) XML, pass it to an object broker system, or call a non-MPI remote procedure -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Thu Oct 16 19:21:58 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Thu, 16 Oct 2003 16:21:58 -0700 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F2816.9030606@cert.ucr.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > No idea if it would work on AIX, but you could try out pconsole: http://www.heiho.net/pconsole/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Mark at MarkAndrewSmith.co.uk Thu Oct 16 19:36:23 2003 From: Mark at MarkAndrewSmith.co.uk (Mark Andrew Smith) Date: Fri, 17 Oct 2003 00:36:23 +0100 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Comment: As each generation of this chip gets more powerful, in an exponential way, then clusters of these chips could be used to break encryption algorithms via brute force approaches. If this became anywhere near an outside chance of a possibility of succeeding, or even threat of, then I would expect Governments to carefully consider export requirements and restrictions, or even in the extreme, classify it as a military armament similar to early RSA 128bit software encryption ciphers. However it could be the dawn of a new architecture for us all..... Kindest regards, Mark Andrew Smith Tel: (01942)722518 Mob: (07866)070122 http://www.MarkAndrewSmith.co.uk/ -----Original Message----- From: beowulf-admin at scyld.com [mailto:beowulf-admin at scyld.com]On Behalf Of Bryce Bockman Sent: 16 October 2003 21:15 To: beowulf at beowulf.org Subject: A Petaflop machine in 20 racks? Hi all, Check out this article over at wired: http://www.wired.com/news/technology/0,1282,60791,00.html It makes all sorts of wild claims, but what do you guys think? Obviously, there's memory bandwidth limitations due to PCI. Does anyone know anything else about these guys? Cheers, Bryce _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). 
Version: 6.0.525 / Virus Database: 322 - Release Date: 09/10/2003 This email has been scanned for viruses by NetBenefit using Sophos anti-virus technology _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Thu Oct 16 19:46:19 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu, 16 Oct 2003 16:46:19 -0700 Subject: A Petaflop machine in 20 racks? References: Message-ID: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Browsing through ClearSpeed's fairly "content thin" website, one turns up the following: http://www.clearspeed.com/downloads/overview_cs301.pdf The CS302 has an array of 64 processors and 256Kbytes of memory in the array + 128 Kbytes SRAM on chip. That's 4 Kbytes/processor (much like a cache).. It doesn't say how many bits wide each processor is, though.. 51.2 Gbyte/sec bandwidth is quoted.. that's 800 Mbyte/sec per processor, which is a reasonable sort of rate. 10 microsecond 1K complex FFTs are reasonably fast, but without knowing how many bits, it's hard to say whether it's outstanding. It also doesn't say whether the architecture is, for instance, SIMD. It could well be a systolic array, which would be very well suited to cranking out FFTs or other similar things, but probably not so hot for general purpose crunching. For all their vaunted patent and IP portfolio, they have only one patent listed in the USPTO database under their own name, and that's some sort of DRAM. ----- Original Message ----- From: "Bryce Bockman" To: Sent: Thursday, October 16, 2003 1:15 PM Subject: A Petaflop machine in 20 racks? > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Thu Oct 16 21:23:57 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Thu, 16 Oct 2003 21:23:57 -0400 (EDT) Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: Looking at the standard "we have the solution to everyones computing needs press release" a few things are clear: "... multi-threaded array processor ..." which is further verified later in the press release: "... where the CS301 is acting as a co-processor, dynamic libraries offload an application's inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application's running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU's limited mathematical capability..." It seems to be a low power array processor which may be of some real value to some people. The real issue is can they keep pace in terms of cost and performance with the commodity CPU market. And what about code portability. 
Quite a few people have spent quite a lot of time porting and tweaking codes for architectures that seemed to have a rather short lived history. Of course, there is no hardware yet. Doug On Thu, 16 Oct 2003, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? > Obviously, there's memory bandwidth limitations due to PCI. Does anyone > know anything else about these guys? > > Cheers, > Bryce > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 03:48:17 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 09:48:17 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> References: <000f01c3943f$b40ac100$32a8a8c0@laptop152422> Message-ID: <200310170948.17224.joachim@ccrl-nece.de> Jim Lux: > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. Exactly. Such coprocessor-boards (typically DSP-based, which also achieve some GFlop/s) already exist for a long time, but obviously are not suited to change "the way we see computing" (place your marketing slogan here). One reason is the lack of portability for code making use of such hardware, but I think if the performance for a wider range of applications would effectively come anywhere close to the peak performance, this problem would be overcome by the premise of getting teraflop-performance for some 10k of $. Thus, the problem probably is that typical applications do not achieve the promised performance. All memory-bound applications will get stuck on the PCI-bus, by both, memory access latency and bandwidth. High sustained performance for real problems can, in the general case, only be achieved in a balanced system. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Fri Oct 17 04:23:46 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Fri, 17 Oct 2003 10:23:46 +0200 Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: References: Message-ID: <200310171023.46865.joachim@ccrl-nece.de> Donald Becker: > More directly related concepts are > XDR, part of SunRPC > MPI packed data Hmm, as you note below, they both do not describe the data they handle, just transform in into a uniform representation. > Object brokers > all of which are trying to solve similar problem. But, except for a few > of the "object broker" systems, they don't have the metadata language to > translate between domains. 
For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure You might want to check HDF5, or for a simpler yet widely used approach, NetCDF. They are self-describing file formats. But as you can send everything via the net the same way you access it in a file, this should be useful. Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cap at nsc.liu.se Fri Oct 17 04:40:56 2003 From: cap at nsc.liu.se (Peter Kjellstroem) Date: Fri, 17 Oct 2003 10:40:56 +0200 (CEST) Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> Message-ID: There is something called dsh (distributed shell) part of some IBM package. The guys at llnl has done further work in this direction with pdsh which I belive runs fine on AIX. pdsh can be found at: http://www.llnl.gov/linux/pdsh/ /Peter On Thu, 16 Oct 2003, Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? > > Mike -- ------------------------------------------------------------ Peter Kjellstroem | National Supercomputer Centre | Sweden | http://www.nsc.liu.se _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From scheinin at crs4.it Fri Oct 17 04:35:48 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 17 Oct 2003 10:35:48 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310170835.h9H8ZmY02530@dali.crs4.it> I have not read carefully descriptions of the Opteron architecture until a few minutes ago. I was not able to find a picture of the layout in silicon at the AMD site, I found a picture at Tom's Hardware. http://www.tomshardware.com/cpu/20030422/opteron-04.html The page before shows that 50 percent of the silicon is cache. Of what is not cache, it seems that the floating point unit occupies about 1/6 or 1/7th of the area, moreover, the authors Frank Voelkel, Thomas Pabst, Bert Toepelt, and Mirko Doelle describe the Opteron as having three floating point units, FADD, FMUL and FMISC. Just counting FADD and FMUL and considering the entire area of the Opteron, using 2 GHz for the frequency, that would be about 12 FP units times 2 GHz, 24 GFLOPS. So it is doable. I do not know the depth of the pipeline, but it is likely it is deep. How do you keep the pipeline full? PCI is around 0.032 Giga floating point words per second? The entire memory subsystem needs to be changed drastically. Moreover, whereas integer units might be used to solve problems that are logically complex, floating point problems are typically ones that use a large amount of data, more than what can fit into cache. 
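To put numbers on that balance argument, here is the back-of-envelope in runnable form. Both inputs are assumptions rather than measurements: the 133 MB/s figure is the classic 32-bit/33 MHz PCI peak (which lines up with the ~0.032 Gword/s quoted above), and the 24 GFLOPS is the speculative per-chip estimate above.

    #include <stdio.h>

    /* Back-of-envelope balance check for the estimates above.
     * Both inputs are assumptions: classic 32-bit/33 MHz PCI peak
     * bandwidth and the ~24 GFLOPS per-chip guess. */
    int main(void)
    {
        double peak_flops     = 24e9;    /* ~12 FP results/cycle * 2 GHz */
        double pci_bytes_s    = 133e6;   /* 32-bit, 33 MHz PCI peak      */
        double word_bytes     = 4.0;     /* single-precision word        */

        double words_s        = pci_bytes_s / word_bytes;
        double flops_per_word = peak_flops / words_s;

        printf("PCI delivers about %.3f Gword/s\n", words_s / 1e9);
        printf("each word must be reused in about %.0f flops to keep "
               "the FP units busy\n", flops_per_word);
        return 0;
    }

That ratio of several hundred flops per delivered word is the quantitative version of "the entire memory subsystem needs to be changed drastically."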
But you-all knew that already, Alan Scheinin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:43:07 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:43:07 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Thu, 16 Oct 2003, Donald Becker wrote: > translate between domains. For instance, you can't take MPI packed data > and > automatically convert it to (useful) XML, > pass it to an object broker system, or > call a non-MPI remote procedure Yes indeedy. And since XML is at heart linked lists (trees) of structs as well, you still can't get around the difficulty of mapping a previously unseen data file containing XMLish into a set of efficiently accessible structs. Which is doable, but is a royal PITA and requires that you maintain DISTINCT (and probably non-portable) images/descriptions of the data structures and then write all this glue code to import and export. So yeah, I have fantasies of ways of encapsulating C header files and a data dictionary in an XMLified datafile and a toolset that at the very least made it "easy" to relink a piece of C code to read in the datafile and just put the data into the associated structs where I could subsequently use them EFFICIENTLY by local or global name. I haven't managed to make this really portable even in my own code, though -- it isn't an easy problem (so difficult that ad hoc workarounds seem the simpler route to take). This really needs a committee or something and a few zillion NSF dollars to resolve, because it is a fairly serious and widespread problem. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 09:29:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 09:29:47 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: <1066188715.3200.120.camel@terra> Message-ID: On 14 Oct 2003, Dean Johnson wrote: > As someone who has done programming environment tools most of his > reasonably long professional life, I must say you have hit the nail on > the head. I have rooted through more than my share of shitty binary > formats in my day, and I can honestly say that I go home happier as a > result of dealing with an XML trace file in my current project. I was > happily working away dealing with only XML, but then it happened. The > demons of my past rose their ugly heads when I decided that it would be > a good thing to get some ELF information outta some files. Being the > industrious guy I am, I went and got ELF docs from Dave Anderson's > stash. Did that help? Nope, not really, as it was mangled 64-bit focused > ELF. Was it documented? Nope, not really. You could look at the elfdump > code to see what that does, so in a backwards way, it was documented. > The alternative was to ferret out the format by bugging enough compiler > geeks until they gave up the secret handshake. 
The alternative that I > eventually took was to go lay down until the desire to have the ELF > information went away. ;-) And yet Don's points are also very good ones, although I think that is at least partly a matter of designer style. XML isn't, after all, a markup language -- it is a markup language specification. As an interface designer, you can implement tags that are reasonably human readable and well-separated in function or not. His observation that what one would REALLY like is a self-documenting interface, or an interface with its data dictionary included as a header, is very apropos. I also >>think<< that he is correct (if I understood his final point correctly) in saying that someone could sit down and write an XML-compliant "DML" (data markup language) with straightforward and consistent rules for encapsulating data streams. Since those rules would presumably be laid down in the design phase, and since a wise implementation of them would release a link-level library with prebuilt functions for creating a new data file and its embedded data dictionary, writing data out to the file, opening the file, and reading/parsing data in from the file, it would actually reduce the amount of wheel reinventing (and very tedious coding!) that has to be done now while creating/enforcing a fairly rigorous structural organization on the data itself. One has to be very careful not to assume that XML will necessarily make a data file tremendously longer than it likely is now. For short files nobody (for the most part) cares, where by short I mean short enough that file latencies dominate the read time -- using long very descriptive tags is easy in configuration files. For longer data files (which humans cannot in general "read" anyway unless they have a year or so to spare) there is nothing to prevent XMLish of the following sort of very general structure: This is part of the production data of Joe's Orchards. Eat Fruit from Joe's! apples%-10.6fbushels | oranges%-12.5ecrates | price%-10.2fdollars 13.400000 |77.00000e+2 |450.00 589.200000 |102.00000e+8|6667.00 ... The stuff between the tags could even be binary. Note that the data itself isn't individually wrapped and tagged, so this might be a form of XML heresy, but who cares? For a configuration file or a small/short data file containing numbers that humans might want to browse/read without an intermediary software layer, I would say this is a bad thing, but for a 100 MB data file (a few million lines of data) the overhead introduced by adding the XML encapsulation and dictionary is utterly ignorable and the mindless repetition of tags in the datastream itself pointless. Note well that this encapsulation is STILL nearly perfectly human readable, STILL easily machine parseable, and will still be both in twenty years after Joe's Orchard has been cut down and turned into firewood (or would be, if Joe had bothered to tell us a bit more about the database in question in the description). The data can even be "validated", if the associated library has appropriate functions for doing so (which are more or less the data reading functions anyway, with error management). I should note that the philosophy above might be closer to that of e.g. TeX/LaTeX than XML/SGML/MML (as discussed below). 
I've already done stuff somewhat LIKE this (without the formal data dictionary, because I haven't taken the time to write a general purpose tool for my own specific applications, which is likely a mistake in the long run but in the long run, after all, I'll be dead:-) in wulfstat. The .wulfhosts xml permits a cluster to be entered "all at once" using a format like: g%02d 1 15 which is used to generate the hostname strings required to open connections to hosts e.g. g01, g02, ... g15. Obviously the same trick could be used to feed scanf, or to feed a regex parser. The biggest problem I have with XML as a data description/configuration file base isn't really details like these, as I think they are all design decisions and can be done poorly or done well. It is that on the parsing end, libxml2 DOES all of the above, more or less. It generates on the fly a linked list that mirrors the XML source, and then provides tools and a consistent framework of rules for walking the list to find your data. How else could it do it? The one parser has to read arbitrary markup, and it cannot know what the markup is until opens the file, and it opens/reads the file in one pass, so all it can do is mosey along and generate recursive structs and link them. However, that is NOT how one wants to access the data in code that wants to be efficient. Walking a llist to find a float data entry that has a tag name that matches "a" and an index attribute that matches "32912" is VERY costly compared to accessing a[32912]. At this point, the only solution I've found is to know what the data encapsulation is (easy, since I created it:-), create my own variables and structs to hold it for actual reference in code, open and read in the xml data, and then walk the list with e.g. xpath and extract the data from the list and repack it into my variables and structs. This latter step really sucks. It is very, very tedious (although perfectly straightforward to write the parsing/repacking code (so much so that the libxml guy "apologizes" for the tedium of the parsing code in the xml.org documentation:-). It is this latter step that could be really streamlined by the use of an xmlified data dictionary or even (in the extreme C case) encapsulating the actual header file with the associated variable struct definitions. It is interesting and amusing to compare two different approaches to the same problem in applications where the issue really is "markup" in a sense. I write lots of things using latex, because with latex one can write equations in a straightforward ascii encoding like $1 = \sin^2(\theta) + \cos^2(\theta)$. This input is taken out of an ascii stream by the tex parser, tokenized and translated into characters, and converted into an actual equation layout according to the prescriptions in a (the latex) style file plus any layered modifications I might impose on top of it. [Purists could argue about whether or not latex is a true markup language -- tex/latex are TYPESETTING languages and not really intended to support other functions (such as translating this equation into an internal algebraic form in a computer algebra program such as macsyma or maple). However, even though it probably isn't, because every ENTITY represented in the equation string isn't individually tagged wrt function, it certainly functions like markup at a high level with entries entered inside functional delimiters and presented in a way/style that is associated with the delimiters "independent" of the delimiters themselves.] 
If one compares this to the same equation wrapped in MML (math markup language, which I don't know well enough to be able to reproduce here) it would likely occupy twenty or thirty lines of markup and be utterly unreadable by humans. At least "normal" humans. Machines, however, just love it, as one can write a parser that can BOTH display the equation AND can create the internal objects that permit its manipulation algebraically and/or numerically. This would be difficult to do with the latex, because who knows what all these components are? Is \theta a constant, a label, a variable? Are \sin and \cos variables, functions, or is \s the variable and in a string (do I mean s*i*n*(theta) where all the objects are variables)? The equation that is HUMAN readable and TYPESETTABLE without ambiguity with a style file and low level definition that recognizes these elements as non-functional symbols of certain size and shape to be assembled according to the following rules is far from adequately described for doing math with it. For all that, one could easily write an XML compliant LML -- "latex markup language" -- a perfectly straightforward translation of the fundamental latex structures into XML form. Some of these could be utterly simple (aside for dealing with special character issues: {\em emphasized text} -> emphasized text \begin{equation}a = b+c\end{equation} -> a = b+c linuxdoc is very nearly this translation, actually, except that it doesn't know how to handle equation content AFAIK. This sort of encapsulation is highly efficient for document creation/typesetting within a specific domain, but less general purpose. The point is .... [the following text that isn't there was omitted in the fond hope that my paypal account will swell, following which I will make a trip to a purveyor of fine beverages.] rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jac67 at georgetown.edu Fri Oct 17 09:41:19 2003 From: jac67 at georgetown.edu (Jess Cannata) Date: Fri, 17 Oct 2003 09:41:19 -0400 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8FF17F.3030008@georgetown.edu> Mike Eggleston wrote: >I now have control over many AIX servers and I know there >are some programs that allow you (once configured) to send >the same command to multiple nodes/servers, but do these >commands exist within the AIX environment? > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge National Laboratory on all of our Linux Beowulf clusters, and I really like it. 
You might want to take a look at it: http://www.csm.ornl.gov/torc/C3/index.html -- Jess Cannata Advanced Research Computing Georgetown University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Thu Oct 16 18:49:27 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Fri, 17 Oct 2003 00:49:27 +0200 (CEST) Subject: Pentium4 vs Xeon In-Reply-To: Message-ID: On Tue, 14 Oct 2003, Don Holmgren wrote: > Pricewise (YMMV), cheap desktop P4's can be had very roughly for half > the price of a comparable dual Xeon. This is true if you look at pricewatch, but the quotes I received shown that good P4's is less than half of the price (in my case around 36%) of a comparable dual Xeon. I am talking about comparison of the price of Asus PC-DL Dual Xeon 2.8 GHz 512K 533 FSB with 3 GB DDR333 and two 36GB SATA 10K RPM hardrives against Asus P4P800-VM P4 2.8 GHz 800 FSB with 1.5 GB DDR 400 and one 36GB SATA 10K RPM hardrive. Xeons machines are not very popular and it is hard to get a good price for them at your local shop (in my case Ithaca US, in Poland difference would be even bigger). I am benchmarking this P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+). If you are interested in some numbers I can send benchmarks of Gaussian 03, Gamess, and our own F77 code. czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Fri Oct 17 03:20:54 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Fri, 17 Oct 2003 20:20:54 +1300 Subject: OT: same commands to multiple servers? In-Reply-To: <20031016163625.C11181@mikee.ath.cx> References: <20031016163625.C11181@mikee.ath.cx> Message-ID: <3F8F9856.60602@lsu.edu> I have administered over 100 AIX boxes for a living for over 5 years now. The tool of choice for me is dsh, which ships as part of the PSSP LPP, a canned implementation of Kerberos 4. We simply install the ssp.clients fileset on each node and use our control workstation as the Kerberos realm master. We add the external nodes by hand. I know that dsh is open sourced now and available at: http://dsh.sourceforge.net/ There are several other cheap (as in Libris) solutions as well: 1) Use rsh (with TCPwrappers) 2) Use ssh with a password-less key 3) Write your own code around either of the above 4) Implement Kerberos, either as an LPP from IBM, or get the source and compile yourself I think you'll find dsh a good starting point though. Mike Eggleston wrote: > I now have control over many AIX servers and I know there > are some programs that allow you (once configured) to send > the same command to multiple nodes/servers, but do these > commands exist within the AIX environment? 
> > Mike > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Fri Oct 17 10:07:07 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 17 Oct 2003 15:07:07 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE1FD@stegosaurus.bristol.quadrics.com> Consider also pdsh: http://www.llnl.gov/linux/pdsh/ It is an open source varient of IBM's dsh builds on Linux (IA32/IA64, etc.), AIX et al. Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- > -----Original Message----- > From: Jess Cannata [mailto:jac67 at georgetown.edu] > Sent: 17 October 2003 14:41 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > Mike Eggleston wrote: > > >I now have control over many AIX servers and I know there > >are some programs that allow you (once configured) to send > >the same command to multiple nodes/servers, but do these > >commands exist within the AIX environment? > > > > > > I'm not sure it will run on AIX, but we use C3 from Oak Ridge > National > Laboratory on all of our Linux Beowulf clusters, and I really > like it. > You might want to take a look at it: > > http://www.csm.ornl.gov/torc/C3/index.html > > -- > Jess Cannata > Advanced Research Computing > Georgetown University > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Fri Oct 17 12:42:15 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Fri, 17 Oct 2003 10:42:15 -0600 (CST) Subject: RLX? In-Reply-To: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: Have you ever try or test RLX server for HPC? What is their performance? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 14:05:49 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 11:05:49 -0700 Subject: POVray, beowulf, etc. 
Message-ID: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> I'm aware of some MPI-aware POVray stuff, but is there anything out there that can facilitate something where you want to render a sequence of frames (using, e.g., POVray), one frame to a processor, then gather the images back to a head node for display, in quasi-real time. For instance, say you had a image that takes 1 second to render, and you had 30 processors free to do the rendering. Assuming you set everything up ahead of time, it should be possible to set all the processors spinning, and feeding the rendered images back to a central point where they can be displayed as an animation at 30 fps (with a latency of 1 second) Obviously, the other approach is to have each processor render a part of the image, and assemble them all, but it seems that this might actually be slower overall, because you've got the image assembling time added. I'm looking for a way to do some real-time visualization of modeling results as opposed to a batch oriented "render farm", so it's the pipeline to gather the rendered images from the nodes to the display node that I'm interested in. I suppose one could write a little MPI program that gathers the images up as bitmaps and feeds them to a window, but, if someone has already solved this in a reasonably facile and elegant way, why not use it. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Fri Oct 17 12:01:12 2003 From: johnb at quadrics.com (John Brookes) Date: Fri, 17 Oct 2003 17:01:12 +0100 Subject: OT: same commands to multiple servers? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E328@stegosaurus.bristol.quadrics.com> How are the startup times of IBM's dsh these days? I seem to remember that it was somewhat on the slow side on big machines. Many moons have passed since I was last on an AIX machine, though, so I assume the situation's improved drastically. Cheers, John Brookes Quadrics > -----Original Message----- > From: Brian D. Ropers-Huilman [mailto:bropers at lsu.edu] > Sent: 17 October 2003 08:21 > To: Mike Eggleston > Cc: beowulf at beowulf.org > Subject: Re: OT: same commands to multiple servers? > > > I have administered over 100 AIX boxes for a living for over > 5 years now. The > tool of choice for me is dsh, which ships as part of the PSSP > LPP, a canned > implementation of Kerberos 4. We simply install the > ssp.clients fileset on > each node and use our control workstation as the Kerberos > realm master. We add > the external nodes by hand. > > I know that dsh is open sourced now and available at: > > http://dsh.sourceforge.net/ > > There are several other cheap (as in Libris) solutions as well: > > 1) Use rsh (with TCPwrappers) > 2) Use ssh with a password-less key > 3) Write your own code around either of the above > 4) Implement Kerberos, either as an LPP from IBM, or get the > source and > compile yourself > > I think you'll find dsh a good starting point though. 
> > Mike Eggleston wrote: > > > I now have control over many AIX servers and I know there > > are some programs that allow you (once configured) to send > > the same command to multiple nodes/servers, but do these > > commands exist within the AIX environment? > > > > Mike > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Brian D. Ropers-Huilman (225) 578-0461 (V) > Systems Administrator AIX (225) 578-6400 (F) > Office of Computing Services GNU Linux > brian at ropers-huilman.net > High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Oct 17 14:38:41 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 17 Oct 2003 14:38:41 -0400 (EDT) Subject: RLX? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? Yes, we had access to their earliest machines and I was there at the NYC announcement. > What is their performance? It depends on the generation. The first generation was great at what it was designed to do: pump out data, such as static web pages, from memory to two 100Mbps Ethernet ports per blade. It used Transmeta chips, 2.5" laptop drives and fans only on the chassis to fit 24 blades in 3U. The blades didn't do well at computational tasks or disk I/O. A third Ethernet port on each blade was connected to an internal repeater. They could only PXE boot using that port, making a flow-controlled boot server important. The second generation switched to Intel ULV (Ultra Low Voltage) processors in the 1GHz range. This approximately doubled the speed over Transmeta chips, especially with floating point. But ULV CPUs are designed for laptops, and the interconnect was no faster. Thus this still was not a computational cluster box. The current generation blades are much faster, with full speed (and heat) CPUs and chipset, fast interconnect and good I/O potential. But lets look at the big picture for HPC cluster packaging: --> Beowulf clusters have crossed the density threshold <-- This happened about two years ago. At the start of the Beowulf project a legitimate problem with clusters was the low physical density. This didn't matter in some installations, as much larger machines were retired leaving plenty of empty space, but it was a large (pun intended) issue for general use. As we evolved to 1U rack-mount servers, the situation changed. Starting with the API CS-20, Beowulf cluster hardware met and even exceeded the compute/physical density of contemporary air-cooled Crays. 
Since standard 1U dual processor machines can now exceed the air cooled thermal density supported by an average room, selecting non-standard packaging (blades, back-to-back mounting, or vertical motherboard chassis) must be motivated by some other consideration that justifies the lock-in and higher cost. At least with blade servers there are a few opportunities: Low-latency backplane communication Easier connections to shared storage Hot-swap capability to add nodes or replace failed hardware -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Fri Oct 17 14:37:22 2003 From: angel at wolf.com (Angel Rivera) Date: Fri, 17 Oct 2003 18:37:22 GMT Subject: RLX? In-Reply-To: References: Message-ID: <20031017183722.754.qmail@houston.wolf.com> Eduardo Cesar Cabrera Flores writes: > > Have you ever try or test RLX server for HPC? > What is their performance? > We have not but will be getting a couple of bricks for testing soon. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 15:57:24 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 21:57:24 +0200 (CEST) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: I just saw YAML announced on www.ntk.net http://www.yaml.org YAML (rhymes with camel) is a sraightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl and Python. YAML is optimised for serialization , configuration settings, log files, Internet messaging ad filtering. There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. Sounds like it might be good for the purposes we are discussing! BTW, has anyon experimented with Beep for messaging system status, environment variables, logging etc? http://www.beepcore.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From srihari at mpi-softtech.com Fri Oct 17 15:34:42 2003 From: srihari at mpi-softtech.com (Srihari Angaluri) Date: Fri, 17 Oct 2003 15:34:42 -0400 Subject: POVray, beowulf, etc. References: <5.2.0.9.2.20031017105940.03129888@mailhost4.jpl.nasa.gov> Message-ID: <3F904452.4090406@mpi-softtech.com> Jim, Not sure if you came across the parallel ray tracer application written using MPI. 
This does real-time rendering. http://jedi.ks.uiuc.edu/~johns/raytracer/ Jim Lux wrote: > I'm aware of some MPI-aware POVray stuff, but is there anything out > there that can facilitate something where you want to render a sequence > of frames (using, e.g., POVray), one frame to a processor, then gather > the images back to a head node for display, in quasi-real time. > > For instance, say you had a image that takes 1 second to render, and you > had 30 processors free to do the rendering. Assuming you set everything > up ahead of time, it should be possible to set all the processors > spinning, and feeding the rendered images back to a central point where > they can be displayed as an animation at 30 fps (with a latency of 1 > second) > > Obviously, the other approach is to have each processor render a part of > the image, and assemble them all, but it seems that this might actually > be slower overall, because you've got the image assembling time added. > > I'm looking for a way to do some real-time visualization of modeling > results as opposed to a batch oriented "render farm", so it's the > pipeline to gather the rendered images from the nodes to the display > node that I'm interested in. I suppose one could write a little MPI > program that gathers the images up as bitmaps and feeds them to a > window, but, if someone has already solved this in a reasonably facile > and elegant way, why not use it. > > > James Lux, P.E. > Spacecraft Telecommunications Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 17 16:19:15 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 17 Oct 2003 22:19:15 +0200 (CEST) Subject: Also on NTK Message-ID: Sorry if this is off topic too far. Also on NTK, an implementation of zeroconf for Linux, Windows, BSD http://www.swampwolf.com/products/howl/GettingStarted.html Anyone care to speculate on uses for zeroconf in big clusters? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at mendel.bio.caltech.edu Fri Oct 17 16:47:08 2003 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Fri, 17 Oct 2003 13:47:08 -0700 Subject: When is cooling air cool enough? Message-ID: Most computer rooms shuttle the air back and forth between the computers and the A/C. I'm wondering if one could not construct a less expensive facility (less power running the A/C which is rarely on, smaller A/C units) if the computer room was a lot more like a wind tunnel: ambient air in (after filtering out any dust or rain), pass it through the computers, and then blow it out the other side of the building. Note the room wouldn't be wide open like a normal computer room. Instead essentially each rack and other largish computer unit would sit in its own separate air flow, so that hot air from one wouldn't heat the next. 
The question is, how hot can the cooling air be and still keep the computers happy? The answer will determine how big an A/C unit is needed to handle cooling the intake air for those times when it exceeds this upper limit. I'm guessing that so long as a lot of air is moving through the computers most would be ok in a sustained 30C (86F) flow. Remember, this isn't 30C in dead air, it's 30C with high pressure on the intake side of the computer and low pressure on the outlet side, so that the generated heat is rapidly moved out of the computer and away. (But not so much flow as to blow cards out of their sockets!) Somewhere between 30C and 40C one might expect poorly ventilated CPUs and disks to begin to have problems. Above 40C seems a tad too warm. At that temperature it's going to be pretty uncomfortable for the operators too. Anybody have a good estimate for what this upper limit is. For instance, from a computer room with an A/C that failed slowly? There's clearly a lower temperature limit too. However on cold days opening a feedback duct from the outlet back into the intake should do the trick. In really cold climates the intake duct might be closed entirely - when it's 20 below outside. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Oct 17 19:45:24 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 17 Oct 2003 16:45:24 -0700 Subject: When is cooling air cool enough? In-Reply-To: Message-ID: <5.2.0.9.2.20031017162321.031343c0@mailhost4.jpl.nasa.gov> For component life, colder is better (10 degrees is factor of 2 life/reliability), and the temperature rise inside the box is probably more than you think. You also have some more subtle tradeoffs to address. You don't need as much colder air as warmer air to remove some quantity of heat, and a significant energy cost is pushing the air around (especially since the work involved in running the fan winds up heating the air being moved). This is a fairly standard HVAC design problem. The additional cost to cool the room to, say, 15C instead of 20C is fairly low, if the room is insulated, and there's a lot of recirculation (which is typical for this kind of thing). It's not like you're cooling the room repeatedly after warming up. Once you've reached equilibrium, cooling the mass of equipment down, you're moving the same number of joules of heat either way and the refrigeration COP doesn't change much over that small a temperature range. The heat leakage through the walls is fairly small, compared to the heat dissipated in the equipment. If you were cooling something that doesn't generate heat itself (i.e. a wine cellar or freezer), then the temperature does affect the power consumed. This all said, I worked for a while on a fairly complex electronic system installed at a test facility on a ridge on the island of Kauai, and they had no airconditioning. They had big fans and thermostatically controlled louvers, and could show that statistically, the air temperature never went high enough to cause a problem. I seem to recall something like the calculations showed we'd have to shut down for environmental reasons no more than once every 5 years. Humidity is an issue also, though. 
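To put rough numbers on the air-side tradeoff, here is a back-of-the-envelope sketch in C. It only applies the textbook relation Q = mdot * cp * dT with dry sea-level air properties; the 20 kW load and 10 C allowable intake-to-exhaust rise are made-up example figures, not measurements from any particular machine room:

    /* How much air must move to carry off a given heat load with a given
     * allowable temperature rise?  Q = mdot * cp * dT. */
    #include <stdio.h>

    int main(void)
    {
        const double rho = 1.2;      /* air density, kg/m^3 (sea level, ~20 C) */
        const double cp  = 1005.0;   /* specific heat of air, J/(kg*K) */
        double load_w = 20000.0;     /* example heat load: 20 kW of equipment */
        double rise_k = 10.0;        /* allowed intake-to-exhaust rise in K (= C) */

        double mdot = load_w / (cp * rise_k);   /* required air mass flow, kg/s */
        double vdot = mdot / rho;               /* volumetric flow, m^3/s */
        printf("%.2f kg/s = %.2f m^3/s (about %.0f CFM)\n",
               mdot, vdot, vdot * 2118.88);     /* 1 m^3/s ~ 2118.88 CFM */
        return 0;
    }

Doubling the allowable rise halves the required airflow, and the fan power that goes with it, which is the flip side of the reliability argument for keeping the intake cold.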
At 01:47 PM 10/17/2003 -0700, David Mathog wrote: >Most computer rooms shuttle the air back and forth >between the computers and the A/C. I'm >wondering if one could not construct a less expensive >facility (less power running the A/C which is rarely >on, smaller A/C units) if the computer room was a >lot more like a wind tunnel: ambient air in (after >filtering out any dust or rain), >pass it through the computers, and then blow it out >the other side of the building. Note the room >wouldn't be wide open like a normal computer room. >Instead essentially each rack and other largish >computer unit would sit in its own separate air flow, >so that hot air from one wouldn't heat the next. > >The question is, how hot can the cooling air be and >still keep the computers happy? > >The answer will determine how big an A/C unit is >needed to handle cooling the intake air for those >times when it exceeds this upper limit. > >I'm guessing that so long as a lot of air is moving through >the computers most would be ok in a sustained 30C (86F) flow. >Remember, this isn't 30C in dead air, it's 30C with high >pressure on the intake side of the computer and low >pressure on the outlet side, so that the generated heat >is rapidly moved out of the computer and away. (But not >so much flow as to blow cards out of their sockets!) >Somewhere between 30C and 40C one might expect poorly >ventilated CPUs and disks to begin to have problems. Above >40C seems a tad too warm. At that temperature it's going >to be pretty uncomfortable for the operators too. > >Anybody have a good estimate for what this upper limit is. >For instance, from a computer room with an A/C that failed >slowly? > >There's clearly a lower temperature limit too. However on cold >days opening a feedback duct from the outlet back into the intake >should do the trick. In really cold climates the intake >duct might be closed entirely - when it's 20 below outside. > >Thanks, > >David Mathog >mathog at caltech.edu >Manager, Sequence Analysis Facility, Biology Division, Caltech >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:41:49 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:41:49 -0700 Subject: RLX? In-Reply-To: References: <200310170846.h9H8kbA29081@NewBlue.scyld.com> Message-ID: <20031018014149.GB3774@greglaptop.PEATEC.COM> On Fri, Oct 17, 2003 at 10:42:15AM -0600, Eduardo Cesar Cabrera Flores wrote: > Have you ever try or test RLX server for HPC? > What is their performance? .. what's their price/performance? That decides against them for most of us el-cheapo HPC customers. RLX has some nice features for enterprise computing that may justify a higher cost for enterprises, but... 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:11:39 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:11:39 -0400 (EDT) Subject: XML for formatting (Re: Environment monitoring) In-Reply-To: Message-ID: On Fri, 17 Oct 2003, John Hearns wrote: > I just saw YAML announced on www.ntk.net > > http://www.yaml.org yaml.org doesn't resolve for me in nameservice (yet), but whoa, dude, rippin' ntk site. That's one very seriously geeked news site. rgb > YAML (rhymes with camel) is a sraightforward machine parsable > data serialization format designed for human readability and > interaction with scripting languages such as Perl and Python. > YAML is optimised for serialization , configuration settings, > log files, Internet messaging ad filtering. > > There are YAML writers and parsers fo Perl, Python, Java, Ruby and C. > > > Sounds like it might be good for the purposes we are discussing! > > > > BTW, has anyon experimented with Beep for messaging system status, > environment variables, logging etc? > http://www.beepcore.org > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 17 21:39:57 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 17 Oct 2003 18:39:57 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: References: Message-ID: <20031018013957.GA3774@greglaptop.PEATEC.COM> On Thu, Oct 16, 2003 at 04:15:08PM -0400, Bryce Bockman wrote: > Hi all, > > Check out this article over at wired: > > http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? I think it's the Return of the Array Processor. There's very little new in computing these days -- and it has the usual flaws of APs: low bandwidth communication to the host. So if you have a problem that actually fits in the limited memory, and doesn't need to communicate with anyone else very often, it may be a win for you. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 17 21:21:42 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 17 Oct 2003 21:21:42 -0400 (EDT) Subject: When is cooling air cool enough? In-Reply-To: Message-ID: On Fri, 17 Oct 2003, David Mathog wrote: > Most computer rooms shuttle the air back and forth > between the computers and the A/C. 
I'm > wondering if one could not construct a less expensive > facility (less power running the A/C which is rarely > on, smaller A/C units) if the computer room was a > lot more like a wind tunnel: ambient air in (after > filtering out any dust or rain), > pass it through the computers, and then blow it out > the other side of the building. Note the room > wouldn't be wide open like a normal computer room. > Instead essentially each rack and other largish > computer unit would sit in its own separate air flow, > so that hot air from one wouldn't heat the next. > > The question is, how hot can the cooling air be and > still keep the computers happy? I personally have strong feelings about this, although there probably are sites out there with hard data and statistics and engineering recommendations. 70F or cooler would be my recommendation. In fact, cooler would be my recommendation -- 60F would be better still. I think the number is every 10F costs roughly a year of component life in the 60-80F ranges and even brief periods where the temperature at the intake gets significantly above 80F makes it uncomfortably likely that some component is damaged enough to fail within a year. > The answer will determine how big an A/C unit is > needed to handle cooling the intake air for those > times when it exceeds this upper limit. It costs roughly $1/watt/year to feed AND cool a computer, order of $100-150/cpu/year, with about 1/4 of that for cooling per se. The computer itself costs anywhere from $500 lowball to a couple of thousand per CPU (more if you have an expensive network). The HUMAN cost of screwing around with broken hardware can be crushing, and high temperatures are an open invitation for hardware to break a lot more often (and it breaks all too often at LOW temperatures). It just isn't worth it. > > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. > Remember, this isn't 30C in dead air, it's 30C with high > pressure on the intake side of the computer and low > pressure on the outlet side, so that the generated heat > is rapidly moved out of the computer and away. (But not > so much flow as to blow cards out of their sockets!) > Somewhere between 30C and 40C one might expect poorly > ventilated CPUs and disks to begin to have problems. Above > 40C seems a tad too warm. At that temperature it's going > to be pretty uncomfortable for the operators too. So an 86F wind keeps YOU cool in the summer time? Only because you're damp on the outside and evaporating sweat cools you. Think 86F humid, and you're only at 98F at core. The CPU is considerably hotter, and is cooled by the temperature DIFFERENCE. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 21:41:25 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 21:41:25 -0400 Subject: A Petaflop machine in 20 racks? 
In-Reply-To: <200310170846.h9H8kGA29022@NewBlue.scyld.com> References: <200310170846.h9H8kGA29022@NewBlue.scyld.com> Message-ID: <20031018014123.GB4857@piskorski.com> > > http://www.wired.com/news/technology/0,1282,60791,00.html > From: "Jim Lux" > Subject: Re: A Petaflop machine in 20 racks? > Date: Thu, 16 Oct 2003 16:46:19 -0700 > > Browsing through ClearSpeed's fairly "content thin" website, one turns up > the following: > http://www.clearspeed.com/downloads/overview_cs301.pdf > It also doesn't say whether the architecture is, for instance, SIMD. It > could well be a systolic array, which would be very well suited to cranking > out FFTs or other similar things, but probably not so hot for general > purpose crunching. If it is SIMD, this sounds rather reminiscent of the streaming supercomputer designs people hope to build using SIMD commodity GPU (Graphics Processing Unit) chips, and Peter Schroeder's 2002 "Hacking the GPU" class at CalTech. I don't know much of anything about it, but these older links made for some interesting reading: http://www.cs.caltech.edu/courses/cs101.3/ http://www.cs.caltech.edu/cspeople/faculty/schroder_p.html http://merrimac.stanford.edu/whitepaper.pdf http://merrimac.stanford.edu/resources.html http://graphics.stanford.edu/~hanrahan/talks/why/ I am really not clear how any of that relates to vector co-processor add-on cards like the older design mentioned here (I think FPGA based): http://aggregate.org/ECard/ nor to newer MIMD to SIMD compiling technology (and parallel "nanoprocessors"!) like this: http://aggregate.org/KYARCH/ -- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Fri Oct 17 23:15:21 2003 From: atp at piskorski.com (Andrew Piskorski) Date: Fri, 17 Oct 2003 23:15:21 -0400 Subject: When is cooling air cool enough? In-Reply-To: <200310180131.h9I1VYA16665@NewBlue.scyld.com> References: <200310180131.h9I1VYA16665@NewBlue.scyld.com> Message-ID: <20031018031519.GB19525@piskorski.com> > From: "David Mathog" > Date: Fri, 17 Oct 2003 13:47:08 -0700 > if the computer room was a lot more like a wind tunnel: ambient air > in (after filtering out any dust or rain), pass it through the > computers, and then blow it out the other side of the building. > The question is, how hot can the cooling air be and still keep the > computers happy? That sounds like a pretty neat undergraduate heat transfer homework problem. No seriously, since you're at a university, if you want a rough estimate go over to the Chemical Engineering department and borrow their heat transfer textbook, or better, borrow somebody to set up the problem and calculate it for you. That could work, although what assumptions to make might be sticky. It's been too many years and I've forgotten all that, so perhaps fortunately, I don't quite remember where my old undergrad heat transfer book is right now anyway. :) > I'm guessing that so long as a lot of air is moving through > the computers most would be ok in a sustained 30C (86F) flow. But I bet the other respondents were right when they said that's probably too hot...
-- Andrew Piskorski http://www.piskorski.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Sat Oct 18 00:52:59 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Sat, 18 Oct 2003 06:52:59 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: Hi, quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some older machines. For comparison I am including benchmarks of dual P3 512 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On Opteron I have also tried PC GAMESS program which I received from Alex Granovsky. 1. Single point HF energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] itek g03 Itanium2 1400MHz efc 7.1 26.5 prototype g03 p4 512 2800MHz pgi4 41.1 dahlia g03 Opteron 1400MHz pgi4 49.5 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 83.3 Wayne g03 p3 512 1200MHz pgi4 85 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 92.5 prototype gamess p4 512 2800MHz ifc7.1 106.5 dahlia PCgamess Opteron 1400MHz 112.9 dahlia gamess Opteron 1400MHz ifc7.1 128.5 itek gamess Itanium2 1400MHz efc 7.1 150.8 2. Single point MP2 energy calculation for Ace-Gly-NMe in 6-31G* (155 basis functions) g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST rmp2/6-31G* nosym scf=(tight,incore) MaxDisk=750MW gamess: MEMORY=50000000 DIRSCF=.TRUE. itek g03 Itanium2 1400MHz efc 7.1 51.7 prototype g03 p4 512 2800MHz pgi4 111.0 dahlia g03 Opteron 1400MHz pgi4 150.7 m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 154.2 prototype gamess p4 512 2800MHz ifc7.1 157.0 dahlia PCgamess Opteron 1400MHz 163.8 dahlia gamess Opteron 1400MHz ifc7.1 191.0 itek gamess Itanium2 1400MHz efc 7.1 194.8 m211 g03 k7mp 2133MHz(MP 2600+) pgi4 251.6 Wayne g03 p3 512 1200MHz pgi4 303 3. Manfreds Gaussian Benchmark http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html 243 basis functions 399 primitive gaussians RHF/3-21G* Freq [sec] itek g03 Itanium2 1400MHz efc 7.1 2843 prototype g03 p4 512 2800MHz pgi 4 8084 dahlia g03 Opteron 1400MHz pgi 4 9332 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 10289 Wayne g03 p3 512 1200MHz pgi 4 12920 galera g03 p3xenon 700MHz pgi 3 19317 m001 g03 p3 650MHz pgi 4 22824 4. test397.com from gaussian03 882 basis functions, 1440 primitive gaussians rb3lyp/3-21g force test scf=novaracc [sec] itek g03 Itanium2 1400MHz efc 7.1 6733 prototype g03 p4 512 2800MHz pgi 4 12980 dahlia g03 Opteron 1400MHz pgi 4 17879 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 20521 Wayne g03 p3 512 1200MHz pgi 4 24521 galera g03 p3xenon 700MHz pgi 3 39353 5. Gaussian calculations of NMR chemical shifts for GlyGlyAlaAla 207 basis functions, 339 primitive gaussians %MEM=800MB B3LYP/GEN NMR [sec] itek g03 Itanium2 1400MHz efc 7.1 275 prototype g03 p4 512 2800MHz pgi 4 614 dahlia g03 Opteron 1400MHz pgi 4 849 m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 948 Wayne g03 p3 512 1200MHz pgi 4 1134 some details: g03 is GAUSSIAN 03 rev. B04 with gaussian blas compiled with 32-bit pgi4.0 gamess is VERSION 6 SEP 2001 (R4) compiled with 32-bit ifc 7.1, for P4 I have used additional options -tpp7 -axKW Opteron (dahlia) had 64bit GinGin64 Linux and I had to use static 32-bit binaries. 
It should have SuSE Linux Enterprise soon and I will repeat tests using PGI 5.0 64-bit compiler when it will be ready. Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 (R3) compiled with 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 compiled with 64-bit efc 7.1 P3xenon (galera) uses gamess VERSION = 6 SEP 2001 (R4) compiled with ifc 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas compiled with pgi 3.3 czarek ---------------------------------------------------------------------- Dr. Cezary Czaplewski Department of Chemistry Box 431 Baker Lab of Chemistry University of Gdansk Cornell University Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY 14853 phone: +48 58 3450-430 phone: (607) 255-0556 fax: +48 58 341-0357 fax: (607) 255-4137 e-mail: czarek at chem.univ.gda.pl e-mail: cc178 at cornell.edu ---------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 10:39:36 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 22:39:36 +0800 (CST) Subject: some ab initio benchmarks In-Reply-To: Message-ID: <20031019143936.29602.qmail@web16807.mail.tpe.yahoo.com> I have 2 pts: 1. The compilers used across different platforms were not the same, why not use the Intel compiler for the P4 as well? 2. What is the working set of the benchmark? If the benchmark fit in the 6MB on-chip L3 of the Itanium2, it is very likely to perform very well. Another benchmark that shows the G5 wins the large memory case, loses small/medium cases, while the Itanium2 loses most of its advantages when the working set does not fit the L3: http://www.xlr8yourmac.com/G5/G5_fluid_dynamics_bench/G5_fluid_dynamics_bench.html Andrew. --- Cezary Czaplewski ????> > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz > against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp > 2133MHz(MP 2600+) and some > older machines. For comparison I am including > benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of > Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I > received from Alex > Granovsky. > > > 1. Single point HF energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100MW TEST 6-31G* nosym scf=(tight,incore) > gamess: MEMORY=20000000 DIRSCF=.TRUE. [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 26.5 > prototype g03 p4 512 2800MHz pgi4 > 41.1 > dahlia g03 Opteron 1400MHz pgi4 > 49.5 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 83.3 > Wayne g03 p3 512 1200MHz pgi4 > 85 > m211 gamess k7mp 2133MHz(MP 2600+) ifc7.1 > 92.5 > prototype gamess p4 512 2800MHz ifc7.1 > 106.5 > dahlia PCgamess Opteron 1400MHz > 112.9 > dahlia gamess Opteron 1400MHz ifc7.1 > 128.5 > itek gamess Itanium2 1400MHz efc 7.1 > 150.8 > > 2. Single point MP2 energy calculation for > Ace-Gly-NMe in 6-31G* > (155 basis functions) > > g03: mem=100mw rwf=a,250MW,b,250MW,c,250MW TEST > rmp2/6-31G* nosym > scf=(tight,incore) > MaxDisk=750MW > gamess: MEMORY=50000000 DIRSCF=.TRUE. 
> > itek g03 Itanium2 1400MHz efc > 7.1 51.7 > prototype g03 p4 512 2800MHz pgi4 > 111.0 > dahlia g03 Opteron 1400MHz pgi4 > 150.7 > m211 gamess k7mp 2133MHz(MP 2600+) > ifc7.1 154.2 > prototype gamess p4 512 2800MHz > ifc7.1 157.0 > dahlia PCgamess Opteron 1400MHz 163.8 > dahlia gamess Opteron 1400MHz > ifc7.1 191.0 > itek gamess Itanium2 1400MHz efc > 7.1 194.8 > m211 g03 k7mp 2133MHz(MP 2600+) pgi4 > 251.6 > Wayne g03 p3 512 1200MHz pgi4 > 303 > > 3. Manfreds Gaussian Benchmark > http://www.chemie.uni-dortmund.de/groups/ocb/projekte/mg98b.html > > 243 basis functions 399 primitive gaussians > RHF/3-21G* Freq > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 2843 > prototype g03 p4 512 2800MHz pgi 4 > 8084 > dahlia g03 Opteron 1400MHz pgi 4 > 9332 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 10289 > Wayne g03 p3 512 1200MHz pgi 4 > 12920 > galera g03 p3xenon 700MHz pgi 3 > 19317 > m001 g03 p3 650MHz pgi 4 > 22824 > > 4. test397.com from gaussian03 > > 882 basis functions, 1440 primitive gaussians > rb3lyp/3-21g force test scf=novaracc > > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 6733 > prototype g03 p4 512 2800MHz pgi 4 > 12980 > dahlia g03 Opteron 1400MHz pgi 4 > 17879 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 20521 > Wayne g03 p3 512 1200MHz pgi 4 > 24521 > galera g03 p3xenon 700MHz pgi 3 > 39353 > > 5. Gaussian calculations of NMR chemical shifts for > GlyGlyAlaAla > > 207 basis functions, 339 primitive gaussians > %MEM=800MB > B3LYP/GEN NMR > [sec] > > itek g03 Itanium2 1400MHz efc 7.1 > 275 > prototype g03 p4 512 2800MHz pgi 4 > 614 > dahlia g03 Opteron 1400MHz pgi 4 > 849 > m211 g03 k7mp 2133MHz(MP 2600+) pgi 4 > 948 > Wayne g03 p3 512 1200MHz pgi 4 > 1134 > > some details: > > g03 is GAUSSIAN 03 rev. B04 with gaussian blas > compiled with 32-bit pgi4.0 > > gamess is VERSION 6 SEP 2001 (R4) compiled with > 32-bit ifc 7.1, for P4 I > have used additional options -tpp7 -axKW > > Opteron (dahlia) had 64bit GinGin64 Linux and I had > to use static 32-bit > binaries. It should have SuSE Linux Enterprise soon > and I will repeat > tests using PGI 5.0 64-bit compiler when it will be > ready. > > Itanium2 (itek) uses gamess VERSION = 14 JAN 2003 > (R3) compiled with > 64-bit efc and GAUSSIAN 03 rev. B04 with mkl60 > compiled with 64-bit efc > 7.1 > > P3xenon (galera) uses gamess VERSION = 6 SEP 2001 > (R4) compiled with ifc > 6.0 and GAUSSIAN 03 rev B.01 with gaussian blas > compiled with pgi 3.3 > > > czarek > > ---------------------------------------------------------------------- > Dr. Cezary Czaplewski > Department of Chemistry Box 431 > Baker Lab of Chemistry > University of Gdansk Cornell > University > Sobieskiego 18, 80-952 Gdansk, Poland Ithaca, NY > 14853 > phone: +48 58 3450-430 phone: > (607) 255-0556 > fax: +48 58 341-0357 fax: (607) > 255-4137 > e-mail: czarek at chem.univ.gda.pl e-mail: > cc178 at cornell.edu > ---------------------------------------------------------------------- > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? 
http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sun Oct 19 11:37:14 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 19 Oct 2003 23:37:14 +0800 (CST) Subject: Long lived OpenPBS bug fixed! Message-ID: <20031019153714.3905.qmail@web16808.mail.tpe.yahoo.com> All versions of OpenPBS have this problem: the scheduler uses blocking sockets to contact the nodes, and if a node is dead, the scheduler hangs for several minutes, and all user commands will hang (not so good!). Scalable PBS finally fixed this problem: "... In local testing, we are able to issue a 'kill -STOP' on one node or even all nodes and the pbs_server daemon continues to be highly responsive to user commands, scheduler queries, and job submissions." http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000162.html *Also*, don't miss the Supercluster Newsletter, which talked about the next generation Maui scheduler called "Moab": http://www.supercluster.org/pipermail/scalablepbsusers/2003-October/000132.html Andrew. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Sun Oct 19 15:32:42 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Sun, 19 Oct 2003 20:32:42 +0100 (BST) Subject: RLX In-Reply-To: <200310181602.h9IG2HA27890@NewBlue.scyld.com> References: <200310181602.h9IG2HA27890@NewBlue.scyld.com> Message-ID: > > Have you ever try or test RLX server for HPC? > > What is their performance? > > .. what's their price/performance? Well, it all depends. The performance of the current generation of blade systems is on a par with 1U systems, and you can now get chassis with Myrinet or SAN connectivity if you need it. The part of price/performance that tends to get overlooked is manageability. Do you factor in the time and salaries of your admin staff who have to look after the thing? We run clusters with blade servers from various manufacturers (including RLX) and traditional 1U machines. The management overhead on blade systems is significantly lower than for 1U machines, and streets ahead of "beige boxes on shelves". On blade systems the network and SAN switching infrastructure is nicely integrated with the server chassis, and their management interfaces are tied in with OS deployment, remote power management etc. The difference in management overhead gets more pronounced as your cluster size increases. The time it takes to look after a 24 node cluster of 1U boxes isn't going to be that different to the time it takes to look after 24 blades, but running 1000 blades is much less effort than running 1000 1U servers. Whether this actually matters or not depends on your circumstances. If you have a limitless supply of PhD student slave labour (eg Virginia Tech and their G5s), then time and cost of management isn't so much of an issue. If you have to pay money for your sys-admins and want to run big clusters, then blades may end up being cost effective.
Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Mon Oct 20 04:35:03 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Mon, 20 Oct 2003 01:35:03 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: Message-ID: <5.2.0.9.2.20031020013259.03c0a4e0@216.82.101.6> Quoting from the article: An ordinary desktop PC outfitted with six PCI cards, each containing four of the chips, would perform at about 600 gigaflops (or more than half a teraflop). Assuming you were to build cluster systems with six PCI cards each, it would require 4U rack cases... Unless these floating point cards come as low-profile PCI (MD2 form factor)? 20 racks * 42U per rack = 840U / 4 = 210 nodes, not counting switching equipment. Petaflop with 210 compute nodes? At 04:15 PM 10/16/2003 -0400, you wrote: >Hi all, > > Check out this article over at wired: > >http://www.wired.com/news/technology/0,1282,60791,00.html > > It makes all sorts of wild claims, but what do you guys think? >Obviously, there's memory bandwidth limitations due to PCI. Does anyone >know anything else about these guys? > >Cheers, >Bryce > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Mon Oct 20 08:16:19 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Mon, 20 Oct 2003 14:16:19 +0200 Subject: some ab initio benchmarks In-Reply-To: References: Message-ID: <20031020121619.GM8711@unthought.net> On Sat, Oct 18, 2003 at 06:52:59AM +0200, Cezary Czaplewski wrote: > > Hi, > > quite recently I did some benchmarks of P4 2.8 GHz against dual Opteron > 1400MHz, dual Itanium2 1400MHz and dual k7mp 2133MHz(MP 2600+) and some > older machines. For comparison I am including benchmarks of dual P3 512 > 1200MHz I got from Wayne Fisher, The University of Texas at Dallas. On > Opteron I have also tried PC GAMESS program which I received from Alex > Granovsky. Could you please specify which version of which operating system was used for this? If the kernel does not have NUMA scheduling, the Opterons are severely disadvantaged - it would be useful to know. Thank you, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From richardlbj at yahoo.com Sat Oct 18 23:56:37 2003 From: richardlbj at yahoo.com (Richard Brown) Date: Sat, 18 Oct 2003 20:56:37 -0700 (PDT) Subject: cluster node freezes while running namd 2.5/2.5b1 Message-ID: <20031019035637.5382.qmail@web41211.mail.yahoo.com> I have been trying to figure this out for the past two months with no luck. I have an 8-node PC cluster that consists of 16 athlon mp2200+, msi k7d master-l mb, intel i82557/i82558 10/100 on-board lan, 500mb kingston ddr266 pc2100 unbuffered, 3com superstack III baseline 24 port 10/100 switch. The cluster was built using oscar2.1/redhat7.3 w/ the kernel update 2.4.20-20. The namd versions used include 2.5b1 and the latest 2.5, both linux binary distributions and source code builds. The simulation tested is the apoa1 benchmark example. namd/apoa1 only runs w/o problems on a single cluster node, either with one or two cpus. Every time it runs on two or more nodes, either using one or two cpus from each node, namd/apoa1 stops somewhere in the middle of a run. One of the nodes freezes and does not respond to ping, ssh or the directly attached keyboard. Most of the time there were no error messages. A few times I received an apic error or socket receive failure. I tried plugging a ps/2 mouse into the nodes as some people suggested for a bug of the motherboard but it did not help. I don't know how to proceed from here. Any suggestions would be appreciated. Thanks, Richard __________________________________ Do you Yahoo!? The New Yahoo! Shopping - with improved product search http://shopping.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Sun Oct 19 21:00:53 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Sun, 19 Oct 2003 21:00:53 -0400 Subject: A Petaflop machine in 20 racks? In-Reply-To: <20031018013957.GA3774@greglaptop.PEATEC.COM> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <16275.13253.833239.996985@random.tigertiger.de> > > http://www.wired.com/news/technology/0,1282,60791,00.html Greg Lindahl writes: > I think it's the Return of the Array Processor. > > There's very little new in computing these days -- and it has the > usual flaws of APs: low bandwidth communication to the host. > > So if you have a problem that actually fits in the limited memory, and > doesn't need to communicate with anyone else very often, it may be a > win for you. They actually say in this document http://www.clearspeed.com/downloads/overview_cs301.pdf that the chip can be used as a stand-alone processor and resembles a standard RISC processor. I do not see whether it would be SIMD or MIMD - the block diagram at least does not show a central control unit separate from the PEs. Given the small on-chip memory, they will have to connect external memory. The thing that would worry me is that the external machine balance is 32 Flops/Word (on 32-bit words), so it will only be useful for applications that do a lot of operations inside a few 100Kb of memory.
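As a back-of-the-envelope illustration of what that balance implies (using only figures quoted in this thread, so treat it as a sketch rather than a spec): the 600 GFLOPS claimed for six 4-chip PCI cards works out to about 25 GFLOPS per chip, and 25 GFLOPS at 32 flops per 32-bit word corresponds to roughly 0.8 GWords/s, on the order of 3 GB/s, of external bandwidth. A streaming kernel like DAXPY, which does about 2 flops per 3 words of memory traffic, would then be held to something like 0.5 GFLOPS per chip - a couple of percent of peak - unless its working set stays in the on-chip memory.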
IBM is following a slightly different approach with the QCDOC and BlueGene/L supercomputers which are based on systems-on-a-chip where they put a two PowerPC cores and all support logic on a single chip, wire it up with one or two GB of memory and connect a lot (64K) of these chips together. They expect 5.5 GFlops/s per node peak and to have 360 TFlops operational in 2004/5 (in 64 racks). You would need about 200 racks to get to a PetaFlops machine... http://sc-2002.org/paperpdfs/pap.pap207.pdf http://www.arxiv.org/abs/hep-lat/0306023 [QCDOC is a Columbia University project in collaboration with IBM - IBM is transitioning the technology from high-energy physics to biology which makes a lot of sense... :-)] To put 64 processors on a chip, I am sure ClearSpeed have to sacrifice a lot in memory and functionality/programmability, and who wins in this tradeoff remains to be seen. Depends on the application, too, of course. BTW, who or what is behind ClearSpeed? Their Bristol address is identical to Infineon's Design Centre there, and Hewlett Packard seems to have a lab there, too. If they have that kind of support, I am sure they thought hard before making these design choices, and it may just be tarketed at certain problems (vector/matrix/FFT-like stuff). -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mof at labf.org Mon Oct 20 10:13:56 2003 From: mof at labf.org (Mof) Date: Mon, 20 Oct 2003 23:43:56 +0930 Subject: Solaris Fire Engine. Message-ID: <200310202343.56524.mof@labf.org> http://www.theregister.co.uk/content/61/33440.html ... "We worked hard on efficiency, and we now measure, at a given network workload on identical x86 hardware, we use 30 percent less CPU than Linux." Mof. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Oct 20 11:17:24 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 20 Oct 2003 11:17:24 -0400 Subject: cluseter node freezes while running namd 2.5/2.5b1 In-Reply-To: <20031019035637.5382.qmail@web41211.mail.yahoo.com> References: <20031019035637.5382.qmail@web41211.mail.yahoo.com> Message-ID: <3F93FC84.7020808@scalableinformatics.com> Hi Richard: Are your Intel network drivers up to date? Check on the Intel site. If only one node repeatedly freezes (the same node), you might look at taking it out of the cluster, and seeing if that improves the situation. If it does, swap the one you took out, with one that is still in there, and see if the problem returns. This will help you determine if the problem is node based or system based. Joe Richard Brown wrote: >I have been try to figure this out for the past two >months with no luck. > >I have a 8-node PC cluster that consists of 16 athlon >mp2200+, msi k7d master-l mb, intel i82557/i82558 >10/100 on-board lan, 500mb kingston ddr266 pc2100 >unbuffered, 3com superstack III baseline 24 port >10/100 switch. > >The cluster was built using oscar2.1/redhat7.3 w/ the >kernel update 2.4.20-20. namd used includes 2.5b1 and >the latest 2.5, both linux binary distributions and >source code builds. the simulation tested is apoa1 >benchmark example. 
> >namd/apoa1 only runs w/o problems on a single cluster >node, either with one or two cpus. Every time it runs >on two or more nodes, either using one or two cpus >from each node, namd/apoa1 stops somewhere in the >middle of run. One of the nodes freezes and does not >respond to ping, ssh or the directly attached >keyboard. Most of the time there were no error >messages. A few times I received apic error or sorcket >receive failure. I tried plugging a ps/2 mouse into >the nodes as some people suggested for a bug of the >motherboad but it did not help. > >I don't know how to proceed from here. Any suggestions >would be appreciated. > >Thanks, >Richard > > >__________________________________ >Do you Yahoo!? >The New Yahoo! Shopping - with improved product search >http://shopping.yahoo.com >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jschauma at netbsd.org Mon Oct 20 11:03:50 2003 From: jschauma at netbsd.org (Jan Schaumann) Date: Mon, 20 Oct 2003 11:03:50 -0400 Subject: New tech-cluster mailing list for NetBSD Message-ID: <20031020150350.GA26140@netmeister.org> Hello, A new tech-cluster at netbsd.org mailing list has been created. As the name suggests, this list is intended for technical discussions on building and using clusters of NetBSD hosts. Initially, this list is expected to be of low volume, but we hope to advocate and advance the use of NetBSD in such environments significantly. Subscription is via majordomo -- please see http://www.NetBSD.org/MailingLists/ for details. -Jan -- http://www.netbsd.org - Multiarchitecture OS, no hype required. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 14:03:23 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 14:03:23 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <200310202343.56524.mof@labf.org> Message-ID: On Mon, 20 Oct 2003, Mof wrote: > http://www.theregister.co.uk/content/61/33440.html > > ... "We worked hard on efficiency, and we now measure, at a given network > workload on identical x86 hardware, we use 30 percent less CPU than Linux." Linux uses much more CPU per packet than it used to. The structural change for IPtables/IPchains capability is very expensive, even when it is not used. And there have been substantial, CPU-costly changes to protect against denial-of-service attacks at many levels. The only protocol stack changes that might benefit cluster use are sendfile/zero-copy, and that doesn't apply to most current hardware or typical cluster message passing. It would be technically easy to revert to the interface of old Linux kernels and see much better than a 30% CPU reduction, but it's very unlikely that would happen politically: Linux development is feature-driven, not performance-driven.
And that's easy to understand when your pet feature is at stake, or there is a news story of "Linux Kernel Vulnerable to ". -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kinghorn at pqs-chem.com Mon Oct 20 13:36:28 2003 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Mon, 20 Oct 2003 12:36:28 -0500 Subject: parllel eigen solvers Message-ID: <200310201236.28901.kinghorn@pqs-chem.com> Does anyone know of any recent progress on parallel eigensolvers suitable for beowulf clusters running over gigabit ethernet? It would be nice to have something that scaled moderately well and at least gave reasonable approximations to some subset of eigenvalues and vectors for large (10,000x10,000) symmetric systems. My interests are primarily for quantum chemistry. It's pretty obvious that you can compute eigenvectors in parallel after you get the eigenvalues but it would be nice to get eigenvalues mostly in parallel requiring maybe just a couple of serial iterates ... Best regards to all -Don _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From czarek at sun1.chem.univ.gda.pl Mon Oct 20 15:08:21 2003 From: czarek at sun1.chem.univ.gda.pl (Cezary Czaplewski) Date: Mon, 20 Oct 2003 21:08:21 +0200 (CEST) Subject: some ab initio benchmarks In-Reply-To: <20031020121619.GM8711@unthought.net> Message-ID: On Mon, 20 Oct 2003, Jakob Oestergaard wrote: > Could you please specify which version of which operating system was > used for this? Opteron machine (dahlia) was a prototype which dr Paulette Clancy got for evaluation from local computer shop. It had RedHat GinGin 64 operating system preistalled when I did testing. > If the kernel does not have NUMA scheduling, the Opterons are severely > disadvantaged - it would be useful to know. I don't remember which kernel was installed when I did benchmarks, I suppose standard kernel which is coming with GinGin64. Machine should have SuSE installed now so I cannot check it. I will repeat benchmarks with PGI 5 64bit compiler and SuSE when I will have some time. czarek _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Mon Oct 20 17:50:56 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Mon, 20 Oct 2003 14:50:56 -0700 Subject: A Petaflop machine in 20 racks? In-Reply-To: <16275.13253.833239.996985@random.tigertiger.de> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <20031018013957.GA3774@greglaptop.PEATEC.COM> Message-ID: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: >BTW, who or what is behind ClearSpeed? Their Bristol address is >identical to Infineon's Design Centre there, and Hewlett Packard seems >to have a lab there, too. If they have that kind of support, I am sure >they thought hard before making these design choices, and it may just >be tarketed at certain problems (vector/matrix/FFT-like stuff). 
Off their web site...http://www.clearspeed.com/about.php?team The CEO and president are marketing oriented (CEO: "he focused on taking new technologies to market", President: "..successfully grown global sales management and field application organizations and instrumental in creating key partnership agreements". The CTO (Ray McConnell) does parallel processing with 300K processors, etc. VP Engr (Russell David) designed mixed signal baseband ICs for the wireless market. I didn't turn up any papers in the IEEE on-line library, but that's not particularly significant, in and of itself. McConnell has a paper http://www.hotchips.org/archive/hc11/hc11pres_pdf/hc99.s3.2.McConnell.pdf that shows architectures from PixelFusion, Ltd... SIMD core with 32 bit embedded processor running a 256 PE "Fuzion block". Each PE has an 8 bit ALU and 2kByte PE memory... (sound familiar?) From "Hot Chips 99" James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Mon Oct 20 18:46:31 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 00:46:31 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > The only protocol stack changes that might benefit cluster use are > sendfile/zero-copy, and that doesn't apply to most current hardware or > typical cluster message passing. Has anybody actually tried to use sendfile in MPICH or LAM-MPI? I planned to do it, but this is somewhere in the middle of my always growing TODO queue... Recipes for how to use it were posted a few times at least on the netdev list, so those interested can find them easily. > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: But there are many projects that live outside the official kernel, the Scyld network drivers being one good example. What's wrong with replacing the IP stack with one maintained separately with performance in mind? I agree though that this would mean somebody has to take care of it and make sure that it works with newer kernels... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Oct 20 19:08:12 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 20 Oct 2003 19:08:12 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Oh yes, and it is a SERIOUS problem.
I was just mulling on the right procmail recipe to consign this domain to the dark depths of hell, but if it were done at the list level instead it would only be a good thing. My .procmailrc is already getting quite long indeed. BTW, you (and of course the rest of the list) are just the man to ask; what is the status of Opterons and fortran compilers. I myself don't use fortran any more, but a number of folks at Duke do, and they are starting to ask what the choices are for Opterons. A websearch reveals that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an Opteron fortran, but rumor also suggests that a number of these are really "beta" quality with bugs that may or may not prove fatal to any given project. Then there is Gnu. Any comments on any of these from you (or anybody, really)? Is there a functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? Do the compilers permit access to large (> 3GB) memory, do they optimize the use of that memory, do they support the various SSE instructions? I'm indirectly interested in this as it looks like I'm getting Opterons for my next round of cluster purchases personally, although I'll be using C on them (hopefully 64 bit Gnu C). rgb > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Oct 20 18:52:03 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: Mon, 20 Oct 2003 18:52:03 -0400 Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: References: Message-ID: <1066690323.7027.17.camel@roughneck.liniac.upenn.edu> On Mon, 2003-10-20 at 18:41, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > messages I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? Yes -- quite annoying :/ Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Mon Oct 20 18:41:51 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Mon, 20 Oct 2003 15:41:51 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net Message-ID: I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every messages I've sent to this list has started bouncing back to me from dan at systemsfirm.com. I'm getting about ten copies of each one every other day. Is anyone else having this problem? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Oct 20 20:08:31 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 20 Oct 2003 20:08:31 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > On Mon, 20 Oct 2003, Donald Becker wrote: > > > The only protocol stack changes that might benefit cluster use are > > sendfile/zero-copy, and that doesn't apply to most current hardware or > > typical cluster message passing. > > Has anybody actually tried to use sendfile in MPICH or LAM-MPI? I > planned to do it, but this is somewhere in the middle of my always growing > TODO queue... Recipes for how to use it were posted a few times at least > on the netdev list, so those interested can find them easily. The trick is to
   memory map a file
   use that memory region as message buffers
   send the message buffers using sendfile()
My belief is that the page locking involved with sendfile() would be too costly for anything smaller than about 32KB. While I'm certain that there are a few MPI applications that use messages that large, they don't seem to be typical. > But there are many projects that live outside the official kernel, the > Scyld network drivers being one good example. What's wrong with replacing > the IP stack with one maintained separately with performance in mind? > I agree though that this would mean somebody taking care of it and making > sure that it works with newer kernels... From my experience trying to keep the network driver interface stable, I very much doubt that it would be possible to separately maintain a network protocol stack. Especially since it would be perceived as competition with the in-kernel version, which brings out the worst behavior... As a specific example, a few years ago we had cluster performance patches for the 2.2 kernel. Even while the 2.3.99 development was going on, the 2.2 kernel changed too quickly to keep those patches up to date and tested. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From cb4 at tigertiger.de Mon Oct 20 19:31:33 2003 From: cb4 at tigertiger.de (Christoph Best) Date: Tue, 21 Oct 2003 01:31:33 +0200 Subject: A Petaflop machine in 20 racks? In-Reply-To: <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> References: <20031018013957.GA3774@greglaptop.PEATEC.COM> <5.2.0.9.2.20031020142958.030cd5d0@mailhost4.jpl.nasa.gov> Message-ID: <16276.28757.858683.189030@random.tigertiger.de> Jim Lux writes: > At 09:00 PM 10/19/2003 -0400, Christoph Best wrote: > >BTW, who or what is behind ClearSpeed? Their Bristol address is > >identical to Infineon's Design Centre there, and Hewlett Packard seems > >to have a lab there, too. If they have that kind of support, I am sure > >they thought hard before making these design choices, and it may just > >be targeted at certain problems (vector/matrix/FFT-like stuff). > > The CTO (Ray McConnell) does parallel processing with 300K processors, etc. > VP Engr (Russell David) designed mixed signal baseband ICs for wireless > market.
> I didn't turn up any papers in the IEEE on-line library, but > that's not particularly significant, in and of itself. I actually found some more info about them: ClearSpeed used to be PixelFusion, a spin-off from Inmos, who made the original Transputer. http://www.eetimes.com/sys/news/OEG20010524S0044 ClearSpeed tried to design a SIMD processor called Fuzion for graphics applications, then around 2001 turned to the networking sector, and now it seems to have turned to high-performance computing. So it's a processor in search of an application. http://www.eetimes.com/semi/news/OEG20000208S0039 http://www.eetimes.com/semi/news/OEG19990512S0012 Poor guys went through at least three CEOs during the last four years... -Christoph -- Christoph Best cbst at tigertiger.de Bioinformatics group, LMU Muenchen http://tigertiger.de/cb _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Mon Oct 20 20:33:23 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Mon, 20 Oct 2003 17:33:23 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: hi ya trent just add the ip# of systemsfirm.net to your /etc/mail/access file:
   # a polite msg i added for them/somebody to see ..
   systemsfirm.net    REJECT - geez .. do you need help to fix your PC
then: cd /etc/mail ; make ; restart-sendmail or your exim or ... c ya alvin and about 75% or more of the Swen virus is coming from mis-managed/mis-configured clusters http://www.Linux-Sec.net/MSJunk On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > message I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem? > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Mon Oct 20 23:34:44 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 20 Oct 2003 20:34:44 -0700 (PDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: yes... I've tried contacting the admin contact for that domain and got no response... joelja On Mon, 20 Oct 2003, Trent Piepho wrote: > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > message I've sent to this list has started bouncing back to me from > dan at systemsfirm.com. I'm getting about ten copies of each one every other > day. Is anyone else having this problem?
> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Tue Oct 21 08:06:56 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Tue, 21 Oct 2003 05:06:56 -0700 Subject: Solaris Fire Engine. References: Message-ID: <3F95215F.9DE43BD9@attglobal.net> In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development at some point split in two distinct lines: one for single computer applications and one for clusters? Paul Schenker Donald Becker wrote: > On Mon, 20 Oct 2003, Mof wrote: > > > http://www.theregister.co.uk/content/61/33440.html > > > > ... "We worked hard on efficiency, and we now measure, at a given network > > workload on identical x86 hardware, we use 30 percent less CPU than Linux." > > Linux uses much more CPU per packet than it used to. The structural > change for IPtable/IPchains capability is very expensive, even when it > is not used. And there have been substantial, CPU-costly changes to protect > against denial-of-service attacks at many levels. The only protocol > stack changes that might benefit cluster use are sendfile/zero-copy, and > that doesn't apply to most current hardware or typical cluster message > passing. > > I would be technially easy to revert to the interface of old Linux > kernels and see much better than a 30% CPU reduction, but it's very > unlikely that would happen politically: Linux development is > feature-driven, not performance-driven. And that's easy to understand > when your pet feature is at stake, or there is a news story of "Linux > Kernel Vulnerable to ". > > -- > Donald Becker becker at scyld.com > Scyld Computing Corporation http://www.scyld.com > 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system > Annapolis MD 21403 410-990-9993 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zby at tsinghua.edu.cn Mon Oct 20 22:34:54 2003 From: zby at tsinghua.edu.cn (Baoyin Zhang) Date: Tue, 21 Oct 2003 10:34:54 +0800 Subject: Jcluster toolkit v 1.0 releases! Message-ID: <266703288.27688@mail.tsinghua.edu.cn> Apologies if you receive multiple copies of this message. Dear all, I am pleased to annouce the Jcluster Toolkit (Ver 1.0) releases, you can freely download it from the website below. http://vip.6to23.com/jcluster/ The toolkit is a high performance Java parallel environment, implemented in pure java. 
It provides you the popular PVM-like and MPI-like message-passing interface, automatic task load balance across large-scale heterogeneous cluster and high performance, reliable multithreaded communications using UDP protocol. In the version 1.0, Object passing interface is added into PVM-like and MPI-like message passing interface, and provide very convenient deployment -- the classes of user application only need to be deployed at one node in a large-scale cluster. I welcome your comments, suggestions, cooperation, and involvement in improving the toolkit. Best regards Baoyin Zhang _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Oct 21 08:20:10 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 21 Oct 2003 08:20:10 -0400 (EDT) Subject: Solaris Fire Engine. In-Reply-To: <3F95215F.9DE43BD9@attglobal.net> Message-ID: On Tue, 21 Oct 2003 pesch at attglobal.net wrote: > In a cluster, would it not make more sense to catch an attack in a firewall rather than at the kernel level? If > so, should cluster builders perhaps look for other - more cluster specific - kernels? Should kernel development > at some point split in two distinct lines: one for single computer applications and one for clusters? It's the usual problem (and a continuation of my XML rant in a way, as it is at least partly motivated by this). Sure, one can do this. However, it is very, very expensive to do so, a classic case of 90% of the work producing 10% of the benefit, if that. As Don pointed out, even Scyld, with highly talented people who are (in principle:-) even making money doing so found maintaining a separate kernel line crushingly expensive very quickly. Whenever expense is mentioned, especially in engineering, one has to consider benefit, and do a CBA. The CBA is the crux of all optimization theory; find the point of diminishing returns and stay there. I would argue that splitting the kernel is WAY far beyond that point. Folks who agree can skip the editorial below. For that matter, so can folks who disagree...;-) The expense can be expressed/paid one of several ways -- get a distinct kernel optimized and stable, get an entire associated distribution optimized and stable, and then freeze everything except for bugfixes. You then get a local optimum (after a lot of work) that doesn't take a lot of work to maintain, BUT you pay the penalty of drifting apart from the rest of linux and can never resynchronize without redoing all that work (and accepting all that new expense). New, more efficient gcc? Forget it -- the work of testing it with your old kernel costs too much. New device drivers? Hours to days of testing for each one. Eventually a key application or improvement appears in the main kernel line (e.g. 64 bit, Opteron support) that is REALLY different, REALLY worth more to nearly everybody than the benefit they might or might not gain from the custom HPC optimized kernel, and your optimized but stagnant kernel is abandoned. Alternatively, you can effectively twin the entire kernel development cycle, INCLUDING the testing and debugging. Back in my ill-spent youth I spent a considerable amount of time on the linux-smp list (I couldn't take being on the main linux kernel list even then, as its traffic dwarfs both the beowulf list and the linux-smp list combined). I also played a tiny bit with drivers on a couple of occassions. 
The amount of work, and number of human volunteers, required to drive these processes is astounding, and I would guess that it would have to be done on twinned lists as the kernelvolken would likely not welcome a near doubling of traffic on their lists or doubling of the work burden trying to figure out just who owns a given emergent bug (and inevitably they WOULD have to help figure out who owns emergent bugs, as some of them WOULD belong to them, others to the group supporting the split off sources, if they were to proceed independently but "keep up" with the development kernel so that true divergence did not occur). A better alternative exists (and is even used to some extent). The linux kernel is already highly modular. It is already possible to e.g. bypass the IP stack altogether (as is done by myrinet and other high speed networks) with custom device drivers that work below the IP and TCP layers -- just doing this saves you a lot of the associated latency hit in high speed networks, as TCP/IP is designed for WAN routing and security and tends to be overkill for a secure private LAN IPC channel in a beowulf. This route requires far less maintenance and customization -- specialized drivers for MPI and/or PVM and/or a network socket layer, plus a kernel module or three. Even this is "expensive" and tends to be done only by companies that make hefty marginal profits for their specific devices, but it is FAR cheaper than maintaining a separate kernel altogether. I would also lump into this group applying and testing on an ad hoc basis things like Josip's network optimization patches which make relatively small, relatively specific changes that might technically "break" a kernel for WAN application but can produce measureable benefits for certain classes of communication pattern. This sort of thing is NOT for everybody. It is like a small scale version of the first alternative -- the patches tend to be put together for some particular kernel revision and then frozen (or applied "blindly" to succeeding kernel revisions until they manifestly break). Again this motivates one to freeze kernel and distribution once one gets everything working and live with it until advances elsewhere make it impossible to continue doing so. This is the kind of thing where MAYBE one could get the patches introduced into the mainstream kernel sources in a form that was e.g. sysctl controllable -- "modular", as it were, but inside the non-modular part of the kernel as a "proceed at your own risk" feature. Expense alternatives in hand, one has to measure benefit. We could break up HPC applications very crudely into groups. One group is code that is CPU bound -- where the primary/only bottleneck is the number of double precision floating point (and associated integer) computations that the computer can retire per second. Another might be memory bound -- limited primarily by the speed with which the system can move values into and out of memory doing some simple operations on them in the meantime. Still another might be disk or other non-network I/O bound (people who crunch large data sets to and from large storage devices). Finally yes, one group might be bound by the network and network based IPC's in a parallel division of a program. 
This latter group is the ONLY group that would really benefit from the kernel split; the rest of the kernel is reasonably well optimized for raw computations, memory access, and even hardware device access (or can be configured and tuned to be without the need of a separate kernel line). I would argue that even the network group splits again, into latency limited and bandwidth limited. Bandwidth limited applications would again see little benefit from a hacked kernel split as TCP can deliver data throughput that is roughly 90% of wire speed (or better) for ethernet, depending on the quality of hardware as much as the kernel. Of course, the degree of the CPU's involvement in sending and receiving these messages could be improved; one would like to be able to use DMA as much as possible to send the messages without blocking the CPU, but this matters only if the CPU can do something useful while awaiting the network IPC transfers; often it cannot. The one remaining group that would significantly benefit is the latency limited group -- true network parallel applications that need to send lots of little messages that cannot be sensibly aggregated in software. The benefit there could be profound, as the TCP stack adds quite a lot of latency (and CPU load) on top of the irreducible hardware latency, IIRC, even on a switched network where the CPU doesn't have to deal with a lot of spurious network traffic. Are there enough members of this group to justify splitting the kernel? I very much doubt it. I don't even think that the existence of this group has motivated the widespread adoption of a non-IP ethernet transport layer -- nearly everybody just lives with the IP stack latency OR... ...uses one of the dedicated HPC networks. This is the real kicker. TCP latency is almost two orders of magnitude greater than either myrinet or dolphin/sci latency (which are both order of microseconds instead of order of hundreds of microseconds). They >>also<< deliver very high bandwidth. Sure, they are expensive, but you know that you are paying for precisely what YOU need for YOUR HPC computations. I don't have to pay for them (even indirectly, by helping out with a whole secondary kernel development track) when MY code is CPU bound; the big DB guys don't have to pay for it when THEIR code depends on how long it takes to read in those ginormous databases of e.g. genetic data; the linear algebra folks who need large, fast memory don't pay for it (unless they try splitting up their linear algebra across the network, of course:-) -- it is paid for only the people who need it, who send lots of little messages or who need its bleeding edge bandwidth or both. One COULD ask, very reasonably, for just about any of the kernel optimizations that can be implemented at the modular level -- that is a matter of writing the module, accepting responsibility for its integration into the kernel and sequential debugging in perpetuity (that is, becoming a slave of the lamp, in perpetuity bound to the kernel lists:-). Alas, TCP/IP is so bound up inside the main part of the kernel that I don't think it can be separated out into modules any more than it already is. ^^^^^ ^^^^^, (closing omitted in the fond hope of remuneration) rgb (C'mon now -- here I am omitting all sorts of words from my rants and my paypal account is still dry as a bone, dry as a desert, bereft of all money, parched as my throat in the noonday sun. Seriously, either I make some money or I'm gonna compose a 50 kiloword opus for my next one...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 08:46:57 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 14:46:57 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Donald Becker wrote: > My belief is that the page locking involved with sendfile() would be too > costly for anything smaller than about 32KB. IIRC, both MPICH and LAM-MPI make the distinction between small and large messages with the default cutoff being 64KB. So large messages could be sent this way... I don't know what you meant by "too costly", but small messages are not too costly to copy in the stack (normal behaviour), especially with the increasing cache sizes of today's CPUs, while the large ones (where copying time would be significant) could be sent without the extra copy in the stack (see the sketch at the end of this message). > While I'm certain that there are a few MPI applications that use > messages that large, they don't seem to be typical. ... or might not care that much about the speedup. > From my experience trying to keep the network driver interface stable, I > very much doubt that it would be possible to separately maintain a > network protocol stack. Well, it was late last night and probably I haven't chosen the most appropriate example... the Scyld network drivers are maintained by one person, while my suggestion was going more toward a community project. > Especially since it would be perceived as competition with the in-kernel > version, which brings out the worst behavior... Yeah, political issues - I think that making the intent clear would solve the problem: there is no competition, it serves a completely different purpose. And given what you wrote in the previous e-mail about "feature-driven", who would use it on normal computers when it misses several "high-profile features" like iptables? Even more, if it's clear that it should only be used on local fast networks, several aspects of the stack can be optimized without fear of breaking very high latency (satellite) or very low bandwidth (phone modem) connections. But I guess that I should stop dreaming :-) > As a specific example, a few years ago we had cluster performance > patches for the 2.2 kernel. Those maintained by Josip Loncaric? Again it was a one-man show. I think that this is exactly the problem: there are small projects maintained by one person which depend on the free time or interest of this person. Given that clustering has moved from research-only into a lucrative business, that the software (Linux kernel, MPI libraries, etc.) has evolved quite a lot, and that the entry barrier into, let's say, kernel programming is quite high, it's normal that not many people want to make the step. I already expressed my opinion about a year ago that such projects can only be carried forward by companies that benefit from them or universities where work from students comes for free. But it seems that there are no companies thinking that they can benefit or universities where students' work is for free...
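To make that concrete, here is the rough kind of thing I have in mind for the large-message path - just an untested sketch of the mmap()+sendfile() trick Don described, with the file name, the socket descriptor and the message size as placeholders and most error handling left out:

#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Send one "large" message (above the 32-64KB cutoff discussed above) from a
 * file-backed, mmap()ed buffer, so the kernel can take the data straight from
 * the page cache instead of copying it through the normal socket write path. */
static ssize_t send_large(int sock, const char *payload, size_t len)
{
    off_t off = 0;
    ssize_t sent = -1;
    int fd = open("msgbuf.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600); /* placeholder name */

    if (fd < 0 || ftruncate(fd, len) < 0)
        return -1;

    /* 1. memory map a file */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf != MAP_FAILED) {
        /* 2. use that memory region as the message buffer; a real MPI layer
         *    would build the message here in place instead of copying it */
        memcpy(buf, payload, len);

        /* 3. send the message buffer using sendfile() */
        sent = sendfile(sock, fd, &off, len);

        munmap(buf, len);
    }
    close(fd);
    return sent;
}

Whether the saved copy actually buys anything would of course depend on whether the page-locking cost you mention eats the gain - something to measure, not assume.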
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Tue Oct 21 09:31:37 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Tue, 21 Oct 2003 09:31:37 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: Message-ID: On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? Well, this is as good a place as many to put up the benchmarks I ran using DYNA (a commercial FEM code from LSTC, first developed at LLNL, and definitely Fortran): http://www.duke.edu/~jlb17/bench-results.pdf According to their docs, the 32bit binary was compiled using ifc6.0. The slowdown in the newer point release is due to them dialing back the optimizations due to compiler bugs. The 64bit Opteron binary was compiled using PGI, but that's all I know about it. To sum it up, I bought some Opterons. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Tue Oct 21 09:41:53 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 21 Oct 2003 15:41:53 +0200 (CEST) Subject: Solaris Fire Engine. In-Reply-To: Message-ID: On Tue, 21 Oct 2003, Bogdan Costescu wrote: > But I guess that I should stop dreaming :-) Well, either I'm not dreaming, or somebody else is dreaming too :-) Below are some fragments of e-mails from David Miller (one of the Linux network maintainers) to netdev today: > People on clusters use their own special clustering hardware and > protocol stacks _ANYWAYS_ because ipv4 is too general to serve their > performance needs. And I think that is a good thing rather than > a bad thing. People should use specialized solutions if that is the > best way to attack their problem. ... 
> The things cluster people want is totally against what a general > purpose IPV4 implementation should do. Linux needs to provide a > general purpose IPV4 stack that works well for everybody, not just > cluster people. > > I'd rather have millions of servers using my IPV4 stack than a handful > of N-thousand system clusters. > ... > Sure, many people would like to simulate the earth and nuclear weapons > using Linux, but I'm sure as hell not going to put features into the > kernel to help them if such features hurt the majority of Linux users. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Oct 21 11:19:22 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 21 Oct 2003 11:19:22 -0400 (EDT) Subject: flood of bounces from postmaster@systemsfirm.net In-Reply-To: Message-ID: On Mon, 20 Oct 2003, Robert G. Brown wrote: > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > messages I've sent to this list has started bouncing back to me from > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > day. Is anyone else having this problem? > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right There are many more problems that list readers do not see. I delete the address from the list only when the problem is persistent. The major problem happens when messages take a few days to bounce, and the bounce does not follow standards. In that case there are dozens of messages in the remote queue, and they all appears to be replies by a valid list subscriber. > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. A surprising amount of 64 bit software (certainly not limited to the Opteron) is still not mature enough for general purpose use. It still requires more development and testing to get to the stability level required for real deployment. And it's not the "64 bit" nature of the software, since we did have reasonable maturity on the Alpha years ago. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Tue Oct 21 11:06:37 2003 From: edwardsa at plk.af.mil (Arthur H. 
Edwards) Date: Tue, 21 Oct 2003 09:06:37 -0600 Subject: parallel eigen solvers In-Reply-To: <200310211049.OAA18031@nocserv.free.net> References: <200310201236.28901.kinghorn@pqs-chem.com> <200310211049.OAA18031@nocserv.free.net> Message-ID: <20031021150637.GA8076@plk.af.mil> I should point out that density functional theory can be compute-bound on diagonalization. QUEST, a Sandia code, easily handles several hundred atoms, but the eigensolve dominates by ~300-400 atoms. Thus, intermediate-size diagonalization is of strong interest. Art Edwards On Tue, Oct 21, 2003 at 02:49:07PM +0400, Mikhail Kuzminsky wrote: > According to Donald B. Kinghorn > > > > Does anyone know of any recent progress on parallel eigensolvers suitable for > > beowulf clusters running over gigabit ethernet? > > It would be nice to have something that scaled moderately well and at least > > gave reasonable approximations to some subset of eigenvalues and vectors for > > large (10,000x10,000) symmetric systems. > > My interests are primarily for quantum chemistry. > > > In the case you are thinking about semiempirical Fockian diagonalisation, > there is a set of alternative methods for direct construction of the density > matrix that avoid the preliminary finding of eigenvectors. These methods > are realized, in particular, in Gaussian-03 and MOPAC-2002. > > For non-empirical quantum chemistry, diagonalisation usually doesn't limit > overall performance. In the case of methods like CI it's necessary to > find only some eigenvectors, and it is better to use special diagonalization > methods. > > There is a special parallel solver package, but I don't have the exact > reference with me :-( > > Mikhail Kuzminsky > Zelinsky Inst. of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eccf at super.unam.mx Tue Oct 21 15:32:05 2003 From: eccf at super.unam.mx (Eduardo Cesar Cabrera Flores) Date: Tue, 21 Oct 2003 13:32:05 -0600 (CST) Subject: shift bit & performance? In-Reply-To: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: Hi, some time ago somebody sent some info about the performance of doing bit shifts with "<<" & ">>" instead of using "*" or "/". Could anybody help me with it? cafe _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at mail.cert.ucr.edu Tue Oct 21 15:21:24 2003 From: glen at mail.cert.ucr.edu (Glen Kaukola) Date: Tue, 21 Oct 2003 12:21:24 -0700 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F958734.2030300@cert.ucr.edu> >>On Mon, 20 Oct 2003, Trent Piepho wrote: >> >> >>BTW, you (and of course the rest of the list) are just the man to ask; >>what is the status of Opterons and fortran compilers. I myself don't >>use fortran any more, but a number of folks at Duke do, and they are >>starting to ask what the choices are for Opterons.
A websearch reveals >>that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >>Opteron fortran, but rumor also suggests that a number of these are >>really "beta" quality with bugs that may or may not prove fatal to any >>given project. Then there is Gnu. >> >>Any comments on any of these from you (or anybody, really)? Is there a >>functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >>Do the compilers permit access to large (> 3GB) memory, do they optimize >>the use of that memory, do they support the various SSE instructions? >> I can tell you about PGI's compilers. They are kinda beta quality, as you say. As of now they only want to install on SuSE enterprise edition, although with a little fiddling around with the install scripts you can get them to install on other distributions. But even though you can get the compilers installed, they only seem to run on the SuSE beta for Opterons. PGI says this should all change in the near future though. As far as the code that the compilers produce goes, we haven't had any problems at all as far as I know. The great thing about the PGI compilers though is that you can download them and try them out for free for 15 days or so and see for yourself. As far as the Gnu Fortran compiler goes, it seems to work great on Opterons too. But then, as you're probably aware, it's only a Fortran 77 compiler. Cheers, Glen _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dtj at uberh4x0r.org Tue Oct 21 15:33:33 2003 From: dtj at uberh4x0r.org (Dean Johnson) Date: 21 Oct 2003 14:33:33 -0500 Subject: shift bit & performance? In-Reply-To: References: Message-ID: <1066764813.27603.4.camel@terra> On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? > There is certainly performance to be had from using a logical shift instead of a multiply or divide, but it's of declining value. I am fairly sure that with modern compilers, if you do an integer divide by a constant power of 2, it will generate a logical shift. That aint rocket science. -Dean _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Tue Oct 21 16:32:07 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Tue, 21 Oct 2003 15:32:07 -0500 Subject: shift bit & performance? In-Reply-To: ; from eccf@super.unam.mx on Tue, Oct 21, 2003 at 01:32:05PM -0600 References: <200310211603.h9LG3cA22580@NewBlue.scyld.com> Message-ID: <20031021153207.N31870@mikee.ath.cx> On Tue, 21 Oct 2003, Eduardo Cesar Cabrera Flores wrote: > > > Hi, > > sometime ago, somebody sent an info about performance working with "<<" & > ">>" doing shift bits instead of using "*" or "/" > Could anybody help me about it? The operations << and >> are closer to single assembler operations for integer values than * and /. If you use * or /, the corresponding assembler instructions take many more cycles to compute the new values. When multiplying or dividing by powers of 2, << and >> can be much faster.
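A tiny illustration (my own sketch, not benchmarked, constants picked arbitrarily): for unsigned integers and power-of-two constants the two forms below are equivalent, and any reasonable compiler will emit a shift for the / and * versions anyway. For signed operands, C division truncates toward zero while an arithmetic right shift rounds toward negative infinity, so the compiler adds a small fix-up before shifting - it still avoids a real divide.

#include <stdio.h>

int main(void)
{
    unsigned int x = 1000;

    unsigned int a = x / 8;    /* a compiler typically turns this into x >> 3 */
    unsigned int b = x >> 3;   /* explicit shift: same result, same generated code */

    unsigned int c = x * 16;   /* likewise the compiler turns this into x << 4 */
    unsigned int d = x << 4;

    printf("%u %u %u %u\n", a, b, c, d);   /* prints: 125 125 16000 16000 */
    return 0;
}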
Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bhalevy at panasas.com Tue Oct 21 17:35:06 2003 From: bhalevy at panasas.com (Halevy, Benny) Date: Tue, 21 Oct 2003 17:35:06 -0400 Subject: shift bit & performance? Message-ID: <30489F1321F5C343ACF6872B2CF7942A039DF8BC@PIKES.panasas.com> Could be meaningful on a 32 bit platform doing 64-bit math emulation. Emulating shift is much cheaper than multiply/divide. Benny >-----Original Message----- >From: Dean Johnson [mailto:dtj at uberh4x0r.org] >Sent: Tuesday, October 21, 2003 3:34 PM >To: Eduardo Cesar Cabrera Flores >Cc: beowulf at beowulf.org >Subject: Re: shift bit & performance? > > >On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: >> Hi, >> >> sometime ago, somebody sent an info about performance >working with "<<" & >> ">>" doing shift bits instead of using "*" or "/" >> Could anybody help me about it? >> > >There is certainly performance to be had from using a logical >shift instead of a >multiply or divide, but its of declining value. I am fairly >sure that with modern >compilers you do a integer divide by a constant power of 2, >that it will generate >a logical shift. That aint rocket science. > > -Dean > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From pesch at attglobal.net Wed Oct 22 03:32:31 2003 From: pesch at attglobal.net (pesch at attglobal.net) Date: Wed, 22 Oct 2003 00:32:31 -0700 Subject: flood of bounces from postmaster@systemsfirm.net References: Message-ID: <3F96328F.DF327416@attglobal.net> Perhaps it's not related to the topic but any mail I post to this list results automatically in a "incident report" to my mail provider (attglobal.net) which then automatically replies with the mail below. Any inquiry to attglobal.net with the reference number below results always in exactly 0 (zero) replies from attglobal. Paul Schenker "Received: from e4.ny.us.ibm.com ([32.97.182.104]) by prserv.net (in5) with ESMTP id <20031021031824105041p20me>; Tue, 21 Oct 2003 03:18:27 +0000 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.56.224.149]) by e4.ny.us.ibm.com (8.12.10/8.12.2) with ESMTP id h9L3IN0N801416 for ; Mon, 20 Oct 2003 23:18:23 -0400 Received: from BLDVMB.POK.IBM.COM (d01av01.pok.ibm.com [9.56.224.215]) by northrelay01.pok.ibm.com (8.12.9/NCO/VER6.6) with ESMTP id h9L3IMqW036946 for <@vm-av.pok.relay.ibm.com:pesch at attglobal.net>; Mon, 20 Oct 2003 23:18:22 -0400 Message-ID: <200310210318.h9L3IMqW036946 at northrelay01.pok.ibm.com> Received: by BLDVMB.POK.IBM.COM (IBM VM SMTP Level 320) via spool with SMTP id 7133 ; Mon, 20 Oct 2003 21:09:30 MDT Date: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) From: notify at attglobal.net To: CC: Subject: Re: Solaris Fire Engine. (REF:#_CSSEMAIL_0870689) X-Mozilla-Status: 8011 X-Mozilla-Status2: 00000000 X-UIDL: 200310210327271050a5ammfe0013d2 An incident reported by you has been created. Sev: 4 The incident # is listed below. No need to respond to this e-mail. 
For Account: CSSEMAIL Incident Number: 0870689 Status: INITIAL Last Updated: Mon, 20 OCT 2003 23:13:12 (-0400 GMT) PROBLEM CREATED ************************************************************************* Summary: Re: Solaris Fire Engine. ************************************************************************* If replying via email, do not alter the reference id in the subject line and send only new information, do not send entire note again. Do not send attachments, graphics or images." Donald Becker wrote: > On Mon, 20 Oct 2003, Robert G. Brown wrote: > > > On Mon, 20 Oct 2003, Trent Piepho wrote: > > > > > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every > > > messages I've sent to this list has started bouncing back to me from > > > dan at systemsfirm.com. I'm getting about ten copies of each one every other > > > day. Is anyone else having this problem? > > > > Oh yes, and it is a SERIOUS problem. I was just mulling on the right > > There are many more problems that list readers do not see. I delete the > address from the list only when the problem is persistent. > The major problem happens when messages take a few days to bounce, and > the bounce does not follow standards. In that case there are dozens of > messages in the remote queue, and they all appears to be replies by a > valid list subscriber. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From douglas at shore.net Wed Oct 22 01:33:02 2003 From: douglas at shore.net (Douglas O'Flaherty) Date: Wed, 22 Oct 2003 01:33:02 -0400 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <200310211601.h9LG1QA22261@NewBlue.scyld.com> References: <200310211601.h9LG1QA22261@NewBlue.scyld.com> Message-ID: <3F96168E.7050908@shore.net> Here's the short summary of Opteron compilers. When someone offers an AMD64 compiler, it typically may be used to create 32-bit or 64-bit executables as long as you are specific about which libraries you use. Any IA-32 compiler can create code and run on Opterons. Of course, 32-bit executables don't get the extra memory either, even when running on a 64-bit OS, but sometimes a 32-bit executable might be what you want. With SC2003 coming up, I expect we'll see a flurry of activity relating to compilers and tools. This information will likely be stale soon. Also, most of these have a free trial period, so you can kick the tires. Intel compilers work great in 32-bit and can be run on a 32 or 64-bit OS natively. Performance and compatability is not an issue. For obvious reasons many of the benchmarks have been run using IFC. PGI's first AMD64 production release was around July 5. There is a limitation on objects greater than 2GB in Linux as a result of the GNU assembly linker, but the application can address as much memory as you can give it. Only a small fraction of the world has objects that large. I've only run into it with synthetic benchmarks. The gal coding is done and PGI is working on the next release. As for performance, since this was the first AMD64 fortran compiler to market, it was used in AMD presentations. You can see performance comparisons in Rich Brunner's presentation from ClusterWorld. 
It's on-line at http://www.amd.com/us-en/assets/content_type/DownloadableAssets/RichBrunnerClusterWorldpresFINAL.pdf (about slide 39 IIRC) There was a minor patch release near the begining of August. I suspect there is always someone finding flaws, but generally it's doing well. NB: Saw Glenn's post re: PGI on SuSE v. RedHat. We've got it running on both. There were definately some fiddley bits to make it happy on RedHat, but I think they are documented on PGI's site. Absoft had a long beta of their AMD64 compiler and went GA in September. I have no personal experience on it, nor do I know of any public benchmarks. NAG worked closely with AMD on the AMD Core Math Libraries. They should know the processor well. No experience with the Gnu Fortran or Lahey. I believe GFC to be AMD64 functional. Lahey would only generate 32-bit code. Your other question was about SSE2. Yes Opteron has complete SSE2 support. I *know* PGI & IFC support it, I expect the others do as well. doug douglas_at_shore.net Disclaimer: Among my several hats I am also in AMD Marketing. This is an unofficial response. No AMD bits were utlized in the creation of this email, etc.. If you want to talk about Opterons 'officially' you need to email me at doug.oflaherty(at)amd.com On Mon, 20 Oct 2003 at 7:08pm, Robert G. Brown wrote >> On Mon, 20 Oct 2003, Trent Piepho wrote: >> > > >>> > I'm getting a flood of bounced messages from postmaster at systemsfirm.net, every >>> > messages I've sent to this list has started bouncing back to me from >>> > dan at systemsfirm.com. I'm getting about ten copies of each one every other >>> > day. Is anyone else having this problem? >> >> >> >> BTW, you (and of course the rest of the list) are just the man to ask; >> what is the status of Opterons and fortran compilers. I myself don't >> use fortran any more, but a number of folks at Duke do, and they are >> starting to ask what the choices are for Opterons. A websearch reveals >> that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an >> Opteron fortran, but rumor also suggests that a number of these are >> really "beta" quality with bugs that may or may not prove fatal to any >> given project. Then there is Gnu. >> >> Any comments on any of these from you (or anybody, really)? Is there a >> functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? >> Do the compilers permit access to large (> 3GB) memory, do they optimize >> the use of that memory, do they support the various SSE instructions? > > > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Wed Oct 22 04:45:08 2003 From: jakob at unthought.net (Jakob Oestergaard) Date: Wed, 22 Oct 2003 10:45:08 +0200 Subject: shift bit & performance? In-Reply-To: <1066764813.27603.4.camel@terra> References: <1066764813.27603.4.camel@terra> Message-ID: <20031022084508.GA7048@unthought.net> On Tue, Oct 21, 2003 at 02:33:33PM -0500, Dean Johnson wrote: > On Tue, 2003-10-21 at 14:32, Eduardo Cesar Cabrera Flores wrote: > > Hi, > > > > sometime ago, somebody sent an info about performance working with "<<" & > > ">>" doing shift bits instead of using "*" or "/" > > Could anybody help me about it? > > > > There is certainly performance to be had from using a logical shift instead of a > multiply or divide, but its of declining value. 
I am fairly sure that with modern > compilers you do a integer divide by a constant power of 2, that it will generate > a logical shift. > It used to be true that shifts were 'better' on Intel x86 processors, but it is not that simple anymore. On the P4, for example, a sequence of 'add's is cheaper than a left shift for three adds or less (because the latency of the shift opcode has increased compared to earlier generations). -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From serguei.patchkovskii at sympatico.ca Wed Oct 22 10:05:38 2003 From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii at sympatico.ca) Date: Wed, 22 Oct 2003 10:05:38 -0400 Subject: (no subject) Message-ID: <20031022140538.QSP8001.tomts7-srv.bellnexxia.net@[209.226.175.20]> > Any IA-32 compiler can create code and run on Opterons. Of course, > 32-bit executables don't get the extra memory either, even when running > on a 64-bit OS Not true. A 32-bit binary running on x86-64 Linux has access to the full 32-bit address space. When I run a very simple 32-bit Fortran program, I see the program itself mapped at very low addresses; the shared libraries get mapped at the 1Gbyte mark, while the stack grows down from the 4Gbyte mark. On an x86 Linux, the upper 1Gbyte (but this depends on the kernel options) is taken by the kernel address space. What this means in practice is that on an x86 Linux, I can allocate at most 2.5Gbytes of memory for my data without resorting to ugly tricks; in 32-bit mode of x86-64 Linux, this goes up to about 3.5Gbytes - enough to make a difference in some cases. Serguei _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian.dobbins at yale.edu Wed Oct 22 09:38:00 2003 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Wed, 22 Oct 2003 09:38:00 -0400 (EDT) Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: <3F96168E.7050908@shore.net> Message-ID: > PGI's first AMD64 production release was around July 5. There is a > limitation on objects greater than 2GB in Linux as a result of the GNU > assembly linker, but the application can address as much memory as you One simple way to get around the 2GB limit (*) is to simply use FORTRAN 90 dynamic allocation calls - we've done this, and have run codes up to (so far) about 7.7GB in size. If you're used to static allocations in F77, it's only about two lines to alter things to use dynamic memory. (*) - I don't think this limitation is in the GNU assembly linker, since g77 has no problems here. I think if you compile to assembly, you'll see that PGI has issues with 32-bit wraparound, whereas g77 does not. Their tech people are aware of this, and it's something I expect will be fixed fairly soon. Also, if you do happen to run jobs > 4GB, make sure you update the 'top' version you're using (procps.sourceforge.net).
Previous versions had wraparound at the 4GB mark, and it's cool seeing a listing say something to the effect of "7.7G" next to the size. :) Cheers, - Brian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From shewa at inel.gov Wed Oct 22 12:22:20 2003 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed, 22 Oct 2003 10:22:20 -0600 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) In-Reply-To: References: Message-ID: <3F96AEBC.2020107@inel.gov> Robert G. Brown wrote: > BTW, you (and of course the rest of the list) are just the man to ask; > what is the status of Opterons and fortran compilers. I myself don't > use fortran any more, but a number of folks at Duke do, and they are > starting to ask what the choices are for Opterons. A websearch reveals > that PGI, Absoft, NAG, Lahey, and perhaps others claim to have an > Opteron fortran, but rumor also suggests that a number of these are > really "beta" quality with bugs that may or may not prove fatal to any > given project. Then there is Gnu. I have used the PGI compiler 5.0-2 on SuSE SLES 8 with Radion Technologies' (www.radiative.com) Attila Fortran 90 code. One of our scientists has run models in which a single Attila process allocates up to about 7GB of RAM. The performance of the Opteron was quite impressive too. I'm still testing the g77 3.3 prerelease that SuSE includes. By default it creates 64 bit binaries. The gfortran (G95) snapshot doesn't work, but I'm planning on building it myself later on and trying to compile the above Attila code with it. Radiative looked at this earlier (months ago) and it wasn't ready at that time. Andrew > > Any comments on any of these from you (or anybody, really)? Is there a > functional 64-bit Gnu fortran for the Opteron? Does Intel Fortran work? > Do the compilers permit access to large (> 3GB) memory, do they optimize > the use of that memory, do they support the various SSE instructions? > > I'm indirectly interested in this as it looks like I'm getting Opterons > for my next round of cluster purchases personally, although I'll be > using C on them (hopefully 64 bit Gnu C). > > rgb > > >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > -- Andrew Shewmaker, Associate Engineer Phone: 1-208-526-1415 Idaho National Eng. and Environmental Lab. P.O. Box 1625, M.S. 3605 Idaho Falls, Idaho 83415-3605 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwardsa at plk.af.mil Wed Oct 22 13:21:39 2003 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Wed, 22 Oct 2003 11:21:39 -0600 Subject: Cooling Message-ID: <20031022172139.GA12958@plk.af.mil> I'm moving a cluster into a 9.25x11.75 foot room (7.75' ceiling). The cluster now has 48 nodes (single processor AMD XP 2100+ boxes). They will be on metal racks. Does anyone have a simple way to calculate cooling requirements? We will have fair flexibility with air flow.
Art Edwards -- Art Edwards Senior Research Physicist Air Force Research Laboratory Electronics Foundations Branch KAFB, New Mexico (505) 853-6042 (v) (505) 846-2290 (f) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JAI_RANGI at SDSTATE.EDU Wed Oct 22 15:43:36 2003 From: JAI_RANGI at SDSTATE.EDU (RANGI, JAI) Date: Wed, 22 Oct 2003 14:43:36 -0500 Subject: How to calculate operations on the cluster Message-ID: Hi, Can someone tell me how to find out how many operations can be performed on your cluster? If someone says 3 million operations can be performed on this cluster, how do you verify that and find out the actual performance? -Thanks -Jai Rangi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Wed Oct 22 16:16:09 2003 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed, 22 Oct 2003 13:16:09 -0700 Subject: Cooling References: <20031022172139.GA12958@plk.af.mil> Message-ID: <001d01c398d9$5b09cd00$32a8a8c0@laptop152422> To a first order, figure you've got to reject 150-200W per node... that's roughly 10kW of heat you need to get rid of. That's 10kJ/second. That will tell you right away how many "tons" of A/C you'll need (1 ton = 12000 BTU/hr or, more usefully here, 3.517 kW)... Looks like you'll need 3-4 tons (3 tons and 5 tons are standard sizes...) Next, figure out how much temperature rise in the air you can tolerate (say, 10 degrees C). Use the specific heat of air to calculate how many kilos (or, more practically, cubic feet) of air you need to move (use 1000 J/kg/deg as an approximation... you need to move 1 kg of air every second, or about a cubic meter... roughly approximating, a cubic meter is about 35 cubic feet, so you need around 2100 cubic feet per minute). As a practical matter, you'll want a lot more flow (using idealized numbers when it's cheap to put margin in is foolish). Also, a 10 degree rise is pretty substantial... If you kept the room at 15C, the air coming out of the racks would be 25C, and I'll bet the processors would be a good 20C above that. Calculating for a 5 degree rise might be a better plan. Just double the flow. Unless you're investing in specialized ducting that pushes the AC only through the racks and not the room, a lot of the flow will be going around the racks, whether you like it or not. In general, one likes to keep the duct flow speed below 1000 linear feet per minute (for noise reasons!), so your ducting will be around 3-4 square feet. This is not a window air conditioner!... This is the curse of rackmounted equipment in general. Getting the heat out of the room is easy. The tricky part is getting the heat out of the rack. Think about it: you've got to pump all those thousands of CFM *through the rack*, which is aerodynamically not well suited to this, especially in 1U boxes. How much cross sectional area is there in that rack chassis aperture for the air? How fast does that imply that the air is moving? What sort of pressure drop is there going through the rack? Take a look at RGB's Brahma web site. There's some photos there of their chiller unit, so you can get an idea of what's involved.
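If you want to play with the numbers, here's the same back-of-the-envelope arithmetic wrapped up in a few lines of C (my quick sketch, using the same round figures: ~200W per node, 1 ton = 3.517 kW, specific heat of air ~1000 J/kg per degree, roughly 1 kg of air per cubic meter, roughly 35 cubic feet per cubic meter):

#include <stdio.h>

int main(void)
{
    double nodes      = 48.0;     /* Art's cluster */
    double watts_node = 200.0;    /* high-ball per-node draw, watts */
    double delta_t    = 10.0;     /* allowed air temperature rise, deg C */

    double heat_w   = nodes * watts_node;           /* ~9.6 kW of heat to reject */
    double tons     = heat_w / 3517.0;              /* A/C "tons" (1 ton = 3.517 kW) */
    double kg_per_s = heat_w / (1000.0 * delta_t);  /* air mass flow, kg/s */
    double cfm      = kg_per_s * 35.0 * 60.0;       /* ~1 m^3 per kg, ~35 ft^3 per m^3 */

    printf("heat load: %.1f kW, cooling: %.1f tons, air flow: %.0f CFM\n",
           heat_w / 1000.0, tons, cfm);             /* ~9.6 kW, ~2.7 tons, ~2000 CFM */
    return 0;
}

Halve delta_t (or just double the CFM figure) for the 5 degree rise case.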
Your HVAC engineer will do a much fancier and useful version of this, allowing for things such as pressure drop, the amount of recirculation, the amount of heat leaking in from other sources (lighting, bodies in the room, etc.), heating from the fans, and so forth; But, at least you've got a ball park figure for what you're going to need. Jim Lux ----- Original Message ----- From: "Arthur H. Edwards" To: Sent: Wednesday, October 22, 2003 10:21 AM Subject: Cooling > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. > > Art Edwards > > -- > Art Edwards > Senior Research Physicist > Air Force Research Laboratory > Electronics Foundations Branch > KAFB, New Mexico > > (505) 853-6042 (v) > (505) 846-2290 (f) > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From toon at moene.indiv.nluug.nl Wed Oct 22 17:43:16 2003 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed, 22 Oct 2003 23:43:16 +0200 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@systemsfirm.net) References: <3F96AEBC.2020107@inel.gov> Message-ID: <3F96F9F4.8050505@moene.indiv.nluug.nl> Andrew Shewmaker wrote: > I'm still testing the g77 3.3 prerelease that SuSE includes. By default > it creates 64 bit binaries. Is there any interest in having g77 deal correctly with > 2Gb *direct access* records ? I have a patch in progress (due to http://gcc.gnu.org/PR10885) that I can't test myself ... > The gfortran (G95) snapshot doesn't work, but I'm planning on building > it myself later on and trying to compile the above Attila code with it. > Radiative looked at this earlier (months ago) and it wasn't ready at > that time. Please do not forget to enter bug reports in our Bugzilla database (see http://gcc.gnu.org/bugs.html). Thanks ! -- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html GNU Fortran 95: http://gcc.gnu.org/fortran/ (under construction) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Daniel.Kidger at quadrics.com Wed Oct 22 09:53:51 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed, 22 Oct 2003 14:53:51 +0100 Subject: Opteron Fortran (was Re: flood of bounces frompostmaster@syst emsfirm.net) Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE210@stegosaurus.bristol.quadrics.com> >From: Brian Dobbins [mailto:brian.dobbins at yale.edu] (cut) > Also, if you do happen to run jobs > 4GB, make sure you > update the 'top' > version you're using (procps.sourceforge.net). Previous versions had > wraparound at the 4GB mark, and it's cool seeing a listing > say something to the effect of "7.7G" next to the size. 
:) On the subject of top another caveat is that top is hard-coded at compile time about what it thinks the pagesize is. If you compile kernels with bigger pagesizes (generally a 'good thing' for large memory nodes) then 'top' gets the memory used by your programs wrong by a factor of x2,x4 etc. ! Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leopold.palomo at upc.es Thu Oct 23 05:33:35 2003 From: leopold.palomo at upc.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 11:33:35 +0200 Subject: OpenMosix, opinions? Message-ID: <200310231133.35912.leopold.palomo@upc.es> Hi, I'm a newbie in all of this questions of paralelism and clusters. I'm reading all of I can. I have found some point that I need some opinions. Hipotesis, having a typical beowulf, with some nodes, a switch, etc. All of the nodes running GNU/Linux, and the applications that are running are using MPI or PVM. All works, etc .... Imaging that we have an aplication. A pararell aplication that doesn't use a lot I/O operation, but intensive cpu, and some messages. Something like a pure parallel app. We implement it using PVM or MPI ... MPI. And we make a test, and we have some result. Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ benchmark.htm. We have our program, and we change it that use threads for the paralel behaviour and not MPI. And we run the same test. So, what will be better? Any one have tested it? Thank's in advance. Best regards, Leo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Oct 23 07:52:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 23 Oct 2003 07:52:14 -0400 (EDT) Subject: Cooling In-Reply-To: <20031022172139.GA12958@plk.af.mil> Message-ID: On Wed, 22 Oct 2003, Arthur H. Edwards wrote: > I'm moving a cluster into a 9.25x11.75 foot room (7.75 ' ceiling). The > cluster now has 48 nodes (single processor AMD XP 2100+ boxes). The will > be on metal racks. Does anyone have a simple way to calculate cooling > requirements? We will have fair flexibility with air flow. My kill-a-watt shows 1900+ AMD Athlon duals drawing roughly ~230W/node (or 115 per processor) (under steady, full load). I don't have a single CPU system in this class to test, but because of hardware replication I would guess that one draws MORE than half of this, probably ballpark of 150-160W where YMMV depending on memory and disk and etc configuration. Your clock is also a bit higher than what I measure and there is a clockspeed dependence on the CPU side, so you should likely guesstimate highball, say 175W OR buy a <$50 kill-a-watt (numerous sources online) and measure your prototyping node yourself and get a precise number. Then it is a matter of arithmetic. 
To be really safe and make the arithmetic easy enough to do on my fingers, I'll assume 200 W/node. Times 48 is 9600 watts. Plus 400 watts for electric lights, a head node with disk, a monitor, a switch (this is likely lowball, but we highballed the nodes). Call it 10 KW in a roughly 1000 cubic foot space. One ton of AC removes approximately 3500 watts continuously. You therefore need at LEAST 3 tons of AC. However, you'd really like to be able to keep the room COLD, not just on a part with its external environment, and so need to be able to remove heat infiltrating through the walls, so providing overcapacity is desireable -- 4-5 tons wouldn't be out of the question. This also gives you at least limited capacity for future growth and upgrade without another remodelling job (maybe you'll replace those singles with duals that draw 250-300W apiece in the same rack density one day). You also have to engineer airflow so that cold air enters on the air intake side of the nodes (the front) and is picked up by a warm air return after being exhausted, heated after cooling the nodes, from their rear. I don't mean that you need air delivery and returns per rack necessarily, but the steady state airflow needs to retard mixing and above all prevent air exhausted by one rack being picked up as intake to the next. There are lots of ways to achieve this. You can set up the racks so that the node fronts face in one aisle and node exhausts face in the rear and arrange for cold air delivery into the lower part of the node front aisle (and warm air return on the ceiling). You can put all the racks in a single row and deliver cold air as low as possible on the front side and remove it on the ceiling of the rear side. If you have a raised floor and four post racks with sidepanels you can deliver it from underneath each rack and remove it from the top. This is all FYI, but it is a good idea to hire an actual architect or engineer with experience in server room design to design your power/cooling system, as there are lots of things (thermal power kill switch, for example) that you might miss but they should not. However, I think that the list wisdom is that you should deal with them armored with a pretty good idea of what they should be doing, as the unfortunate experience of many who have done so is that even the pros make costly mistakes when it comes to server rooms (maybe they just don't do enough of them, or aren't used to working with 1000 cubic foot spaces). If you google over the list archives, there are longranging, extended discussions on server room design that embrace power delivery, cooling, node issues, costs, and more. rgb > > Art Edwards > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fmahr at gmx.de Thu Oct 23 09:06:09 2003 From: fmahr at gmx.de (Ferdinand Mahr) Date: Thu, 23 Oct 2003 15:06:09 +0200 Subject: OpenMosix, opinions? References: <200310231133.35912.leopold.palomo@upc.es> Message-ID: <3F97D241.2180EEF8@gmx.de> Hi Leo, > Imaging that we have an aplication. A pararell aplication that doesn't use a > lot I/O operation, but intensive cpu, and some messages. Something like a > pure parallel app. We implement it using PVM or MPI ... MPI. 
And we make a > test, and we have some result. > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch that > can migrate threads (light weith process, Mighsm, http://mcaserta.com/maask/) > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, that > com from here: http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > benchmark.htm. > > We have our program, and we change it that use threads for the paralel > behaviour and not MPI. And we run the same test. So, what will be better? Any > one have tested it? I haven't tested your special situation, but here are my thoughts about it: - Why changing an application that you already have? It costs you an unnecessary amount of time and money. - Migshm seems to enable OpenMosix to migrate System V shared memory processes, not threads. But, "Threads created using the clone() system call can also be migrated using Migshm", that's what you want, right? I don't know how well that works, but it limits you to clone(), and I don't know if thats sufficient for reasonable thread programming. Still (as you mentioned before), you really can only write code that uses minimum I/O and interprocess/thread communication because of network limitations. - Programs using PThreads don't run in parallel with OpenMosix/Migshm, they can only be migrated in whole. - If your MPI/PVM programs are well designed, they are usually really fast and can scale very well when CPU-bound. - Currently (Open)Mosix is better for load-balancing than HPC, especially in clusters with different hardware configurations. In HPC clusters, you usually have identical compute nodes. Hope that helps, Ferdinand _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RobertsGP at ncsc.navy.mil Thu Oct 23 15:07:19 2003 From: RobertsGP at ncsc.navy.mil (Roberts Gregory P DLPC) Date: Thu, 23 Oct 2003 14:07:19 -0500 Subject: UnitedLinux? Message-ID: Has anyone used UnitedLinux 1.0? I am using it on a 2 node dual CPU Opteron system. Greg -----Original Message----- From: Bill Broadley [mailto:bill at math.ucdavis.edu] Sent: Thursday, September 25, 2003 7:46 PM To: Brian Dobbins Cc: Bill Broadley; beowulf at beowulf.org Subject: Re: A question of OS!! > Yikes.. what kernels are used on these systems by default, and how large > is the code? I've been running SuSE 8.2 Pro on my nodes, and have gotten Factory default in both cases AFAIK. I don't have access to the SLES system at the moment, but the redhat box is: Linux foo.math.ucdavis.edu 2.4.21-1.1931.2.349.2.2.entsmp #1 SMP Fri Jul 18 00:06:19 EDT 2003 x86_64 x86_64 x86_64 GNU/Linux What relationship that has to the original 2.4.21 I know not. > varying performance due to motherboard, BIOS level and kernel. (SuSE 8.2 > Pro comes a modified 2.4.19, but I've also run 2.6.0-test5) > Also, are the BIOS settings the same? And how are the RAM slots I don't have access to the SLES bios. > populated? That made a difference, too! I'm well aware of the RAM slot issues, and I've experimentally verified that the full bandwidth is available. Basically each cpu will see 2GB/sec or so to main memory, and both see a total of 3GB/sec if both use memory simultaneously. > (Oh, and I imagine they're both writing to a local disk, or minimal > amounts over NFS? That could play a big part, too.. ) Yeah, both local disk, and not much. 
I didn't notice any difference when I commented out all output. > I should have some numbers at some point for how much things vary, but > at the moment we've been pretty busy on our systems. Any more info on > this would be great, though, since I've been looking at the faster chips, > too! ACK, I never considered that the opterons might be slower in some ways at faster clock speeds. My main suspicious is that MPICH was messaging passing for local nodes in some strange way and triggering some corner case under SLES. I.e. writing an int at a time between CPUs who are fighting over the same page. None of my other MPI benchmarks for latency of bandwidth (at various message sizes) have found any sign of problems. Numerous recompiles of MPICH haven't had any effect either. -- Bill Broadley Mathematics UC Davis _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lepalom at wol.es Thu Oct 23 10:17:17 2003 From: lepalom at wol.es (Leopold Palomo Avellaneda) Date: Thu, 23 Oct 2003 16:17:17 +0200 Subject: OpenMosix, opinions? In-Reply-To: <3F97D241.2180EEF8@gmx.de> References: <200310231133.35912.leopold.palomo@upc.es> <3F97D241.2180EEF8@gmx.de> Message-ID: <200310231617.17014.lepalom@wol.es> A Dijous 23 Octubre 2003 15:06, Ferdinand Mahr va escriure: > Hi Leo, > > > Imaging that we have an aplication. A pararell aplication that doesn't > > use a > > > lot I/O operation, but intensive cpu, and some messages. Something like a > > pure parallel app. We implement it using PVM or MPI ... MPI. And we make > > a > > > test, and we have some result. > > > > Now, we have our beowulf, with a linux kernel with OpenMosix with a patch > > that > > > can migrate threads (light weith process, Mighsm, > > http://mcaserta.com/maask/) > > > or threads compiled with http://moss.csc.ncsu.edu/~mueller/pthreads/, > > that > > > com from here: > > http://filibusta.crema.unimi.it/openmosix/fsu_threads_on_om/ > > > benchmark.htm. > > > > We have our program, and we change it that use threads for the paralel > > behaviour and not MPI. And we run the same test. So, what will be better? > > Any > > > one have tested it? Hi, > I haven't tested your special situation, but here are my thoughts about > it: > > - Why changing an application that you already have? It costs you an > unnecessary amount of time and money. Ok, I just explaining an example. If I have to begin from 0, which approach will be better? > - Migshm seems to enable OpenMosix to migrate System V shared memory > processes, not threads. But, "Threads created using the clone() system > call can also be migrated using Migshm", that's what you want, right? I > don't know how well that works, but it limits you to clone(), and I > don't know if thats sufficient for reasonable thread programming. Still > (as you mentioned before), you really can only write code that uses > minimum I/O and interprocess/thread communication because of network > limitations. Yes, you are right. However, I hope than soon it will run pure threads. I have heart that 2.6 have a lot of improvements in the thread part, but I'm not sure. 
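To make the MPI-versus-threads comparison concrete, here is a minimal MPI sketch (in C; the work function and problem size are made-up placeholders) of the kind of CPU-bound, low-communication job being discussed -- each rank grinds on its own slice and only a single reduction happens at the end. A threaded version would replace the ranks with pthreads, and whether those threads can usefully migrate across nodes is exactly the open question in this thread:

#include <stdio.h>
#include <mpi.h>

/* Made-up stand-in for the real computation: pure CPU work on a slice. */
static double crunch(long start, long count)
{
    long i;
    double acc = 0.0;
    for (i = start; i < start + count; i++)
        acc += 1.0 / (1.0 + (double)i * (double)i);
    return acc;
}

int main(int argc, char **argv)
{
    int rank, size;
    long total = 100000000L;   /* placeholder problem size */
    long chunk, start;
    double local, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    chunk = total / size;
    start = rank * chunk;
    if (rank == size - 1)               /* last rank takes the remainder */
        chunk = total - start;

    local = crunch(start, chunk);

    /* "Some messages": one reduction at the very end. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("result = %.12f from %d ranks\n", global, size);

    MPI_Finalize();
    return 0;
}

Built with mpicc and started with mpirun -np N, this spreads over as many nodes as you give it without any migration magic; the same loop wrapped in pthread_create calls only uses the CPUs of whatever node the process lands on unless the thread-migration patches really do their job.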
> > - Programs using PThreads don't run in parallel with OpenMosix/Migshm, > they can only be migrated in whole.

Well, as I have understood it, Pthreads can migrate with openMosix (not Linux Threads!) without the patch.

> - If your MPI/PVM programs are well designed, they are usually really > fast and can scale very well when CPU-bound.

The point I am raising is whether you can write a parallel program as a threaded program and leave the rest to the kernel on the cluster. If that were available, managing the parallelism would be the job of the OS on a distributed machine.

> - Currently (Open)Mosix is better for load-balancing than HPC, > especially in clusters with different hardware configurations. In HPC > clusters, you usually have identical compute nodes. > > Hope that helps,

Yes, of course. Thanks, regards. Leo

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From gilberto at ula.ve Thu Oct 23 17:43:14 2003 From: gilberto at ula.ve (Gilberto Diaz) Date: 23 Oct 2003 17:43:14 -0400 Subject: Oscar 2.3 Message-ID: <1066945394.1200.132.camel@odie>

Hello everybody, I'm trying to install a small cluster using RH8.0 and OSCAR 2.3. The machines have a sis900 NIC (PXE capable) on the motherboard. When I try to boot the client nodes, they do not boot because the sis900.o module is not present. Does anybody have any idea how to load the module into the init image so that the nodes can boot without changing the kernel via the kernel picker? Thanks in advance. Regards, Gilberto

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From erwan at mandrakesoft.com Fri Oct 24 04:48:03 2003 From: erwan at mandrakesoft.com (Erwan Velu) Date: Fri, 24 Oct 2003 10:48:03 +0200 Subject: CLIC 2, the newest version is out ! Message-ID: <1066985283.32232.57.camel@revolution.mandrakesoft.com>

CLIC is a GPL Linux-based distribution made to meet HPC needs. CLIC 2 now allows people to install a full Linux cluster from scratch in a few hours. This product contains the Linux core system + the clustering autoconfiguration tools + the deployment tools + MPI stacks (mpich, lam/mpi). CLIC 2 is based on the results of MandrakeClustering, and includes several major features:
- New backend engine (fully written in perl)
- A new configure step during the server's graphical installation
- An automated dual Ethernet configuration (one NIC for computing, one NIC for administration)
- A new kernel (2.4.22)
- A new version of urpmi parallel (a parallel rpm installer)
- A graphical tool for managing users (add/remove): userdrake
- A new node management
|- You just need to power on a fresh node to install and integrate it in your cluster !
|- Fully automated add/remove procedure
And of course the latest versions of the clustering software:
- Maui 3.2.5-p5
- ScalablePBS 1.0-p4
- Ganglia 2.5.4
- Mpich 1.2.5-2
- LAM/MPI 6.5.9 (will be updated when 7.1 is available)
- PXELinux 2.06
CLIC 2 is no longer compatible with CLIC 1 due to its fully rewritten backend. This will not happen again in the future, but it was needed since CLIC 1 was a test release. We hope this product will meet the CLIC community's needs. CLIC 2 is now available on your favorite mirrors in the mandrake-iso directory.
For example you can find it at:
Europe:
ftp://ftp.lip6.fr:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://ftp.mirror.ac.uk:/sites/sunsite.uio.no/pub/unix/Linux/Mandrake/Mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://ftp.tu-chemnitz.de:/pub/linux/mandrake-iso/i586/CLIC-2.0-i586.iso
USA:
ftp://ftp.rpmfind.net:/linux/Mandrake-iso/i586/CLIC-2.0-i586.iso
ftp://mirrors.usc.edu:/pub/linux/distributions/mandrake-iso/i586/CLIC-2.0-i586.iso

The documentation is included inside the cdrom (/doc/) in PDF and HTML format. This is the MandrakeClustering documentation based on the same core; everything is the same except the configuration GUI, which is only available in MandrakeClustering. All the configuration scripts that DrakCluster (our GUI) uses begin with the "setup_" prefix. So to auto-configure your server, you use the setup_auto_server.pl script; to add new nodes to your cluster, you use setup_auto_add_nodes.pl; to remove a node, you can use setup_auto_remove_nodes.pl. All these scripts have a really easy to learn syntax :) I hope this release will please every CLIC user; this new generation of CLIC is really easier to use than the previous releases. PS: I've heard that the 2.4.22 kernel may seriously damage LG cdrom drives. So be careful with CLIC 2 if you own LG cdrom drives; remove your cdrom drive before installing it. - CLIC Website: http://clic.mandrakesoft.com/index-en.html -- Erwan Velu Linux Cluster Distribution Project Manager MandrakeSoft 43 rue d'aboukir 75002 Paris Phone Number : +33 (0) 1 40 41 17 94 Fax Number : +33 (0) 1 40 41 92 00 Web site : http://www.mandrakesoft.com OpenPGP key : http://www.mandrakesecure.net/cks/

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From scheinin at crs4.it Fri Oct 24 11:06:55 2003 From: scheinin at crs4.it (Alan Scheinine) Date: Fri, 24 Oct 2003 17:06:55 +0200 Subject: A Petaflop machine in 20 racks? Message-ID: <200310241506.h9OF6tP02285@dali.crs4.it>

I asked ClearSpeed what is the width of the floating point units and today I received a reply. The floating point units in the CS301 are 32 bits wide. A previous email on the subject noted an earlier design in which each PE has an 8-bit ALU for the 256-PE "Fuzion block". Evidently, this design is different. My opinion: 32 bits is more than adequate for many signal processing applications; not so long ago 24 bits was considered enough for signal processing. But for simulations of physical events the "eigenvalues" have a range that makes 32-bit floating point too small. regards, Alan

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From Daniel.Kidger at quadrics.com Fri Oct 24 12:09:31 2003 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Fri, 24 Oct 2003 17:09:31 +0100 Subject: A Petaflop machine in 20 racks? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA78DE233@stegosaurus.bristol.quadrics.com>

> I asked ClearSpeed what is the width of the floating point units > and today I received a reply. > The floating point units in the CS301 are 32 bits wide.

Don't forget that www.clearspeed.com used to be www.pixelfusion.com Their target market at the time was massively parallel SIMD PCI-based graphics engines. So that is most likely why they use only 32-bit floats.
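To put a number on that precision concern, here is a tiny generic C sketch (nothing ClearSpeed-specific) showing what 32-bit floats give you compared to 64-bit doubles -- roughly 6-7 decimal digits and a maximum around 3.4e38, versus 15-16 digits and about 1.8e308 -- which is why codes whose "eigenvalues" span a wide dynamic range usually want 64 bits:

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Machine epsilon, range and decimal digits for IEEE float/double. */
    printf("float : eps = %e  max = %e  digits = %d\n",
           (double)FLT_EPSILON, (double)FLT_MAX, FLT_DIG);
    printf("double: eps = %e  max = %e  digits = %d\n",
           DBL_EPSILON, DBL_MAX, DBL_DIG);

    /* A sum that already goes wrong in single precision: the small
       term is below float resolution at this magnitude and vanishes. */
    {
        float  fs = 1.0e8f + 1.0f;
        double ds = 1.0e8  + 1.0;
        printf("1e8 + 1 as float  = %.1f\n", fs);
        printf("1e8 + 1 as double = %.1f\n", ds);
    }
    return 0;
}

Whether that matters depends entirely on the application, which is the point about signal processing versus physical simulation.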
Yours, Daniel. (and yes Clearspeed are based in Bristol,UK but are nothing to do with us.) -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jeffrey.b.layton at lmco.com Tue Oct 28 11:09:54 2003 From: jeffrey.b.layton at lmco.com (Jeff Layton) Date: Tue, 28 Oct 2003 11:09:54 -0500 Subject: SFF boxes for a cluster? Message-ID: <3F9E94D2.3020307@lmco.com> Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Tue Oct 28 13:19:11 2003 From: Peter.Lindgren at experian.com (Lindgren, Peter) Date: Tue, 28 Oct 2003 10:19:11 -0800 Subject: SFF boxes for a cluster? Message-ID: We have had 48 Dell GX260 SFF boxes in production since March without a single hardware failure. Peter _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Tue Oct 28 15:12:38 2003 From: eric at fnordsystems.com (Eric Kuhnke) Date: Tue, 28 Oct 2003 12:12:38 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg In-Reply-To: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv .doe.gov> Message-ID: <5.2.0.9.2.20031028120820.04272e60@216.82.101.6> One serious problem with the Shuttle and most competing "small form factor" PCs is the air intake, which is located on the sides. You can't put them flush with each other side-by-side on shelves... Most minitower or midtower ATX cases (and proper 1U or 2U cases) have air intake entirely on the front panel. air intake on the left side: http://www.sfftech.com/showdocs.cfm?aid=447 At 11:45 AM 10/28/2003 -0800, you wrote: >I've got one of those SS51G's at home and I love it. My only complaint is >that it does get a bit warm with a video card, but for a cluster you wont >need one. > >-----Original Message----- >From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] >Sent: Tuesday, October 28, 2003 10:07 AM >To: beowulf at beowulf.org >Subject: Beowulf digest, Vol 1 #1515 - 1 msg > > >Send Beowulf mailing list submissions to > beowulf at beowulf.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://www.beowulf.org/mailman/listinfo/beowulf >or, via email, send a message with subject or body 'help' to > beowulf-request at beowulf.org > >You can reach the person managing the list at > beowulf-admin at beowulf.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Beowulf digest..." > > >Today's Topics: > > 1. SFF boxes for a cluster? 
(Jeff Layton) > >--__--__-- > >Message: 1 >Date: Tue, 28 Oct 2003 11:09:54 -0500 >From: Jeff Layton >Subject: SFF boxes for a cluster? >To: beowulf at beowulf.org >Reply-to: jeffrey.b.layton at lmco.com >Organization: Lockheed-Martin Aeronautics Company > >Good morning, > > I've seen a few cluster made from the Small Form Factor >(SFF) boxes including "Space Simulator". Has anyone else >made a decent size cluster (n > 16) from these boxes? If so, >how has the reliability been? > >Thanks! > >Jeff > >-- >Dr. Jeff Layton >Aerodynamics and CFD >Lockheed-Martin Aeronautical Company - Marietta > > > > >--__--__-- > >_______________________________________________ >Beowulf mailing list >Beowulf at beowulf.org >http://www.beowulf.org/mailman/listinfo/beowulf > > >End of Beowulf Digest > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ZukaitAJ at nv.doe.gov Tue Oct 28 14:45:02 2003 From: ZukaitAJ at nv.doe.gov (Zukaitis, Anthony) Date: Tue, 28 Oct 2003 11:45:02 -0800 Subject: Beowulf digest, Vol 1 #1515 - 1 msg Message-ID: <09AE3D324A22D511A1A50002A5289F2101030E0A@lao-exchpo1-nt.nv.doe.gov> I've got one of those SS51G's at home and I love it. My only complaint is that it does get a bit warm with a video card, but for a cluster you wont need one. -----Original Message----- From: beowulf-request at scyld.com [mailto:beowulf-request at scyld.com] Sent: Tuesday, October 28, 2003 10:07 AM To: beowulf at beowulf.org Subject: Beowulf digest, Vol 1 #1515 - 1 msg Send Beowulf mailing list submissions to beowulf at beowulf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.beowulf.org/mailman/listinfo/beowulf or, via email, send a message with subject or body 'help' to beowulf-request at beowulf.org You can reach the person managing the list at beowulf-admin at beowulf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beowulf digest..." Today's Topics: 1. SFF boxes for a cluster? (Jeff Layton) --__--__-- Message: 1 Date: Tue, 28 Oct 2003 11:09:54 -0500 From: Jeff Layton Subject: SFF boxes for a cluster? To: beowulf at beowulf.org Reply-to: jeffrey.b.layton at lmco.com Organization: Lockheed-Martin Aeronautics Company Good morning, I've seen a few cluster made from the Small Form Factor (SFF) boxes including "Space Simulator". Has anyone else made a decent size cluster (n > 16) from these boxes? If so, how has the reliability been? Thanks! Jeff -- Dr. 
Jeff Layton Aerodynamics and CFD Lockheed-Martin Aeronautical Company - Marietta --__--__-- _______________________________________________ Beowulf mailing list Beowulf at beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf End of Beowulf Digest _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From periea at bellsouth.net Tue Oct 28 16:08:49 2003 From: periea at bellsouth.net (periea at bellsouth.net) Date: Tue, 28 Oct 2003 16:08:49 -0500 Subject: SAS running on compute nodes Message-ID: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Hello All, Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Phil... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rossini at blindglobe.net Tue Oct 28 17:30:35 2003 From: rossini at blindglobe.net (A.J. Rossini) Date: Tue, 28 Oct 2003 14:30:35 -0800 Subject: SAS running on compute nodes In-Reply-To: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> (periea@bellsouth.net's message of "Tue, 28 Oct 2003 16:08:49 -0500") References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> Message-ID: <858yn5m1v8.fsf@blindglobe.net> writes: > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... Sure, as a bunch of singleton processes. I don't think you can do much more than that (but would be interested if I'm wrong). best, -tony -- rossini at u.washington.edu http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be confidential and privileged. If you received this message in error, please destroy it and notify the sender. Thank you. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gabriele.butti at unimib.it Tue Oct 28 04:58:04 2003 From: gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) Date: 28 Oct 2003 10:58:04 +0100 Subject: opteron VS Itanium 2 Message-ID: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Dear all, we are planning to build up a new cluster (16 nodes) before this year's end; we are evaluating different proposals from machine sellers, but the main doubt we have at this moment is whether choosing an Itanium 2 architecture or an AMD Opteron one. I know that ther's had already been on this list a debate on such a topic, but maybe some of you has some new experience to tell about. There is a wild bunch of benchmarks on these machines, but we fear that these are somewhat misleading and are not designed to test CPU's for intense scientific computing. The code we want to run on these machines is basically a home-made code, not fully optimized, which allocates around 500 Mb of RAM per node. Communication between nodes is a quite rare event and does not affect much computation time. 
In the past we had a very nice experience using Alpha CPU's which performed very well. To sum up, the question is: is the Itanium2 worth the price difference or is the Opteron the best choice? Thank you all Gabriele Butti -- \\|// -(o o)- /------------oOOOo--(_)--oOOOo-------------\ | | | Gabriele Butti | | ----------------------- | | Department of Material Science | | University of Milano-Bicocca | | Via Cozzi 53, 20125 Milano, ITALY | | Tel (+39)02 64485214 | | .oooO Oooo. | \--------------( )---( )---------------/ \ ( ) / \_) (_/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Tue Oct 28 21:23:48 2003 From: jim at ks.uiuc.edu (Jim Phillips) Date: Tue, 28 Oct 2003 20:23:48 -0600 (CST) Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: Hi, The Athlon design has some Alpha blood in it, and in my experience they both excel on branchy, unoptimized, float-intensive code. The Opteron is similar to the Athlon, but I wouldn't bother with 64-bit unless you're actually going to use more than 2 GB of memory per node. Athlon vs Pentium 4 or Xeon is a closer match, and you really need to run some benchmarks to decide between them. If you have access to an Opteron you should benchmark it as well, since I've heard they fly on some problems. Itanium 2 (Madison) is the current NAMD speed champ (although it's tied with a hyperthreaded P4 running multithreaded code), but it took some serious work to get the inner loops to the point that the Intel compiler could software pipeline them to get decent performance. I've heard that some Fortran codes had an easier time of it. Big branches really hurt. -Jim On 28 Oct 2003, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. 
| > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From smuelas at mecanica.upm.es Wed Oct 29 04:30:28 2003 From: smuelas at mecanica.upm.es (smuelas) Date: Wed, 29 Oct 2003 10:30:28 +0100 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <20031029103028.5b7a89a7.smuelas@mecanica.upm.es> Why don't you try a more humble Athlon, (2800 will be enough and you can use DRAM at 400). You will economize a lot of money and for intensive operation it is very, very quick. I have a small cluster with 8 nodes and Athlon 2400 and the results are astonishing. The important point is the motherboard, and nforce is great. On 28 Oct 2003 10:58:04 +0100 gabriele.butti at unimib.it (Butti Gabriele - Dottorati di Ricerca) wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > > Thank you all > > Gabriele Butti > -- > \\|// > -(o o)- > /------------oOOOo--(_)--oOOOo-------------\ > | | > | Gabriele Butti | > | ----------------------- | > | Department of Material Science | > | University of Milano-Bicocca | > | Via Cozzi 53, 20125 Milano, ITALY | > | Tel (+39)02 64485214 | > | .oooO Oooo. | > \--------------( )---( )---------------/ > \ ( ) / > \_) (_/ > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Santiago Muelas E.T.S. 
Ingenieros de Caminos, (U.P.M) Tf.: (34) 91 336 66 59 e-mail: smuelas at mecanica.upm.es Fax: (34) 91 336 67 61 www: http://w3.mecanica.upm.es/~smuelas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at platform.com Wed Oct 29 10:01:58 2003 From: csmith at platform.com (Chris Smith) Date: Wed, 29 Oct 2003 07:01:58 -0800 Subject: SAS running on compute nodes In-Reply-To: <858yn5m1v8.fsf@blindglobe.net> References: <20031028210849.DLHB1816.imf17aec.mail.bellsouth.net@mail.bellsouth.net> <858yn5m1v8.fsf@blindglobe.net> Message-ID: <1067439718.3742.53.camel@plato.dreadnought.org> On Tue, 2003-10-28 at 14:30, A.J. Rossini wrote: > writes: > > > > Has anyone attempted or using SAS (SAS 9.0) in a clustering environment? TIA... > > Sure, as a bunch of singleton processes. I don't think you can do > much more than that (but would be interested if I'm wrong). > Actually ... you can after a fashion. SAS has something called MP CONNECT as part of the SAS/CONNECT product which allows you to call out to other SAS processes to have them run code for you, so you can do parallel SAS programs. http://support.sas.com/rnd/scalability/connect/index.html -- Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Wed Oct 29 10:11:19 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed, 29 Oct 2003 09:11:19 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> On Tue Oct 28 19:26:25 2003, Gabriele Butti wrote: >To sum up, the question is: is the Itanium2 worth the price difference >or is the Opteron the best choice? The SpecFP2000 performance difference between the best I2 and best Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). The 1.5 GHz I2 with the 6MB cache is very expensive with a recent estimate here for dual processor nodes with the >>smaller<< cache at over $12,000 per node when Myrinet interconnect costs and other incidentals are included. A dual Opteron 246 at 2.0 GHz with the same interconnect and incidentals included was about $4,250. Top of the line Pentium 4 duals again with same interconnect and incidentals about $750 less at $3,500. For bandwidth/memory intensive codes, I think the Opteron is a clear winner in a dual processor configuration because of its dual channel to memory design. Stream triad bandwidth during SMP operation is ~50% more than a one processor test. Both the dual Pentium 4 and Itanium 2 share their memory bus and split (with some loss) the bandwidth in dual mode. In a single processor configuration the conclusion is less clear. Itanium's spec numbers are very impressive, but still not high enough to win on price performance. The new Pentium 4 3.2 GHz Extremem Edition with its 4x200 FSB has very good SpecFP2000 numbers out performing the Opteron by about 100 spec points and may be the best price performance choice in a single processor configuration. But of course the above logic means nothing with a benchmark of >>your<< application and specific vendor quotes in >>your<< hands. rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. 
# Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com # rbw at ahpcrc.org #--------------------------------------------------- # Nullum magnum ingenium sine mixtura dementiae fuit. # - Seneca #--------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Oct 29 11:13:27 2003 From: ctierney at hpti.com (Craig Tierney) Date: 29 Oct 2003 09:13:27 -0700 Subject: opteron VS Itanium 2 In-Reply-To: <1067335084.12500.63.camel@tantalio.mater.unimib.it> References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <1067444007.6209.16.camel@hpti10.fsl.noaa.gov> On Tue, 2003-10-28 at 02:58, Butti Gabriele - Dottorati di Ricerca wrote: > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > > There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well. > > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice? > Why don't you run your codes on the two platforms and figure it out for yourself? Better yet, get the vendors to do it. I have seen cases where Itanium 2 performs much better than Opteron, justifying the price difference. Other codes did not show the same difference, but both were faster than a Xeon. Craig > Thank you all > > Gabriele Butti _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Thomas.Alrutz at dlr.de Wed Oct 29 11:15:48 2003 From: Thomas.Alrutz at dlr.de (Thomas Alrutz) Date: Wed, 29 Oct 2003 17:15:48 +0100 Subject: opteron VS Itanium 2 References: <1067335084.12500.63.camel@tantalio.mater.unimib.it> Message-ID: <3F9FE7B4.1000607@dlr.de> Hi Gabriele, we have bought a similar Linux Cluster (16 nodes) you are lokking for with the smallest dual Opteron 240 (1.4 GHz) and two Gigabit networks (one for communications (MPI) and one for nfs). > Dear all, > we are planning to build up a new cluster (16 nodes) before this > year's end; we are evaluating different proposals from machine sellers, > but the main doubt we have at this moment is whether choosing an Itanium > 2 architecture or an AMD Opteron one. > > I know that ther's had already been on this list a debate on such a > topic, but maybe some of you has some new experience to tell about. > The nodes have all 2 GB RAM (4*512 MB DDR333 REG), 2 Gigabit NICs (Broadcom onboard) and a Harddisk. The board we had choosen was the Rioworks HDAMA. 
I know it is not cheap, but it is stable and performs well with the SUSE/United Linux Enterprise Edition.

> There is a wild bunch of benchmarks on these machines, but we fear that > these are somewhat misleading and are not designed to test CPU's for > intense scientific computing. The code we want to run on these machines > is basically a home-made code, not fully optimized, which allocates > around 500 Mb of RAM per node. Communication between nodes is a quite > rare event and does not affect much computation time. In the past we had > a very nice experience using Alpha CPU's which performed very well.

We have done some benchmarking with our TAU code (an unstructured finite volume CFD code, with multigrid), which depends heavily on memory bandwidth and latency. Therefore we tested 4 different architectures:
1. AMD Athlon MP 1.8 GHz, FSB 133 MHz - with gcc 3.2 in 32 bit
2. Intel Xeon 2.66 GHz, FSB 133 MHz - with icc7 in 32 bit
3. Intel Itanium2 1.0 GHz, FSB 100 MHz - with ecc6 in 64 bit
4. AMD Opteron 240 1.4 GHz, FSB 155 MHz - with gcc 3.2 in 64 bit
For the benchmark we used a "real life" example (an aircraft configuration with wing, body and engine - approx. 2 million grid points) which requires 1.3 GB to 1.7 GB for the job (1 process). We performed 30 iterations (Navier-Stokes calculation - Spalart-Allmaras - central scheme - multigrid cycle) and took the total (wallclock) time.

> > To sum up, the question is: is the Itanium2 worth the price difference > or is the Opteron the best choice?

To answer your question, take a look at the following chart. All times are in seconds.

For 1 CPU on the node in use:
1. AMD Athlon MP 1.8 GHz - 30 iter. = 3642.4 sec.
2. Intel Xeon 2.66 GHz - 30 iter. = 2151.4 sec. <- fastest
3. Intel Itanium2 1.0 GHz - 30 iter. = 3571.8 sec.
4. AMD Opteron 240 1.4 GHz - 30 iter. = 2256.5 sec.

And 2 CPUs on the node in use (2 processes via MPI):
1. AMD Athlon MP 1.8 GHz - 30 iter. = 2076.1 sec.
2. Intel Xeon 2.66 GHz - 30 iter. = 1447.8 sec.
3. Intel Itanium2 1.0 GHz - 30 iter. = 1842.8 sec.
4. AMD Opteron 240 1.4 GHz - 30 iter. = 1159.5 sec. <-- fastest

So here you can see why we had to choose an Opteron-based node to build up the cluster. The price/performance ratio for the Opteron machine is very good compared to the Itanium2 machines. And the Xeons are not so much cheaper...

Thomas -- __/|__ | Dipl.-Math. Thomas Alrutz /_/_/_/ | DLR Institut fuer Aerodynamik und Stroemungstechnik |/ | Numerische Verfahren DLR | Bunsenstr. 10 | D-37073 Goettingen/Germany

_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From daniel at labtie.mmt.upc.es Wed Oct 29 14:16:43 2003 From: daniel at labtie.mmt.upc.es (Daniel Fernandez) Date: Wed, 29 Oct 2003 20:16:43 +0100 Subject: Video-less nodes Message-ID: <1067455003.21980.11.camel@qeldroma.cttc.org>

Hi all, I would like to get some opinions about video-less nodes in a cluster. We know that there is no problem with monitoring nodes remotely and reading logs, but I suppose that in a kernel panic situation there's some valuable on-screen information... any thoughts?
Of course there's the possibility about putting really cheap video cards just that we'll able to see the text screen , nothing more ;) -- Daniel Fernandez Laboratori de Termot?cnia i Energia - CTTC UPC Campus Terrassa _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Wed Oct 29 15:45:25 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 29 Oct 2003 12:45:25 -0800 (PST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? console on serial... let your terminal server collect oopses... > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) > > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlb17 at duke.edu Wed Oct 29 16:41:21 2003 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed, 29 Oct 2003 16:41:21 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003 at 8:16pm, Daniel Fernandez wrote > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? > > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) As always, the answer is it depends. A serial console should handle all your needs. But sometimes the BIOS sucks or the console doesn't work right or... IMHO, unless it messes other stuff up (e.g. drags your only PCI bus down to 32/33), there's not much reason *not* to stuff cheap video boards into nodes. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Oct 29 17:00:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 29 Oct 2003 17:00:47 -0500 (EST) Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: On Wed, 29 Oct 2003, Daniel Fernandez wrote: > Hi all, > > I would like to get some opinions about video-less nodes in a cluster, > we know that there is no problem about monitoring nodes remotely and > reading logs but I suppose that in a kernel panic situation there's some > valuable on-screen information... ? any thoughts ? 
> > Of course there's the possibility about putting really cheap video cards > just that we'll able to see the text screen , nothing more ;) To my direct experience, the extra time you waste debugging problems on videoless nodes by hauling them out of the rack, sticking video in them, resolving the problem, removing the video, and reinserting the nodes is far more costly than cheap video, or better yet onboard video (many/most good motherboards have onboard video these days) and being able to resolve many of these problems without deracking the nodes. Just my opinion of course. When things go well, of course, it doesn't matter. Just think about the labor involved in a single BIOS reflash, for example. rgb > > -- > Daniel Fernandez > Laboratori de Termot?cnia i Energia - CTTC > UPC Campus Terrassa > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Oct 29 22:00:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 29 Oct 2003 22:00:09 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310291511.h9TFBJx10935@mycroft.ahpcrc.org> Message-ID: > >To sum up, the question is: is the Itanium2 worth the price difference > >or is the Opteron the best choice? > > The SpecFP2000 performance difference between the best I2 and best > Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). which to me indicates that the working set of SPEC codes is a good match to the cache of high-end It2's. this says nothing about It2's, but rather points out that SPEC components are nearly obsolete (required to run well in just 64MB core, if I recall correctly!) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jmdavis at mail2.vcu.edu Wed Oct 29 15:12:20 2003 From: jmdavis at mail2.vcu.edu (Mike Davis) Date: Wed, 29 Oct 2003 15:12:20 -0500 Subject: Video-less nodes In-Reply-To: <1067455003.21980.11.camel@qeldroma.cttc.org> References: <1067455003.21980.11.camel@qeldroma.cttc.org> Message-ID: <3FA01F24.2090405@mail2.vcu.edu> The onscreen info should also be logged. And then there's always the crash files. We now have a couple of clusters with videoless nodes (although they are on serial switches, Cyclades). Mike Daniel Fernandez wrote: >Hi all, > >I would like to get some opinions about video-less nodes in a cluster, >we know that there is no problem about monitoring nodes remotely and >reading logs but I suppose that in a kernel panic situation there's some >valuable on-screen information... ? any thoughts ? 
> >Of course there's the possibility about putting really cheap video cards
> >just that we'll able to see the text screen , nothing more ;)
> >
> >
>
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andreas.boklund at htu.se Thu Oct 30 01:57:35 2003
From: andreas.boklund at htu.se (andreas boklund)
Date: Thu, 30 Oct 2003 07:57:35 +0100
Subject: opteron VS Itanium 2
Message-ID:

Just a note,

> For bandwidth/memory intensive codes, I think the Opteron is a clear
> winner in a dual processor configuration because of its dual channel
> to memory design. Stream triad bandwidth during SMP operation is
> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium
> 2 share their memory bus and split (with some loss) the bandwidth in
> dual mode.

This is true as long as you are using an application where each process has its own memory area. If you have 2 processes and shared memory, the Opteron would behave like a small NUMA machine and a process will get a penalty for accessing another process's (processor's) memory segment.

To quote D. Barron, "If it seems too good to be true, it probably is!" I have never yet seen true linear scalability, and with Amdahl out there I doubt that I ever will.

Best
//Andreas

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hahn at physics.mcmaster.ca Thu Oct 30 10:51:41 2003
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Thu, 30 Oct 2003 10:51:41 -0500 (EST)
Subject: opteron VS Itanium 2
In-Reply-To:
Message-ID:

> > For bandwidth/memory intensive codes, I think the Opteron is a clear
> > winner in a dual processor configuration because of its dual channel
> > to memory design. Stream triad bandwidth during SMP operation is
> > ~50% more than a one processor test. Both the dual Pentium 4 and Itanium
> > 2 share their memory bus and split (with some loss) the bandwidth in
> > dual mode.

this is particularly bad on "high-end" machines. for instance, several machines have 4 it2's on a single FSB. there's a reason that specfprate scales so much better on 1/2/4-way opterons than on 1/2/4-way it2's. don't even get me started about those old profusion-chipset 8-way PIII machines that Intel pushed for a while...

> This is true as long as you are using an application where each process has its own
> memory area. If you have 2 processes and shared memory, the Opteron would
> behave like a small NUMA machine and a process will get a penalty for accessing
> another process's (processor's) memory segment.

huh? sharing data behaves pretty much the same on opteron systems (broadcast-based coherency) as on shared-FSB (snoopy) systems. it's not at all clear yet whether opterons are higher latency in the case where you have *often*written* shared data.

it is perfectly clear that shared/snoopy buses don't scale, and neither does pure broadcast coherency. I figure that both Intel and AMD will be adding some sort of directory support in future machines. if they bother, that is - the market for many-way SMP is definitely not huge, at least not in the mass-market sense.

regards, mark hahn.
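As a rough illustration of the triad measurement being argued about here: the following is not the official STREAM benchmark, just a minimal single-process sketch, and the array size and constant are arbitrary choices. Running one copy per CPU of an SMP box and comparing against the single-copy number shows how much of the bus each processor actually gets.

/* Minimal triad-bandwidth sketch (NOT the official STREAM benchmark).
   Build with something like: gcc -O2 -o triad triad.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (4 * 1024 * 1024)   /* 4M doubles per array, well out of cache */
#define NTRIES 10

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double best = 1.0e30;
    int i, k;

    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }
    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (k = 0; k < NTRIES; k++) {
        double t = now();
        for (i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* the triad kernel */
        t = now() - t;
        if (t < best) best = t;
    }
    /* triad touches 3 arrays of N doubles per pass: 2 reads + 1 write */
    printf("best triad bandwidth: %.1f MB/s\n",
           3.0 * N * sizeof(double) / best / 1.0e6);
    return 0;
}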
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 10:57:00 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 09:57:00 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> On Wed Oct 29 21:38:48 2003, Mark Hahn wrote: >> >To sum up, the question is: is the Itanium2 worth the price difference >> >or is the Opteron the best choice? >> >> The SpecFP2000 performance difference between the best I2 and best >> Opteron seems to be about 600 spec points or 40% (~1400 versus ~2000). > >which to me indicates that the working set of SPEC codes is a good >match to the cache of high-end It2's. this says nothing about It2's, >but rather points out that SPEC components are nearly obsolete >(required to run well in just 64MB core, if I recall correctly!) Of course, there is some truth to what you say, but "this says nothing about It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is the memory table for most of the benchmarks. A few fit in the 6MB cache (although some surely should, as some codes do or can be made too fit into cache). Many are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard to question with the capability of performing 4 64-bit flops per clock (that's a 6.0 GFlops peak at 1.5 GHz; 12.0 at 32-bits). Moreover, even an I2 with 1/2 the Opteron's clock and only 50% more cache (L3 vs L2) performs more or less equal to the Opteron 246 on SpecFP2000. And after all a huge cache does raise the average memory bandwidth felt by the average code ... ;-) (even as average codes sizes grow) ... and a large node count divides the total memory required per node. Large clusters should love large caches ... you know the quest for super-linear speed ups. The I2's weakness is in price-performance and in memory bandwidth in SMP configurations in my view. My last line in the prior note was a reminder to the original poster that SpecFP numbers are not a final answer. I repeated the "benchmark you code" mantra ... partly to relieve Bob Brown of his responsibility to do so ;-). Got any snow up in the Great White North yet? Regards, rbw max max num num rsz vsz obs unchanged stable? 
----- ----- --- --------- ------- gzip 180.0 199.0 181 68 vpr 50.0 53.6 151 6 gcc 154.0 156.0 134 0 mcf 190.0 190.0 232 230 stable crafty 2.0 2.6 107 106 stable parser 37.0 66.8 263 254 stable eon 0.6 1.5 130 0 perlbmk 146.0 158.0 186 0 gap 192.0 194.0 149 148 stable vortex 72.0 79.4 162 0 bzip2 185.0 199.0 153 6 twolf 3.4 4.0 273 0 wupwise 176.0 177.0 185 181 stable swim 191.0 192.0 322 320 stable mgrid 56.0 56.7 281 279 stable applu 181.0 191.0 371 369 stable mesa 9.4 23.1 132 131 stable galgel 63.0 155.0 287 59 art 3.7 4.3 157 37 equake 49.0 49.4 218 216 stable facerec 16.0 18.5 182 173 stable ammp 26.0 28.4 277 269 stable lucas 142.0 143.0 181 179 stable fma3d 103.0 105.0 268 249 stable sixtrack 26.0 59.8 148 141 stable apsi 191.0 192.0 271 270 stable _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 11:07:25 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 10:07:25 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301607.h9UG7PX06372@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 07:57, Andreas Boklund wrote: >Just a note, > >> For bandwidth/memory intensive codes, I think the Opteron is a clear >> winner in a dual processor configuration because of its dual channel >> to memory design. Stream triad bandwidth during SMP operation is >> ~50% more than a one processor test. Both the dual Pentium 4 and Itanium >> 2 share their memory bus and split (with some loss) the bandwidth in >> dual mode. > >This is true as long as you are using an applicaiton where one process has its own >memory area. If you would have 2 processes and shared memory the Opt, would >behave like a small NUMA machine and a process will get a penalty for accessing >another process (processors) memory segment. > >To quote D. Barron, "If it seems to be to good to be true, it probably is!", i have never >yet seen true linear scalability, and with Ahmdahl out there i doubt that i ever will. Agreed. Of course, in the case of dual Pentium and Itaniums, even non- overlapping memory locations buy you nothing bandwidth-wise. Small or large scale perfect cross-bars to memory are tough and expensive. The Cray X1, with all its customer design effort and great total bandwidth on the node board, targeted only 1/4 of peak-data-required iin its design and delivers less under the full load of its 16-way SMP vector engines. And it's node board is probably the best bandwidth engine in the world at the moment. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rbw at ahpcrc.org Thu Oct 30 12:28:45 2003 From: rbw at ahpcrc.org (Richard Walsh) Date: Thu, 30 Oct 2003 11:28:45 -0600 Subject: opteron VS Itanium 2 Message-ID: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> On Thu, 30 Oct 2003 12:00:54, Mark Hahn wrote: >> Of course, there is some truth to what you say, but "this says nothing about >> It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is > >all the world's a stage ;) Life without drama is life without the pursuit of happiness ... ;-). >> the memory table for most of the benchmarks. A few fit in the 6MB cache (although >> some surely should, as some codes do or can be made too fit into cache). 
Many > >seriously, the memory access patterns of very few apps are uniform >across their rss. I probably should have said "working set fits in 6M". Good point, most memory accesses are not globally stride-one. But of course this fact leads us back to the idea that cache >>is<< important for a suite of "representative codes". >and you're right; I just reread the spec blurb, and their aim was 100-200MB. > >> are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard > >that's max rss; it's certainly an upper bound on working set size, >but definitely not a good estimator. Yes, an upper bound. We would need more data on the Spec codes to know if the working sets are mostly sitting in the I2 cache. There is an inevitable dynamism here with larger caches swallowing up larger and larger chunks of the "average code's" working set and while the average working set grows over time. >in other words, it tells you something about the peak number of pages that >the app ever touches. it doesn't tell you whether 95% of those pages are >never touched again, or whether the app only touches 1 cacheline per page. > >in yet other words, max rss is relevant to swapping, not cache behavior. You might also say it this way ... cache-exceeding, max-RSS magnitude by itself does guarantee the elimination of unwanted cache effects. > >> And after all a huge cache does raise the average memory bandwidth felt by the >> average code ... ;-) (even as average codes sizes grow) ... and a large node count > >even though Spec uses geo-mean, it can strongly be influenced by outliers, >as we've all seen with Sun's dramatic "performance improvements" ;) > >in particular, 179.art is a good example. I actually picked it out by >comparing the specFP barchart for mckinley vs madison - it shows a fairly >dramatic improvement. this *could* be due to compiler improvements, >but given that 179.art has a peak RSS of 3.7MB, I think there's a real >cache effect here. I agree again, but would say that such a suite as SpecFP should include some codes that yield to cache-effects because some real world codes do. Always learn or am reminded of something from your posts Mark ... keep on keeping us honest and true ;-) like a Canadian Mountie. Regards, rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:45:20 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:45:20 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301728.h9UHSj508273@mycroft.ahpcrc.org> Message-ID: > this fact leads us back to the idea that cache >>is<< important for a suite > of "representative codes". yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly stodgy attitude towards this has been that caches do not help machine balance. that is, it2 has a peak/theoretical 4flops/cycle, but since that would require, worstcase, 3 doubles per flop, the highest-ranked CPU is actually imbalanced by a factor of 22.5! (*) the best benchmark is your own code let's step back a bit. suppose we were designing a new version of SPEC, and wanted to avoid every problem that the current benchmarks have. here are some partially unworkable ideas: keep geometric mean, but also quote a few other metrics that don't hide as much interesting detail. for instance, show the variance of scores. 
or perhaps show base/peak/trimmed (where the lowest and highest component are simply dropped). cache is a problem unless your code is actually a spec component, or unless all machines have the same basic cache-to-working-set relation for each component. alternative: run each component on a sweep of problem sizes, and derive two scores: in-cache and out-cache. use both scores as part of the overall summary statistic. I'd love to see good data-mining tools for spec results. for instance, I'd like to have an easy way to compare consecutive results for the same machine as the vendor changed the compiler, or as clock increases. there's a characteristic "shape" to spec results - which scores are high and low relative to the other scores for a single machine. not only does this include outliers (drastic cache or compiler effects), but points at strengths/weaknesses of particular architectures. how to do this, perhaps some kind of factor analysis? regards, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Oct 30 12:00:54 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 30 Oct 2003 12:00:54 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> Message-ID: > Of course, there is some truth to what you say, but "this says nothing about > It2" seems a tad dramatic (but ... definitely in character ... ;-) ). Below is all the world's a stage ;) > the memory table for most of the benchmarks. A few fit in the 6MB cache (although > some surely should, as some codes do or can be made too fit into cache). Many seriously, the memory access patterns of very few apps are uniform across their rss. I probably should have said "working set fits in 6M". and you're right; I just reread the spec blurb, and their aim was 100-200MB. > are in the 100 to 200 MB range. The floating point accumen of the I2 chip is hard that's max rss; it's certainly an upper bound on working set size, but definitely not a good estimator. in other words, it tells you something about the peak number of pages that the app ever touches. it doesn't tell you whether 95% of those pages are never touched again, or whether the app only touches 1 cacheline per page. in yet other words, max rss is relevant to swapping, not cache behavior. > And after all a huge cache does raise the average memory bandwidth felt by the > average code ... ;-) (even as average codes sizes grow) ... and a large node count even though Spec uses geo-mean, it can strongly be influenced by outliers, as we've all seen with Sun's dramatic "performance improvements" ;) in particular, 179.art is a good example. I actually picked it out by comparing the specFP barchart for mckinley vs madison - it shows a fairly dramatic improvement. this *could* be due to compiler improvements, but given that 179.art has a peak RSS of 3.7MB, I think there's a real cache effect here. > Got any snow up in the Great White North yet? no, but I notice that the permanent temporary DX units are not working as hard to keep the machineroom from melting down ;) oh, yeah, and there's something wrong with the color of the leaves. 
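To make the "don't hide the detail" suggestion concrete, here is a toy sketch of the extra summary statistics Mark is asking for: arithmetic and geometric means, the variance, and a trimmed geometric mean with the single best and worst components dropped. The ratios in the array are made-up placeholders, not real SPEC results; link with -lm.

/* Toy summary statistics for a set of SPEC-style component ratios.
   The numbers below are placeholders, not real results. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double r[] = { 9.8, 14.2, 11.5, 35.0, 10.9, 12.3, 8.7, 13.1 };
    int n = sizeof(r) / sizeof(r[0]);
    int i, m = 0, imin = 0, imax = 0;
    double amean = 0.0, gmean = 0.0, var = 0.0, trimmed = 0.0;

    for (i = 0; i < n; i++) {
        amean += r[i];
        gmean += log(r[i]);
        if (r[i] < r[imin]) imin = i;   /* remember the worst component */
        if (r[i] > r[imax]) imax = i;   /* and the best one */
    }
    amean /= n;
    gmean = exp(gmean / n);

    for (i = 0; i < n; i++)
        var += (r[i] - amean) * (r[i] - amean);
    var /= n;

    /* geometric mean with the lowest and highest components dropped */
    for (i = 0; i < n; i++)
        if (i != imin && i != imax) { trimmed += log(r[i]); m++; }
    trimmed = exp(trimmed / m);

    printf("arithmetic mean : %6.2f\n", amean);
    printf("geometric mean  : %6.2f\n", gmean);
    printf("variance        : %6.2f\n", var);
    printf("trimmed geomean : %6.2f\n", trimmed);
    return 0;
}

The single outlier (35.0) barely moves the geometric mean but shows up loudly in the variance and in the gap between the plain and trimmed means, which is exactly the kind of detail a single quoted number hides.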
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rbw at ahpcrc.org Thu Oct 30 16:32:38 2003
From: rbw at ahpcrc.org (Richard Walsh)
Date: Thu, 30 Oct 2003 15:32:38 -0600
Subject: opteron VS Itanium 2
Message-ID: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org>

Mark Hahn wrote:

>> this fact leads us back to the idea that cache >>is<< important for a suite
>> of "representative codes".
>
>yes, certainly, and TBBIYOC (*). but the traditional perhaps slightly
>stodgy attitude towards this has been that caches do not help machine
>balance. that is, it2 has a peak/theoretical 4flops/cycle, but since
>that would require, worstcase, 3 doubles per flop, the highest-ranked
>CPU is actually imbalanced by a factor of 22.5!
>
>(*) the best benchmark is your own code

Agreed, but since the scope of the discussion seemed to be microprocessors, which are all relatively bad on balance compared to vector ISA/designs, I did not elaborate on balance. This is a design area that favors the Opteron (and Power 4) because the memory controller is on-chip (unlike the Pentium 4 and I2) and, as such, its performance improves with clock. I think it is interesting to look at other processors' theoretical balance numbers in relationship to the I2's that you compute (I hope I have them all correct):

Pentium 4 EE 3.2 GHz: (3.2 GHz * 2 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 24    (max on-chip cache 2MB)
Itanium 2    1.5 GHz: (1.5 GHz * 4 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 22.5  (max on-chip cache 6MB)
Opteron 246  2.0 GHz: (2.0 GHz * 2 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 15    (max on-chip cache 1MB)
Power 4      1.7 GHz: (1.7 GHz * 4 flops/clock * 24 bytes/flop) / 6.4 GB/s  = balance of 25.5* (max on-chip cache 1.44MB)
Cray X1      0.8 GHz: (0.8 GHz * 4 flops/clock * 24 bytes/flop) / 19.2 GB/s = balance of 4     (512 byte off-chip L2)

* IBM memory performance is with 1 core disabled and may now be higher than this.

When viewed in context, yes, the I2 is poorly balanced, but it is typical of microprocessors, and it is not the worst among them. It also offers the largest compensating cache. Where it loses a lot of ground is in the dual processor configuration. Opteron yields a better number, but this is because it can't do as many flops. The Cray X1 has the most aggressive design specs and yields a large enough percentage of peak to beat the fast-clocked micros on vector code (leaving the ugly question of price aside). This is in part due to the more balanced design, but also due to its vector ISA, which is just better at moving data from memory.

>let's step back a bit. suppose we were designing a new version of SPEC,
>and wanted to avoid every problem that the current benchmarks have.
>here are some partially unworkable ideas:
>
>keep geometric mean, but also quote a few other metrics that don't
>hide as much interesting detail. for instance, show the variance of
>scores. or perhaps show base/peak/trimmed (where the lowest and highest
>component are simply dropped).

Definitely. I am constantly trimming the reported numbers myself and looking at the bar graphs for an eye-ball variance. It takes will power to avoid being seduced by a single summarizing number. The Ultra III's SpecFP number was a good reminder.

>cache is a problem unless your code is actually a spec component,
>or unless all machines have the same basic cache-to-working-set relation
>for each component.
>alternative: run each component on a sweep of problem sizes,
>and derive two scores: in-cache and out-cache. use both scores
>as part of the overall summary statistic.

Very good as well. This is the "cpu-rate-comes-to-spec" approach that I am sure Bob Brown would endorse.

>I'd love to see good data-mining tools for spec results. for instance,
>I'd like to have an easy way to compare consecutive results for the same
>machine as the vendor changed the compiler, or as clock increases.

... or increased cache size. Another winning suggestion.

>there's a characteristic "shape" to spec results - which scores are
>high and low relative to the other scores for a single machine. not only
>does this include outliers (drastic cache or compiler effects), but
>points at strengths/weaknesses of particular architectures. how to do this,
>perhaps some kind of factor analysis?

This is what I refer to as the Spec finger print or Roshacht(sp?) test. We need a neural net derived analysis and classification here.

Another presentation that I like is the "star graph" in which major characteristics (floating point perf., integer perf., cache, memory bandwidth, etc.) are layed out in equal degrees as vectors around a circle. Each processor is measured on each axis to give a star print and the total area is a measure of "total goodness".

I hope someone from Spec is reading this ... and they remember who made these suggestions ... ;-).

Regards,

rbw

#---------------------------------------------------
# Richard Walsh
# Project Manager, Cluster Computing, Computational
# Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX: 612-337-3467
# FAX: 612-337-3400
# EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com
# rbw at ahpcrc.org
#
#---------------------------------------------------
# Nullum magnum ingenium sine mixtura dementiae fuit.
# - Seneca
#---------------------------------------------------

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andrewxwang at yahoo.com.tw Thu Oct 30 23:31:01 2003
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Fri, 31 Oct 2003 12:31:01 +0800 (CST)
Subject: opteron VS Itanium 2
In-Reply-To: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org>
Message-ID: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com>

Other problems with the Itanium 2 are its power consumption and heat. Also, as reported on another mailing list:

Earth Simulator           35.8 TFlop/s
ASCI Q Alpha EV-68        13.8 TFlop/s
Apple G5 dual (Big Mac)    9.5 TFlop/s
HP RX2600 Itanium 2        8.6 TFlop/s

This would place the Big Mac in the 3rd place on the top500 list -- assuming they have reported all submitted results in the report:

http://www.netlib.org/benchmark/performance.pdf (p53)

Andrew.

> The I2's weakness is in price-performance and in
> memory bandwidth in SMP configurations
> in my view. My last line in the prior note was a
> reminder to the original poster
> that SpecFP numbers are not a final answer. I
> repeated the "benchmark you code"
> mantra ... partly to relieve Bob Brown of his
> responsibility to do so ;-).
>
> Got any snow up in the Great White North yet?
>
> Regards,
>
> rbw
>
>
> max max num num
> rsz vsz obs unchanged stable?
> ----- ----- --- --------- ------- > gzip 180.0 199.0 181 68 > vpr 50.0 53.6 151 6 > gcc 154.0 156.0 134 0 > mcf 190.0 190.0 232 230 stable > crafty 2.0 2.6 107 106 stable > parser 37.0 66.8 263 254 stable > eon 0.6 1.5 130 0 > perlbmk 146.0 158.0 186 0 > gap 192.0 194.0 149 148 stable > vortex 72.0 79.4 162 0 > bzip2 185.0 199.0 153 6 > twolf 3.4 4.0 273 0 > > wupwise 176.0 177.0 185 181 stable > swim 191.0 192.0 322 320 stable > mgrid 56.0 56.7 281 279 stable > applu 181.0 191.0 371 369 stable > mesa 9.4 23.1 132 131 stable > galgel 63.0 155.0 287 59 > art 3.7 4.3 157 37 > equake 49.0 49.4 218 216 stable > facerec 16.0 18.5 182 173 stable > ammp 26.0 28.4 277 269 stable > lucas 142.0 143.0 181 179 stable > fma3d 103.0 105.0 268 249 stable > sixtrack 26.0 59.8 148 141 stable > apsi 191.0 192.0 271 270 stable > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ?????????????????????? http://tw.promo.yahoo.com/mail_premium/stationery.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:02:29 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:02:29 -0500 (EST) Subject: opteron VS Itanium 2 In-Reply-To: <200310302132.h9ULWcM12979@mycroft.ahpcrc.org> Message-ID: On Thu, 30 Oct 2003, Richard Walsh wrote: > >cache is a problem unless your code is actually a spec component, > >or unless all machines have the same basic cache-to-working-set relation > >for each component. alternative: run each component on a sweep of problem > >sizes, and derive two scores: in-cache and out-cache. use both scores > >as part of the overall summary statistic. > > Very good as well. This is the "cpu-rate-comes-to-spec" approach > that I am sure Bob Brown would endorse. Oh, sure. "I endorse this." ;-) As you guys are working out fine on your own, I like it combined with Mark's suggestion of showing the entire constellation for spec (which of course you CAN access and SHOULD access in any case instead of relying on geometric or any other mean measure of performance:-). I really think that many HPC performance benchmarks primary weakness is that they DON'T sweep problem size and present results as a graph, and that they DON'T present a full suite of different results that measure many identifiably different components of overall performance. From way back with early linpack, this has left many benchmarks susceptible to vendor manipulation -- there are cases on record of vendors (DEC, IIRC, but likely others) actually altering CPU/memory architecture to optimize linpack performance because linpack was what sold their systems. This isn't just my feeling, BTW -- Larry McVoy has similar concerns (more stridently expressed) in his lmbench suite -- he actually had (and likely still has) as a condition of their application to a system that they can NEVER be applied singly with just one (favorable:-) number or numbers quoted in a publication or advertisement --- the results of the complete suite have to be presented all together, with your abysmal failures side by side with your successes. 
I personally am less religious about NEVER doing anything and dislike semi-closed sources and "rules" even for benchmarks (it makes far more sense to caveat emptor and pretty much ignore vendor-based performance claims in general:-), but do think that you get a hell of a lot more information from a graph of e.g. stream results as a function of vector size than you get from just "running stream". Since running stream as a function of vector size more or less requires using malloc to allocate the memory and hence adds one additional step of indirection to memory address resolution, it also very slightly worsens the results, but very likely in the proper direction -- towards the real world, where people do NOT generally recompile an application in order to change problem size. I also really like Mark's idea of having a benchmark database site where comparative results from a wide range of benchmarks can be easily searched and collated and crossreferenced. Like the spec site, actually. However, that's something that takes a volunteer or organization with spare resources, much energy, and an attitude to make happen, and since one would like to e.g. display spec results on a non-spec site and since spec is (or was, I don't keep up with its "rules") fairly tightly constrained on who can run it and how/where its results can be posted, it might not be possible to create your own spec db, your own lmbench db, your own linpack db, all on a public site. cpu_rate you can do whatever you want with -- it is full GPL code so a vendor could even rewrite it as long as they clearly note that they have done so and post the rewritten sources. Obviously you should either get results from somebody you trust or run it yourself, but that is true for any benchmark, with the latter being vastly preferrable.:-) If I ever have a vague bit of life in me again and can return to cpu_rate, I'm in the middle of yet another full rewrite that should make it much easier to create and encapsulate a new code fragment to benchmark AND should permit running an "antistream" version of all the tests involving long vectors (one where all the memory addresses are accessed in a random/shuffled order, to deliberately defeat the cache). However, I'm stretched pretty thin at the moment -- a talk to give Tuesday on xmlsysd/wulfstat, a CW column due on Wednesday, and I've agreed to write an article on yum due on Sunday of next week I think (and need to finish the yum HOWTO somewhere in there as well). So it won't be anytime soon...:-) > >I'd love to see good data-mining tools for spec results. for instance, > >I'd like to have an easy way to compare consecutive results for the same > >machine as the vendor changed the compiler, or as clock increases. > > ... or increased cache size. Another winning suggestion. > > >there's a characteristic "shape" to spec results - which scores are > >high and low relative to the other scores for a single machine. not only > >does this include outliers (drastic cache or compiler effects), but > >points at strengths/weaknesses of particular architectures. how to do this, > >perhaps some kind of factor analysis? > > This is what I refer to as the Spec finger print or Roshacht(sp?) > test. We need a neural net derived analysis and classification here. . The only one I'd trust is the one already implemented in wetware. After all, classification according to what? 
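To make the size-sweep and "antistream" ideas earlier in this message concrete, here is a minimal sketch (this is not cpu_rate itself, and the vector size is an arbitrary choice): the same reduction is timed once in streaming order and once through a shuffled index vector, which is roughly the cache- and prefetch-defeating access pattern being described. Sweeping N from well inside cache to well outside it gives the sort of curve worth graphing.

/* Sequential vs. shuffled ("antistream") access over a malloc'ed vector.
   Illustration only; build with: gcc -O2 -o antistream antistream.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (8 * 1024 * 1024)   /* arbitrary size, well past typical caches */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    double *v = malloc(N * sizeof(double));
    int *idx = malloc(N * sizeof(int));
    double sum = 0.0, t;
    int i;

    if (!v || !idx) { fprintf(stderr, "malloc failed\n"); return 1; }
    for (i = 0; i < N; i++) { v[i] = 1.0; idx[i] = i; }

    /* Fisher-Yates shuffle of the index vector */
    srand(12345);
    for (i = N - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }

    t = now();
    for (i = 0; i < N; i++) sum += v[i];          /* streaming access */
    printf("sequential: %.3f s (sum=%g)\n", now() - t, sum);

    sum = 0.0;
    t = now();
    for (i = 0; i < N; i++) sum += v[idx[i]];     /* cache-hostile access */
    printf("shuffled:   %.3f s (sum=%g)\n", now() - t, sum);

    free(v);
    free(idx);
    return 0;
}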
> Another presentation that I like is the "star graph" in which major > characteristics (floating point perf., integer perf., cache, memory > bandwidth, etc.) are layed out in equal degrees as vectors around > a circle. Each processor is measured on each axis to give a star > print and the total area is a measure of "total goodness". > > I hope someone from Spec is reading this ... and they remember who > made these suggestions ... ;-). But things are more complicated than this. The real problem with SPEC is that your application may well resemble one of the components of the suite, in which case that component is a decent predictor of performance for your application almost by definition. However, the mean performance on the suite may or may not be well correlated with that component, or your application may not resemble ANY of the components on the suite. Then there are variations with compiler, operating system, memory configuration, scaling (or lack thereof!) with CPU clock. As Mark says, TBBIYOC is the only safe rule if you seek to compare systems on the basis of "benchmarks". I personally tend to view large application benchmarks like linpack and spec with a jaded eye and prefer lmbench and my own microbenchmarks to learn something about the DETAILED performance of my architecture on very specific tasks that might be components of a large application, supplemented with YOC. Or rather MOC. Zen question: Which one reflects the performance of an architecture, a BLAS-based benchmark or an ATLAS-tuned BLAS-based benchmark? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at linux-mag.com Fri Oct 31 09:11:49 2003 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Fri, 31 Oct 2003 09:11:49 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: For those interested, the latest poll at www.cluster-rant.com was on cluster size. We had a record 102 responses! Take a look at http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 for links to results and to the new poll on interconnects. Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Oct 31 11:55:43 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 31 Oct 2003 11:55:43 -0500 (EST) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Douglas Eadline, Cluster World Magazine wrote: > > For those interested, the latest poll at www.cluster-rant.com was on > cluster size. We had a record 102 responses! Take a look at > http://www.cluster-rant.com/article.pl?sid=03/10/25/1330216 > for links to results and to the new poll on interconnects. You need to let people vote more than once in something like this. I have three distinct clusters and there are two more I'd vote for the owners here at Duke. (They pretty much reflect the numbers you're getting, which show well over half the clusters at 32 nodes or less). 
It is interesting that this indicates that the small cluster is a lot more common than big clusters, although the way numbers work there are a lot more nodes in big clusters than in small clusters. At least in your biased and horribly unscientific (but FUN!) poll:-) So from a human point of view, providing support for small clusters is more important, but from an institutional/hardware point of view, big clusters dominate. It is also very interesting to me that RH (for example) thinks that there is something that they are going to provide that is worth e.g. several hundred thousand dollars in the case of a 1000+ node cluster running their "workstation" product. Fifty dollars certainly. Five hundred dollars maybe. A thousand dollars possibly, but only if they come up with a cluster-specific installation with some actual added value. Sigh. rgb > > Doug > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jcownie at etnus.com Fri Oct 31 12:10:31 2003 From: jcownie at etnus.com (James Cownie) Date: Fri, 31 Oct 2003 17:10:31 +0000 Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: Message from "Robert G. Brown" of "Fri, 31 Oct 2003 11:02:29 EST." Message-ID: <1AFcn9-5Y0-00@etnus.com> > From way back with early linpack, this has left many benchmarks > susceptible to vendor manipulation -- there are cases on record of > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > architecture to optimize linpack performance because linpack was > what sold their systems. This certainly applied to some compilers which "optimized" sdot and ddot by recognizing the source (down to the precise comments) and plugged in a hand coded assembler routine. Changing a comment (for instance mis-spelling Jack's name :-) or replacing a loop variable called "i" with one called "k" could halve the linpack result. When $$$ are involved people are prepared to sail close to the wind... -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xyzzy at speakeasy.org Fri Oct 31 14:36:09 2003 From: xyzzy at speakeasy.org (Trent Piepho) Date: Fri, 31 Oct 2003 11:36:09 -0800 (PST) Subject: opteron VS Itanium 2 (Benchmark cheating) In-Reply-To: <1AFcn9-5Y0-00@etnus.com> Message-ID: On Fri, 31 Oct 2003, James Cownie wrote: > > From way back with early linpack, this has left many benchmarks > > susceptible to vendor manipulation -- there are cases on record of > > vendors (DEC, IIRC, but likely others) actually altering CPU/memory > > architecture to optimize linpack performance because linpack was > > what sold their systems. > > This certainly applied to some compilers which "optimized" sdot and > ddot by recognizing the source (down to the precise comments) and > plugged in a hand coded assembler routine. 
Nvidia and ATI have recently done similar things, where their drivers would attempt to detect benchmarks being run and then use optimized routines or cheat on following specifications. Renaming quake2.exe to something else would cause a large decrease in framerate for example. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Oct 31 14:45:04 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 31 Oct 2003 11:45:04 -0800 (PST) Subject: opteron VS Itanium 2 In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> Message-ID: <20031031194504.30508.qmail@web11404.mail.yahoo.com> But still, at least the results showed that the G5s provided similar performance, and less expensive than IA64... Rayson --- Greg Lindahl wrote: > On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > > > This would place the Big Mac in the 3rd place on the > > top500 list > > Except that there are several other new large clusters that will > likely place higher -- LANL announced a 2,048 cpu Opteron cluster a > while back, and LLNL has something new, too, I think. Comparing > yourself to the obsolete list in multiple press releases isn't very > clever. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________ Do you Yahoo!? Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john.hearns at clustervision.com Fri Oct 31 12:38:20 2003 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 31 Oct 2003 18:38:20 +0100 (CET) Subject: Cluster Poll Results In-Reply-To: Message-ID: On Fri, 31 Oct 2003, Robert G. Brown wrote: > > It is also very interesting to me that RH (for example) thinks that > there is something that they are going to provide that is worth e.g. > several hundred thousand dollars in the case of a 1000+ node cluster > running their "workstation" product. Fifty dollars certainly. Five > hundred dollars maybe. A thousand dollars possibly, but only if they > come up with a cluster-specific installation with some actual added > value. > I'll second that. There has been a debate running on this topic on the Fedora list over the last few days. Sorry to be so boring, but its something we should debate too. 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Oct 31 13:19:12 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 31 Oct 2003 10:19:12 -0800 Subject: opteron VS Itanium 2 In-Reply-To: <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> References: <200310301557.h9UFv0e06085@mycroft.ahpcrc.org> <20031031043101.99581.qmail@web16811.mail.tpe.yahoo.com> Message-ID: <20031031181912.GB1289@greglaptop.internal.keyresearch.com> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote: > This would place the Big Mac in the 3rd place on the > top500 list Except that there are several other new large clusters that will likely place higher -- LANL announced a 2,048 cpu Opteron cluster a while back, and LLNL has something new, too, I think. Comparing yourself to the obsolete list in multiple press releases isn't very clever. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Fri Oct 31 14:44:59 2003 From: walkev at presearch.com (Vann H. Walke) Date: Fri, 31 Oct 2003 14:44:59 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: References: Message-ID: <1067629499.21719.73.camel@localhost.localdomain> On Fri, 2003-10-31 at 12:38, John Hearns wrote: > On Fri, 31 Oct 2003, Robert G. Brown wrote: > > > > > It is also very interesting to me that RH (for example) thinks that > > there is something that they are going to provide that is worth e.g. > > several hundred thousand dollars in the case of a 1000+ node cluster > > running their "workstation" product. Fifty dollars certainly. Five > > hundred dollars maybe. A thousand dollars possibly, but only if they > > come up with a cluster-specific installation with some actual added > > value. > > > I'll second that. > > There has been a debate running on this topic on the Fedora list > over the last few days. > > Sorry to be so boring, but its something we should debate too. > Hmm... Let's take the case of a 1000 node system. If we assume a $3000/node cost (probably low once rack, UPS, hardware support, and interconnect are added in), we arrive at an approximate hardware cost of $3,000,000. If we were to use the RHEL WS list price of $179/node, we get $179,000 or about 6% of the hardware cost. That is assuming RedHat will not provide any discount on large volume purchases (unlikely). Is 6% unreasonable? What are the alternatives? - Keep using an existing RH distro: Only if you're willing to move into do it yourself mode when RH stop support (December?). I expect very few would be happy with this option. However, if you have a working RH7.3 cluster, it works, and you don't have to worry too much about security, why change? For new clusters though.... - Fedora - Planned releases 2-3 times a year. So, if I build a system on the Fedora release scheduled this Monday, who will be providing security patches for it 2 years from now (after 4-6 new releases have been dropped). My guess is no-one. Again, we're in the do it yourself maintenance or frequent OS upgrade mode. - SUSE - Not sure about this one. Their commercial pricing model is pretty close to RedHat's. Are they going to keep developing consumer releases? 
What will the support be for those releases? Can we really expect more than we get from a purely community developed system? Perhaps someone with more SUSE knowledge could comment? - Debian - Could be a good option, but to some extent you end up in the same position as Fedora. How often do the releases come out. Who supports the old releases? What hardware / software will work on the platform? - Gentoo - Not reliable, stable enough to meet my needs for clustering - Mandrake - Mandrake has their clustering distribution, which could be a good possibility, but the cost is as high or higher than RedHat. - Scyld - Superior design, supported, but again very high cost and may have to fight some compatibility issues since the it's market share in the Linux world is less than tiny. - OSCAR / Rocks / etc... - generally installed on top of another distribution. We still have to pick a base distribution. My conclusions - If you're in a research facility / university type setting where limited amounts of down time are acceptable, a free or nearly free system is perfect. A new Fedora/Debian/SuSE release comes out, shut the system down over Christmas break and rebuild it. (As long as you're happy spending a fair amount of time doing rebuilds and fixing upgrade problems). If however you really need the thing to work - Corporate research sites, satellite data processing, etc... the cost of the operating system may be minuscule relative to the cost of having the system down. If you _really_ want a particular application to work having it certified and supported on the OS may be important. The project on which I'm working - building sonar training simulators for the US Navy Submarine force requires stable systems which should operate without major maintenance / operational changes for many years. Knowing the RedHat will support the enterprise line for 5 years is a big selling point. The cluster management portion of the software stack would be great to have integrated in to the product, but if third party vendors (Linux Networx, OSCAR, Rocks, etc...) can provide the cluster management portion on top of the distribution, a solution can be found. In some ways this is even better since your cluster management decision is independent of the OS vendor. I basically just want to make the point that the cluster space is filled with people of many different needs. Will everyone want RHEL? My guess is a resounding NO. (In the days of RH7.3 you could almost say Yes.) But, there are situations in which a stable, supported product is needed. This is the market RedHat is trying to target and states so pretty clearly ("Enterprise"). Small users and research systems get somewhat left out in the cold, but we probably shouldn't complain after having a free ride for the last 5+ years. So, is 6% unreasonable? 
Vann

>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From eemoore at fyndo.com Fri Oct 31 14:52:55 2003
From: eemoore at fyndo.com (Dr Eric Edward Moore)
Date: Fri, 31 Oct 2003 19:52:55 +0000
Subject: opteron VS Itanium 2
In-Reply-To: (Mark Hahn's message of "Thu, 30 Oct 2003 12:45:20 -0500 (EST)")
References:
Message-ID: <87he1pfalk.fsf@azathoth.fyndo.com>

Mark Hahn writes:

> there's a characteristic "shape" to spec results - which scores are
> high and low relative to the other scores for a single machine. not only
> does this include outliers (drastic cache or compiler effects), but
> points at strengths/weaknesses of particular architectures. how to do this,
> perhaps some kind of factor analysis?

Well, being bored, I tried factor analysis on the average results for the submitted specfp benchmarks at http://www.specbench.org/

The 5 factors with the largest eigenvalues are:

Eigenvalue:    0.314116   0.353034   0.799331   1.432038  10.614996
                  2.22%      2.25%      5.70%     10.22%     75.82%

168.wupwise  -0.4134913  0.0241240 -0.1437086 -0.2757206  0.2715672
171.swim      0.0245451  0.0965325  0.3495143  0.1209393  0.2783842
172.mgrid     0.1122617  0.1365769  0.3273285  0.1332301  0.2839204
173.applu     0.0299056  0.0439954  0.4163242  0.1913496  0.2725619
177.mesa      0.4791260  0.4190313 -0.0949648 -0.3785996  0.2448368
178.galgel   -0.0489231 -0.5404192 -0.2464610  0.2391370  0.2648068
179.art       0.0646181  0.5095081 -0.4736362  0.6508958  0.1054875
183.equake   -0.5560255  0.0841426  0.0214064  0.1615493  0.2794066
187.facerec  -0.0402649  0.0446221 -0.2628912 -0.0557252  0.2897607
188.ammp      0.3993861 -0.3404615 -0.1456043  0.0359475  0.2832809
189.lucas    -0.2380202  0.0908976  0.0801927 -0.2140971  0.2842518
191.fma3d    -0.0326577  0.1661895 -0.1149762 -0.3148501  0.2774768
200.sixtrack  0.1950678 -0.1574121  0.2852895  0.2008475  0.2741305
301.apsi      0.1128198 -0.2379642 -0.3013536 -0.1224494  0.2782804

Pretty much all the specfp tests correlate with each other pretty well, except for 179.art, which correlates... poorly with the others (its correlation with 177.mesa is just 0.03). So most of the variation in the results is some sort of "raw speed" number, which has near-equal weightings of all the tests besides 179.art. Next most important is whatever makes art so different from all the others (maybe it's a persistent cache-misser, or maybe it's just the easiest for vendors to tweak).

Not entirely sure what to make of the others. There does seem to be some commonality between 171.swim 172.mgrid 173.applu and 200.sixtrack in the third biggest factor (plus a lot of whatever art isn't) that could be important. The next two seem to mostly have something to do with whatever makes 177.mesa special.

This is presumably all useless, but someone might be entertained :)

> regards, mark hahn.

--
Eric E. Moore
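For anyone who wants to poke at this themselves, the first step behind a factor analysis like the one above is just the correlation matrix of per-machine component scores, which a few lines of C will produce. The score matrix below is a tiny made-up placeholder (machines as rows, components as columns), not real SPEC data; the factor extraction itself would sit on top of a matrix built this way.

/* Correlation matrix of per-machine scores for a handful of components.
   Placeholder data only; link with -lm. */
#include <stdio.h>
#include <math.h>

#define NMACH 4
#define NCOMP 3

int main(void)
{
    /* rows = machines, columns = components (made-up numbers) */
    double x[NMACH][NCOMP] = {
        { 1100.0,  950.0, 3000.0 },
        { 1400.0, 1200.0, 2900.0 },
        {  900.0,  800.0, 3100.0 },
        { 1600.0, 1350.0, 2950.0 },
    };
    double mean[NCOMP] = {0}, sd[NCOMP] = {0};
    int i, j, k;

    /* per-component mean and standard deviation across machines */
    for (j = 0; j < NCOMP; j++) {
        for (i = 0; i < NMACH; i++) mean[j] += x[i][j];
        mean[j] /= NMACH;
        for (i = 0; i < NMACH; i++)
            sd[j] += (x[i][j] - mean[j]) * (x[i][j] - mean[j]);
        sd[j] = sqrt(sd[j] / NMACH);
    }

    printf("correlation matrix:\n");
    for (j = 0; j < NCOMP; j++) {
        for (k = 0; k < NCOMP; k++) {
            double c = 0.0;
            for (i = 0; i < NMACH; i++)
                c += (x[i][j] - mean[j]) * (x[i][k] - mean[k]);
            c /= NMACH * sd[j] * sd[k];
            printf(" %6.3f", c);
        }
        printf("\n");
    }
    return 0;
}

Components whose scores rise and fall together across machines show correlations near 1; a component that stands apart (the way 179.art does above) shows up as a row of small numbers.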
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From mathiasbrito at yahoo.com.br Fri Oct 31 16:38:52 2003
From: mathiasbrito at yahoo.com.br (=?iso-8859-1?q?Mathias=20Brito?=)
Date: Fri, 31 Oct 2003 18:38:52 -0300 (ART)
Subject: sum of matrices
Message-ID: <20031031213852.87539.qmail@web12206.mail.yahoo.com>

Hi,

In the last few days I wrote a program (in C) that computes the sum of 2 matrices. Let me say a little about how it works. I send 1 row of the 1st matrix and 1 row of the 2nd matrix to each process; when a process finishes its job, if there are more rows I send more to it and it computes the sum of those new 2 rows. The problem is, the program works fine with 100x100 (or smaller) matrices, but when I increase the size to something like 10000x10000 I receive the following message:

p0_8467: p4_error: Child process exited while making connection to remote process on node2: 0

Is this an MPI problem or is it my code? What can I do to fix this problem? (A minimal sketch of this kind of row distribution appears below, after this batch of messages.)

=====
Mathias Brito
Universidade Estadual de Santa Cruz - UESC
Departamento de Ciências Exatas e Tecnológicas
Estudante do Curso de Ciência da Computação

Yahoo! Mail - o melhor webmail do Brasil
http://mail.yahoo.com.br

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From xyzzy at speakeasy.org Fri Oct 31 15:52:12 2003
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Fri, 31 Oct 2003 12:52:12 -0800 (PST)
Subject: opteron VS Itanium 2
In-Reply-To: <20031031181912.GB1289@greglaptop.internal.keyresearch.com>
Message-ID:

On Fri, 31 Oct 2003, Greg Lindahl wrote:
> On Fri, Oct 31, 2003 at 12:31:01PM +0800, Andrew Wang wrote:
> > This would place the Big Mac in the 3rd place on the
> > top500 list
>
> Except that there are several other new large clusters that will
> likely place higher -- LANL announced a 2,048 cpu Opteron cluster a
> while back, and LLNL has something new, too, I think. Comparing
> yourself to the obsolete list in multiple press releases isn't very
> clever.

I thought that the 3rd place was in the new preliminary top500 list that included all the big machines that will be there when the official list comes out. But there's been so much poor and conflicting information about Big Mac, who knows? I'd like to know how much they paid for the infiniband hardware.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From roger at ERC.MsState.Edu Fri Oct 31 16:14:35 2003
From: roger at ERC.MsState.Edu (Roger L. Smith)
Date: Fri, 31 Oct 2003 15:14:35 -0600
Subject: opteron VS Itanium 2
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2003, Trent Piepho wrote:

> I thought that the 3rd place was in the new preliminary top500 list that
> included all the big machines that will be there when the official list
> comes out. But there's been so much poor and conflicting information
> about Big Mac, who knows? I'd like to know how much they paid for the
> infiniband hardware.

Yeah, me too. As someone who just ponied up for a rather large IB installation, I'm not sure that most people realize what a substantial percentage of the cost of the cluster the IB might be.
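Back to Mathias's matrix-sum question a couple of messages up: without seeing his code it is hard to say more than "debug and benchmark your own code", but note that at 10000x10000 each matrix of doubles is about 800 MB, so memory exhaustion on the master (or the per-row message bookkeeping) is a likely suspect. Below is a minimal sketch of a row-distributed matrix sum; it uses a simple block scatter/gather rather than his row-at-a-time master/worker scheme, assumes N is divisible by the number of processes, and is an illustration only, not a fix for his actual program.

/* Row-distributed matrix sum sketch: C = A + B.
   Build with mpicc; N is an arbitrary choice and must divide evenly
   by the number of MPI processes. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 1000

int main(int argc, char **argv)
{
    int rank, nproc, rows, i;
    double *A = NULL, *B = NULL, *C = NULL, *a, *b, *c;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    rows = N / nproc;              /* rows handled by each process */

    if (rank == 0) {               /* only the root holds the full matrices */
        A = malloc((size_t)N * N * sizeof(double));
        B = malloc((size_t)N * N * sizeof(double));
        C = malloc((size_t)N * N * sizeof(double));
        for (i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }
    a = malloc((size_t)rows * N * sizeof(double));
    b = malloc((size_t)rows * N * sizeof(double));
    c = malloc((size_t)rows * N * sizeof(double));
    /* error checking of the mallocs omitted for brevity */

    /* hand each process its block of rows from A and B */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, a, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(B, rows * N, MPI_DOUBLE, b, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < rows * N; i++)
        c[i] = a[i] + b[i];        /* local part of the sum */

    /* collect the result rows back on the root */
    MPI_Gather(c, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0] = %g, C[last] = %g\n", C[0], C[N * N - 1]);

    free(a); free(b); free(c);
    if (rank == 0) { free(A); free(B); free(C); }
    MPI_Finalize();
    return 0;
}

With the full matrices only on rank 0 and N/nproc rows per worker, the per-node memory footprint stays bounded as the matrix grows, and there is one message per matrix per process instead of one per row.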
_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From weideng at uiuc.edu Fri Oct 31 15:37:45 2003 From: weideng at uiuc.edu (Wei Deng) Date: Fri, 31 Oct 2003 14:37:45 -0600 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <20031031203745.GU1408@aminor.cs.uiuc.edu> On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote: > - OSCAR / Rocks / etc... - generally installed on top of another > distribution. We still have to pick a base distribution. >From what I heard from Rocks mailing list, they will release 3.1.0 the next Month, which will be based on RHEL 3.0, compiled from source code that is publicly available, and free of charge. Even though Rocks is based on RedHat distribution, it is complete, which means you only need to download Rocks ISOs to accomplish your installation. -- Wei Deng Pablo Research Group Department of Computer Science University of Illinois 217-333-9052 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Fri Oct 31 16:17:35 2003 From: josip at lanl.gov (Josip Loncaric) Date: Fri, 31 Oct 2003 14:17:35 -0700 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain> References: <1067629499.21719.73.camel@localhost.localdomain> Message-ID: <3FA2D16F.4030807@lanl.gov> Vann H. Walke wrote: > On Fri, 2003-10-31 at 12:38, John Hearns wrote: >>On Fri, 31 Oct 2003, Robert G. Brown wrote: >> >>>It is also very interesting to me that RH (for example) thinks that >>>there is something that they are going to provide that is worth e.g. >>>several hundred thousand dollars in the case of a 1000+ node cluster >>>running their "workstation" product. Fifty dollars certainly. Five >>>hundred dollars maybe. A thousand dollars possibly, but only if they >>>come up with a cluster-specific installation with some actual added >>>value. >> >>I'll second that. > > Hmm... Let's take the case of a 1000 node system. If we assume a > $3000/node cost (probably low once rack, UPS, hardware support, and > interconnect are added in), we arrive at an approximate hardware cost of > $3,000,000. If we were to use the RHEL WS list price of $179/node, we > get $179,000 or about 6% of the hardware cost. That is assuming RedHat > will not provide any discount on large volume purchases (unlikely). Is > 6% unreasonable? These days, one seldom builds 1000 node systems out of basic x86 boxes. Consider a 1024 node AMD64 system instead: The list price on RHEL WS Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. This is unlikely to create any sales. RH should be paid for the valuable service they provide (patch streams etc.) but this is not worth $811K to builders of large clusters. 
There are other good alternatives, most of them *MUCH* cheaper. I fully agree with RGB that RH needs to announce a sensible pricing structure for clusters in order to participate in this market. Would a single system image (BProc) cluster constructed by recompiling the kernel w/BProc patches fit RH's legal definition of a single "installed system" and a single "platform"? If so, $792 for a 1024-node cluster would be quite acceptable... Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Oct 31 17:38:50 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 31 Oct 2003 17:38:50 -0500 Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2D16F.4030807@lanl.gov> References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov> Message-ID: <1067639930.26872.1.camel@squash.scalableinformatics.com> On Fri, 2003-10-31 at 16:17, Josip Loncaric wrote: > These days, one seldom builds 1000 node systems out of basic x86 boxes. > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > This is unlikely to create any sales. SUSE AMD64 version of 9.0 is something like $120. It was somewhat more stable for my tests than the RH beta (GinGin64). I hope that RH will arrange for similar pricing. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Oct 31 19:00:30 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 31 Oct 2003 16:00:30 -0800 (PST) Subject: Cluster Poll Results (tangent into OS choices) In-Reply-To: <3FA2E6FD.6050107@scali.com> Message-ID: On Fri, 31 Oct 2003, Steffen Persvold wrote: > Josip Loncaric wrote: > > > > > > These days, one seldom builds 1000 node systems out of basic x86 boxes. > > Consider a 1024 node AMD64 system instead: The list price on RHEL WS > > Standard for AMD64 is $792 per node, or $811,008 for the whole cluster. > > This is unlikely to create any sales. so download the source build and call you distro something other than redhat enterprise linux... or use debian... or cope. > > RH should be paid for the valuable service they provide (patch streams > > etc.) but this is not worth $811K to builders of large clusters. There > > are other good alternatives, most of them *MUCH* cheaper. I fully agree > > with RGB that RH needs to announce a sensible pricing structure for > > clusters in order to participate in this market. so don't use redhat. > Who says you have to pay 1024*$792 ? Why not only 1 license ? AFAIK you are may use that binary image as you like inside your cluster since it is covered by GPL, but you can't > claim support from RH for more than one of the systems. read the liscsense agreement for you redhat enterprise disks... 
> Regards,
> Steffen
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
--------------------------------------------------------------------------
Joel Jaeggli                       Unix Consulting
joelja at darkwing.uoregon.edu
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at keyresearch.com Fri Oct 31 18:43:42 2003
From: lindahl at keyresearch.com (Greg Lindahl)
Date: Fri, 31 Oct 2003 15:43:42 -0800
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain>
References: <1067629499.21719.73.camel@localhost.localdomain>
Message-ID: <20031031234342.GC3744@greglaptop.internal.keyresearch.com>

On Fri, Oct 31, 2003 at 02:44:59PM -0500, Vann H. Walke wrote:
> So, is 6% unreasonable?

For just the base OS? Yes. The market-place has spoken very loudly about
that, especially people building large machines.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sp at scali.com Fri Oct 31 17:49:33 2003
From: sp at scali.com (Steffen Persvold)
Date: Fri, 31 Oct 2003 23:49:33 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To: <3FA2D16F.4030807@lanl.gov>
References: <1067629499.21719.73.camel@localhost.localdomain> <3FA2D16F.4030807@lanl.gov>
Message-ID: <3FA2E6FD.6050107@scali.com>

Josip Loncaric wrote:
>
> These days, one seldom builds 1000 node systems out of basic x86 boxes.
> Consider a 1024 node AMD64 system instead: The list price on RHEL WS
> Standard for AMD64 is $792 per node, or $811,008 for the whole cluster.
> This is unlikely to create any sales.
>
> RH should be paid for the valuable service they provide (patch streams
> etc.) but this is not worth $811K to builders of large clusters. There
> are other good alternatives, most of them *MUCH* cheaper. I fully agree
> with RGB that RH needs to announce a sensible pricing structure for
> clusters in order to participate in this market.

Who says you have to pay 1024*$792? Why not only 1 license? AFAIK you may
use that binary image as you like inside your cluster since it is covered
by GPL, but you can't claim support from RH for more than one of the
systems.

Regards,
Steffen

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From tod at gust.sr.unh.edu Fri Oct 31 18:59:16 2003
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: 31 Oct 2003 18:59:16 -0500
Subject: Cluster Poll Results (tangent into OS choices, Fedora and Debian)
In-Reply-To: <1067629499.21719.73.camel@localhost.localdomain>
References: <1067629499.21719.73.camel@localhost.localdomain>
Message-ID: <1067644757.5702.219.camel@haze.sr.unh.edu>

On Fri, 2003-10-31 at 14:44, Vann H. Walke wrote:
> What are the alternatives?
> [snip]
> - Fedora - Planned releases 2-3 times a year.
> So, if I build a system on the Fedora release scheduled this Monday, who
> will be providing security patches for it 2 years from now (after 4-6 new
> releases have been dropped)? My guess is no-one. Again, we're in the
> do-it-yourself maintenance or frequent OS upgrade mode.
> [snip]
> - Debian - Could be a good option, but to some extent you end up in the
> same position as Fedora. How often do the releases come out? Who
> supports the old releases? What hardware / software will work on the
> platform?

If Fedora achieves 2-3 upgrades per year then it will be fairly different
from Debian, which seems to be at 2-3 years per upgrade these days (well,
almost). After a new release comes out Debian supports the old one for a
period of time (12 months?) with security updates before pulling the plug.

Debian can be upgraded in place as opposed to requiring a full reinstall;
while this is great for desktops and servers, I'm not sure if this is
important for a cluster.

As a result of the extended release cycle Debian stable tends to lack
support for the newest hardware (Opteron 64-bit, for example). This is why
Knoppix, which is based on Debian, isn't derived from Debian stable, but
rather from packages in the newer releases (testing, unstable and
experimental). But the flip side is that the stable release, while dated,
tends to work well as it's had a lot of testing.

Debian could probably use more recognition as a target platform by
commercial software vendors, but it incorporates a huge number of packages
including many open source applications pertinent to science. Breadth in
packaged applications is probably more important for workstations, since
clusters tend to use small numbers of apps very intensely.

As a distribution Debian is more oriented towards servers than the desktop
(to the point that frustrated users have spawned the "Debian Desktop"
subproject). It seems to me that clusters have more in common with servers
than with desktops, so Debian's deliberate release rate is a better match
for the cluster environment than distros which release often in order to
incorporate the latest GUI improvements.

P.S. While looking into the number of packages in Debian vs. Fedora I
stumbled across this frightening bit (gotta throw a Halloween reference in
somewhere) on the Fedora site:

http://fedora.redhat.com/participate/terminology.html

> Packages in Fedora Extras should avoid conflicts with other packages
> in Fedora Extras to the fullest extent possible. Packages in Fedora
> Extras must not conflict with packages in Fedora Core.

It seems that Fedora intends to achieve applications breadth through
"Fedora Extras" package sets in other repositories, but the prohibition of
conflicts between Extras packages isn't as strong as the absolute
prohibition of conflicts between Extras and Core packages. Could this
result in a new era of DLL hell a few years down the road?

Wow, I guess I just slung some FUD at Fedora, but maintaining a rate of
2-3 releases per year probably requires a small core, putting the bulk of
applications into the Extras category and thus increasing the chance of
conflict. (Wasn't that the original recipe for DLL hell?) Debian has
avoided this through a much larger core, which of course slows the release
cycle.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ctierney at hpti.com Fri Oct 31 17:37:36 2003
From: ctierney at hpti.com (Craig Tierney)
Date: 31 Oct 2003 15:37:36 -0700
Subject: sum of matrices
In-Reply-To: <20031031213852.87539.qmail@web12206.mail.yahoo.com>
References: <20031031213852.87539.qmail@web12206.mail.yahoo.com>
Message-ID: <1067639856.6209.211.camel@hpti10.fsl.noaa.gov>

On Fri, 2003-10-31 at 14:38, Mathias Brito wrote:
> Hi,
>
> Over the last few days I wrote a program (in C) that sums two matrices.
> Let me say a little about how it works. I send one row of the 1st matrix
> and one row of the 2nd matrix to each process; when a process finishes
> its job, if there are more rows I send more to it and it sums those two
> new rows. The problem is that the program works fine with 100x100 (or
> smaller) matrices, but when I increase the size to something like
> 10000x10000 I receive the following message:
>
> p0_8467: p4_error: Child process exited while making
> connection to remote process on node2: 0
>
> Is this an MPI problem or is it my code? What can I do to fix this
> problem?

It is probably your code. Are you allocating the matrix statically or
dynamically? Try increasing the stack size on your node(s). [A short
illustrative sketch of the static-vs-dynamic point follows at the end of
this section.]

Craig

> =====
> Mathias Brito
> Universidade Estadual de Santa Cruz - UESC
> Departamento de Ciências Exatas e Tecnológicas
> Estudante do Curso de Ciência da Computação
>
> Yahoo! Mail - o melhor webmail do Brasil
> http://mail.yahoo.com.br
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From sp at scali.com Fri Oct 31 19:52:23 2003
From: sp at scali.com (Steffen Persvold)
Date: Sat, 01 Nov 2003 01:52:23 +0100
Subject: Cluster Poll Results (tangent into OS choices)
In-Reply-To:
References:
Message-ID: <3FA303C7.8050600@scali.com>

Joel Jaeggli wrote:
> On Fri, 31 Oct 2003, Steffen Persvold wrote:
[]
>> Who says you have to pay 1024*$792? Why not only 1 license? AFAIK you
>> may use that binary image as you like inside your cluster since it is
>> covered by GPL, but you can't claim support from RH for more than one
>> of the systems.
>
> read the license agreement for your redhat enterprise disks...

Well, the EULA doesn't say anything about having to pay $792 for each node
in a cluster (actually it doesn't mention paying license fees at all). The
only relevant stuff I can find is item 2, "Intellectual Property Rights":

"If Customer makes a commercial redistribution of the Software, unless a
separate agreement with Red Hat is executed or other permission granted,
then Customer must modify the files identified as REDHAT-LOGOS and
anaconda-image to remove all images containing the Red Hat trademark or
the Shadowman logo. Merely deleting these files may corrupt the Software."

And I wouldn't say that installing on your cluster nodes is "making a
commercial redistribution", would you? Or have I missed something
fundamental?
Regards,
Steffen

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
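
A minimal sketch of Craig Tierney's static-vs-dynamic allocation point,
referenced above. Mathias's actual program was not posted, so the dimension
N, the variable names, and the row-at-a-time layout below are assumptions
based on his description, not his code. The idea: a 10000x10000 matrix of
doubles is roughly 800 MB, so declaring it as a local (stack) array blows
far past the default stack limit of a few MB and the process dies; under
MPICH's p4 device that can surface as a connection error like the one
quoted above rather than an obvious segfault. Allocating just the rows each
worker needs on the heap avoids the stack limit; alternatively, Craig's
other suggestion of raising the limit (e.g. ulimit -s unlimited in the
shell that launches the job) can help if large automatic arrays must stay.

/* alloc_sketch.c -- sums one pair of rows using heap allocation.
 * N and the row-at-a-time scheme are assumptions, not Mathias's code.
 * Build: cc -O2 alloc_sketch.c -o alloc_sketch
 */
#include <stdio.h>
#include <stdlib.h>

#define N 10000                 /* hypothetical matrix dimension */

int main(void)
{
    /* Declaring "double a[N][N], b[N][N];" here would need ~1.6 GB of
     * stack and would normally crash; malloc'd rows live on the heap. */
    double *a, *b, *c;
    int j;

    a = malloc(N * sizeof(double));
    b = malloc(N * sizeof(double));
    c = malloc(N * sizeof(double));
    if (a == NULL || b == NULL || c == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    for (j = 0; j < N; j++) {   /* sum one pair of rows */
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = a[j] + b[j];
    }
    printf("c[0] = %g\n", c[0]);

    free(a);
    free(b);
    free(c);
    return 0;
}

If a node really must hold a full matrix, a single malloc of
N*N*sizeof(double), checked for NULL, at least fails cleanly instead of
overflowing the stack.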