From hraa at lncc.br Fri Aug 1 13:58:59 2003 From: hraa at lncc.br (Ricardo) Date: Fri, 1 Aug 2003 14:58:59 -0300 (BRT) Subject: Filesystem Message-ID: Hi all Which one is better to use, ext3 or raiserfs? Someone have performance results comparing Ext3 with raiserfs? Thanks ------------------------------- Ricardo .-. /v\ // \\ > L I N U X < /( )\ ^^-^^ ------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Fri Aug 1 19:05:47 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Fri, 1 Aug 2003 16:05:47 -0700 Subject: New technology for trunking gigE switches In-Reply-To: References: <20030730190605.GA2640@sphere.math.ucdavis.edu> Message-ID: <20030801230547.GA2324@greglaptop.internal.keyresearch.com> What good timing: Broadcom just released public info about their new generation of gigE switch chips, which are capable of using inexpensive 4-wire copper 10gig uplinks between boxes. The neat thing about this is that instead of having to buy a bunch of 10gig optics, which are very expensive, it uses a 4-wire 3.125 gbit copper interconnect, same as InfiniBand. You should expect to see this showing up in stackable 24 to 48 port switches, allowing up to 384 gigE ports in a single blob, at around $100/gigE port. The center is an 8-port 10gigE switch, so as you can see, you have the same issue of the ratio of uplink bandwidth to local bandwidth that you had in the fast ethernet stackables with 1gig uplinks. You will note, however, that the Broadcom blurb says you can get much better total fabric bandwidth than just one of those chips. They don't explain how, and so I can't mention it -- but if anyone finds a public explanation, please let me know. I believe that it should be able to hit the quoted 640 gbits of total traffic, i.e. at 384 ports, you can build a switch which almost has perfect bisection. The total switch latency also shouldn't be so bad: say ~35 usec for first bit in to last bit out, which is just over double of what you'll see with a standalone gigE switch. (The total latency seen by an application using TCP/IP will be higher, of course.) The HP Procurve guys had a quote in one of the press releases, but I'm sure that other vendors will ship products based on this too; Broadcom is already a high volume producer of chips used in ethernet switches. -- greg http://www.broadcom.com/docs/promostrataxgs.html _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Fri Aug 1 19:26:38 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Fri, 1 Aug 2003 19:26:38 -0400 (EDT) Subject: Filesystem In-Reply-To: Message-ID: > Which one is better to use, ext3 or raiserfs? there is no clearcut winner. > Someone have performance results comparing Ext3 with raiserfs? yes, there's plenty available. reiserfs people always focus on situations where directories have billions of small files. that's not surprising, since that's their design target: efficient storage of very small files, and efficient handling of ridiculously overfull directories. 
I question the value of worrying about very small files (because disk is so cheap, and clusters mostly have big files); big directories seem like someone's design mistake to me. ext3 is designed as an ultra-stable journaling version of ext2, and succeeds. it's difficult to compare reliability, but ext3 does generally have a better reputation than reiserfs. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Fri Aug 1 22:33:26 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 02 Aug 2003 06:33:26 +0400 Subject: nfs problem In-Reply-To: <200308011902.h71J2Vw28853@NewBlue.Scyld.com> Message-ID: Hello Thanks everybody, it's working. I will need to install MPICH now. Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Fri Aug 1 22:33:50 2003 From: timm at fnal.gov (Steven Timm) Date: Fri, 01 Aug 2003 21:33:50 -0500 Subject: Filesystem In-Reply-To: Message-ID: I have some anecdotal evidence that ext3 starts taking a performance hit in cases where a lot of files get written and then quickly erased. There is also a performance penalty on burst I/O -- e.g., if you have a system doing near-continuous disk writes and reads, it will bump the load factor up. But I don't have any information to suggest that Reiser does it better. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > > Thanks > > ------------------------------- > Ricardo > > .-.
> /v\ > // \\ > L I N U X < > /( )\ > ^^-^^ > ------------------------------- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Fri Aug 1 22:45:01 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 1 Aug 2003 19:45:01 -0700 (PDT) Subject: Filesystem In-Reply-To: Message-ID: hi ricardo filesystem comparasons http://www.linux-sec.net/FileSystem/#FS http://aurora.zemris.fer.hr/filesystems/ i think ext3 is better than reiserfs i think ext3 is not any better than ext2 in terms of somebody hitting pwer/reset w/o proper shutdown - i always allow it to run e2fsck when it does an unclean shutdown ... - yes ext3 will timeout and continue and restore from backups but ... am paranoid about the underlying ext2 getting corrupted by random power off and resets c ya alvin On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 08:48:09 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 12:48:09 GMT Subject: Filesystem In-Reply-To: References: Message-ID: <20030802124809.1437.qmail@houston.wolf.com> Steven Timm writes: > I have some anecdotal evidence that ext3 starts taking performance hit > in cases where there is a lot of files getting written and then > quickly erased. Also there's a performance penalty on burst I/O--e.g. > if you have a system doing near-continuous disk writes and reads it > will bump the load factor up. But I don't have any information to > suggest that Reiser does it better. > It depends what you are going to use the nodes for. For normal compute nodes, I don't think there is enough of a payback to change ext3. For our disk nodes, we use ext3 for system filesystems and XFS for the exported disk space (with NFS patches and tuning of couse) to get some serious performance. We are currently testing different filesystems on one of the disk nodes we just purchased and have seen a dramatic rise in performance with the above. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mukshere at rediffmail.com Sat Aug 2 10:16:43 2003 From: mukshere at rediffmail.com (mukund govind umalkar) Date: 2 Aug 2003 14:16:43 -0000 Subject: Beowulf Research Message-ID: <20030802141643.9462.qmail@webmail7.rediffmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sanjoy at chem.iitkgp.ernet.in Sat Aug 2 13:33:21 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sat, 2 Aug 2003 23:03:21 +0530 (IST) Subject: NIS In-Reply-To: <20030802124809.1437.qmail@houston.wolf.com> Message-ID: Hi, I have a cluster running Rh 7.3 with NIS server. The cluster was running fine. 
But suddenly after rebooting now the clients are having problems in recognizing the NIS domain server name. while booting the clients it says: Binding to the NIS domain: [OK] Listening for an NIS domail server............[FAILED] ypwhich on clients says 'Can't communicate with ypbind' ypbind, ypserv are running fine on the server. I will appreciate if anyone can help.. Thanks. Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bari at onelabs.com Sat Aug 2 14:02:45 2003 From: bari at onelabs.com (Bari Ari) Date: Sat, 02 Aug 2003 13:02:45 -0500 Subject: SH4 & SH5 Clustering Message-ID: <3F2BFCC5.6060803@onelabs.com> It's been a few years since anyone has posted anything here on clusters using the SH-4. http://www.beowulf.org/pipermail/beowulf/1999-November/007339.html Does anyone have results or experiences of building systems using the SH-4? http://www.superh.com/products/sh4.htm http://www.superh.com/products/sh5.htm The SH-5 is finally showing up in silicon at 2.8GFLOPS, 400MHz, under 1W/cpu. The caches are small at 32KB yet have a 3.2GB/s peak internal bus, the SOC's have DDR memory and 32bit/66MHz PCI. They look attractive for low power dense clusters/blade applications that won't be hurt much by their small cache size and the 264MB/s peak PCI interface. A 1-U could contain 24 - 32 of these and require only convection cooling for the cpu's. The DDR memory would be the "hot spots" and require some forced air cooling. Bari _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:42:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:42:14 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. Hmmm, so many possible causes. If you say "suddenly after rebooting" and if it applies to all the clients, I'd check the following: a) The network connection of the server. All things being equal, I'd have to say this is a prime candidate. Don't forget to check the wire(s) itself -- many is the perplexing networking or service problem that turned out to be caused by somebody kicking a wire so that the plug was no longer properly seated. 
Check network connectivity in other ways to -- is the switch port suddenly bad, do I need to power cycle the switch (switches sometimes "wedge" and need a cycle to rebuild their tables), and so forth. On some switches it is possible to block broadcasts -- NIS requires them, so be sure that this didn't get done by mistake. b) When you've eliminated hardware as a possible cause (and have validated perfect network connectivity) then you can look for software problems. A "sudden" problem like this is odd -- perhaps you accidentally updated with a broken RPM? Perhaps somebody trashed a table? Did somebody update iptables or ipchains or change their rules so port access is blocked that way? See if checking out these systems solves it. If not, in your next post include more detail on your network and so forth. Usually this kind of thing is solved by doggedly testing one system at a time until the culprit emerges, starting with the most likely. Don't forget, you have tools like tcpdump that will let you snoop the network packets one at a time if necessary to be sure that they are indeed arriving at the server from the clients. I recall that you can turn on ypserv with -d for debug to get a much more verbose operational mode to help debug as well. HTH, rgb > > I will appreciate if anyone can help.. > Thanks. > Sanjoy > > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:50:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:50:03 -0400 (EDT) Subject: Beowulf Research In-Reply-To: <20030802141643.9462.qmail@webmail7.rediffmail.com> Message-ID: On 2 Aug 2003, mukund govind umalkar wrote: > hello sir, > i am a graduate student, and i am intrested in doing research on > Beowulf clusters, so plzz send me some material and let me know > about the various papers that have presented on Beowulf. > > If possible please some useful URLs for the same There are lots of starting points, and the better sites form for all practical purposes a webring with mutual links interconnecting them so sites you don't find on one you're likely to find on another linked to it. One such starting point is: http://www.phy.duke.edu/brahma (look under e.g. resources and links and papers). Brahma will lead you do the beowulf underground, to the original/main beowulf site, and to many other well-known clustering sites and resources. To find "real" papers on clustering, check out e.g. 
;login and various other computer geek journals and magazines. Linux Magazine has an excellent clustering column by Forrest Hoffman. There are some online webzines devoted to clustering (some linked to brahma). Google is your friend here -- with google you can find out pretty much anything that is online. rgb > > thanx > Mukund > > > > ___________________________________________________ > Download the hottest & happening ringtones here! > OR SMS: Top tone to 7333 > Click here now: > http://sms.rediff.com/cgi-bin/ringtone/ringhome.pl > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 18:11:17 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 22:11:17 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030802221117.3967.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. Check to make sure your NIS server is running and talking (TCPDUMP). If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind restart" and see what error crops up. Also, try nisdomainname and see what crops up there. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Sun Aug 3 01:36:00 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sun, 3 Aug 2003 11:06:00 +0530 (IST) Subject: NIS In-Reply-To: <20030802221117.3967.qmail@houston.wolf.com> Message-ID: On Sat, 2 Aug 2003, Angel Rivera wrote: > Check to make sure your NIS server is running and talking (TCPDUMP). > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > restart" and see what error crops up. yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: Shutting down NIS services: [FAILED] Binding to the NIS domain: [OK] Listening for an NIS domain server................... [FAILED] > Also, try nisdomainname and see what crops up there. nisdomainname gives correct domain name. We have the Sever filesystems NFS mounted on the clients. I can see now that this NFS mounting is not working for the clients. While the clients tries to mount the NFS filesystem, it gives this error: Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable to receive Thanks.. -Sanjoy -------------------------------------------------------------------- Dr. 
Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leventeh at hotmail.com Sun Aug 3 02:14:23 2003 From: leventeh at hotmail.com (Levente Horvath) Date: Sun, 03 Aug 2003 06:14:23 +0000 Subject: MPI & linux compilers Message-ID: To whom it may concern, We have 12 PCs set up for parallel computation. All are running linux (Redhat 7.3) and MPI. We would like to compute eigenvalues and eigenvectors for large matrices. We have managed to do up to 10000x10000 matrix no problem. Our program uses Scalapack and Blacs routines. These routines require two matrix to be declared. On single precision two 10000x10000 matrix occupies 800Mb of memory which is already exceeds the 512Mb local memory of each computer in our cluster. This memory were equally distributed over the 12 computers upon computation. So, we think that in theory we shouldn't have any problem going to large matrices; as our distributed memory is quite large 12*512Mb. Now, if we try to run a larger size then the compiler mpif77 returns a "large matrix" error. We have traced the compiler and found that mpif77 is a script that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we found that there is no problem with the compilation up to a size of 15000x15000, then the compiler crashes. After tracing the compilation procedure, we found that the linker "as" cannot link some of the .o and .s files in our /tmp directory. So, we used C rather than fortran. Statically, we cannot declare more than a 1500x1500 matrix (that put in to a hello world program for MPI). We thought it might be the problem with the static allocation of memory. So, we tried to allocate this space dynamically without any success.... Our questions are: Are we doing something wrong here. Or are the compilers gcc and g77-3 responsible for such an array limit. Or are we missing the ways to allocate memory for large matrices.... This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. Unfortunately, we cannot link mpi libraries against this "ifc" compiler. It just doesn't see them. We have tried to compile ifc with the full path names of libraries using either static and dynamics libraries. In either case we had no success... We would appreciate all of your comments and suggestions. Thank you in advance.... _________________________________________________________________ ninemsn Extra Storage comes with McAfee Virus Scanning - to keep your Hotmail account and PC safe. 
Click here http://join.msn.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sun Aug 3 12:17:19 2003 From: angel at wolf.com (Angel Rivera) Date: Sun, 03 Aug 2003 16:17:19 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030803161719.30576.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > >> Check to make sure your NIS server is running and talking (TCPDUMP). >> If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind >> restart" and see what error crops up. > > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > >> Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive It is not seeing the ypserver. have you tried rpcinfo -p _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Sat Aug 2 17:28:18 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Sat, 2 Aug 2003 16:28:18 -0500 (CDT) Subject: NIS In-Reply-To: References: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. > Thanks. > Sanjoy Sanjoy, You say that ypbind is running fine /on the SERVER/, what about ypbind running on the /CLIENT/? ypbind should not run on the server, it runs on the clients. -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:05:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:05:32 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Sanjoy Bandyopadhyay wrote: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > > > Check to make sure your NIS server is running and talking (TCPDUMP). > > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > > restart" and see what error crops up. 
> > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > > > Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive Yah. How about ping? Can you ping the server? Seriously, this looks like your problem is just a bad network connection, or conceivably a downed portmapper. If you can't ping, obviously your network is down and you need to fix it. If you can ping and ssh back and forth and the like, then make sure that portmap is running on your clients and server (an rpm that updated but installed the new one off?). In fact, do chkconfig --list and look at ALL of your network services to make sure they still make sense. Be careful here -- trojanned portmappers and other broken rpc services are a favorite way for crackers to enter your system. What you are seeing COULD be symptoms of being cracked, as trojanned portmappers not infrequently are broken (for a variety of reasons). You might prefer to back up your data and do a full reinstall of the server and a client, to check the rpm MD5 checksums, and to presume that you may have been cracked (monitoring your net traffic with TCPDUMP looking for bad guys) while you proceed. At least stay aware of the possibility. It's happened to me; it could have happened to you. rgb > > Thanks.. > -Sanjoy > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:16:36 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:16:36 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Brian D. Ropers-Huilman wrote: > Sanjoy, > > You say that ypbind is running fine /on the SERVER/, what about ypbind running > on the /CLIENT/? ypbind should not run on the server, it runs on the clients. Right, but if NFS is also not running with an RPC error, it really suggests either raw networking problems or problems with the RPC subsystem, e.g. portmap. He also originally said that he had it working and then it stopped. 
If that is true it doubly points to networking or RPC. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 3 14:18:17 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 03 Aug 2003 13:18:17 -0500 Subject: MPI & linux compilers In-Reply-To: Message-ID: <5.1.1.6.2.20030803131144.02f00e50@localhost> At 06:14 AM 8/3/2003 +0000, Levente Horvath wrote: >To whom it may concern, > >We have 12 PCs set up for parallel computation. All are running linux >(Redhat 7.3) and MPI. >We would like to compute eigenvalues and eigenvectors for large matrices. > >We have managed to do up to 10000x10000 matrix no problem. Our program >uses Scalapack and Blacs >routines. These routines require two matrix to be declared. On single >precision two 10000x10000 >matrix occupies 800Mb of memory which is already exceeds the 512Mb local >memory of >each computer in our cluster. This memory were equally distributed over >the 12 computers >upon computation. So, we think that in theory we shouldn't have any >problem going >to large matrices; as our distributed memory is quite large 12*512Mb. You need to declare only the local part of the matrix that is distributed across the processes, not the entire matrix. MPI doesn't provide any support for automatically distributing the data, though libraries written using MPI can do this if the data is allocated dynamically by the library. Languages such as HPF can do this for you, but have their own limitations. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xavier at zeth.ciencias.uchile.cl Sun Aug 3 21:59:56 2003 From: xavier at zeth.ciencias.uchile.cl (Xavier Andrade) Date: Sun, 3 Aug 2003 21:59:56 -0400 (CLT) Subject: MPI & linux compilers In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. 
> > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > Running "mpif77 -showme" will show you the line that mpif77 actually calls for compiling, if you want to change the compiler that mpif77 calls set the enviroment variable LAMHF77 (i.e. with `export LAMHF77=ifc` mpif77 will compile using ifc instead of f77). Xavier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Mon Aug 4 01:09:13 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Mon, 4 Aug 2003 10:39:13 +0530 (IST) Subject: NIS In-Reply-To: Message-ID: Hi, I figured out what was wrong.. the nsswitch.conf file was somehow corrupted. nis was not mentioned for passwd,group,shadow files. Now everything is under control. Thanks very much to all of you who helped with their valuable suggestions. -Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From javier.crespo at itp.es Mon Aug 4 02:53:05 2003 From: javier.crespo at itp.es (Javier Crespo) Date: Mon, 04 Aug 2003 08:53:05 +0200 Subject: MPI & linux compilers References: Message-ID: <3F2E02D1.E834011B@itp.es> Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. 
We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. > > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > > We would appreciate all of your comments and suggestions. > Thank you in advance.... If you want to link to mpi but compiling with "ifc" (is it really IBM? - I think it comes from intel), you first at all should have to compile that libraries with the same compiler that you are going to use for the main program, typically using the options "-fc=ifc","--f90=ifc" and "-f90linker=ifc" when configuring MPI and then installing it in you path (in a different place than the MPI libraries compiled with f77). Javier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 4 08:02:50 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 4 Aug 2003 14:02:50 +0200 (CEST) Subject: Cisco switches for lam mpi In-Reply-To: Message-ID: On Tue, 29 Jul 2003, Jack Douglas wrote: > We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst > 4003 Chassis with 48 1000Base-t ports. > > We are running LAM MPI over gigabit, but we seem to be experiencing > bottlenecks within the switch > > Typically, using the cisco, we only see CPU utilisation of around 30-40% [...] I'm not a Cisco expert, but... We once got a Cisco switch from our networking people that we had to return immediately because it delivered such a bad performance. It was a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only handle 12 ports at full speed. Above that, the performance brake down completely. For some benchmark results see, e.g.: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf As a comparison, the quite nice results of a CentreCom 742i: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved anyway since spring 2001 when I did the above tests. Besides, the situation for Gigabit Ethernet could be different. As we described on our workshop paper at CAC03 you can not trust the data sheets of switches anyway: http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 Conclusion: If you need a very high performing switch, you have to evaluate/benchmark it yourself. 
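For a quick first pass at that kind of evaluation, a minimal aggregate-load probe can be as simple as the sketch below. It is purely illustrative -- it is not the benchmark suite referenced above, and the port number, buffer size and transfer volume are arbitrary choices. It streams a fixed amount of data over TCP to a sink assumed to be already listening on the far node (for example "nc -l -p 5001 > /dev/null" with traditional netcat) and reports MB/s. Start one sender per node pair simultaneously and compare the per-pair rates against the single-pair case to get a crude picture of how the switch holds up under aggregate load.

/*
 * Minimal TCP throughput probe -- an illustrative sketch only.
 * Assumes something on the far end is already listening and
 * discarding data, e.g. "nc -l -p 5001 > /dev/null".
 * Usage: ./tcpblast <host> <port> <megabytes>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netdb.h>

int main(int argc, char **argv)
{
    struct addrinfo hints, *res;
    struct timeval t0, t1;
    char buf[65536];
    long long total = 0, target;
    double secs;
    ssize_t n;
    int fd;

    if (argc != 4) {
        fprintf(stderr, "usage: %s <host> <port> <megabytes>\n", argv[0]);
        return 1;
    }
    target = (long long)atol(argv[3]) * 1024 * 1024;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(argv[1], argv[2], &hints, &res) != 0) {
        fprintf(stderr, "cannot resolve %s\n", argv[1]);
        return 1;
    }
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("socket/connect");
        return 1;
    }

    memset(buf, 0xA5, sizeof(buf));            /* arbitrary payload */
    gettimeofday(&t0, NULL);
    while (total < target) {                   /* stream the requested volume */
        n = write(fd, buf, sizeof(buf));
        if (n <= 0) { perror("write"); break; }
        total += n;
    }
    gettimeofday(&t1, NULL);
    close(fd);
    freeaddrinfo(res);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    if (secs <= 0.0)
        secs = 1e-6;
    printf("%lld bytes in %.2f s = %.1f MB/s\n",
           total, secs, total / secs / (1024.0 * 1024.0));
    return 0;
}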
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Aug 4 15:31:22 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 04 Aug 2003 15:31:22 -0400 Subject: large filesystem & fileserver architecture issues. Message-ID: <1060025481.28642.81.camel@roughneck> Hey all -- here is our situation. We currently have several clusters that are configured with either IBM x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays hanging off of them. Each server + array is good for around 600GB after RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 of multiple arrays ( which seems to work & perform quite nicely ). Each of the servers then exports the filesystem via NFS, and is mounted on the nodes. The clusters range from 24 to 128 nodes. For backups we maintain an offline server + array that we use to rsync the data nightly, then use our amanda server and tape robot to backup. We use an offline sync, as we need a level 0 dump every 2 weeks, and doing a level 0 dump of 600GB just trashes the performance on a live server. As we are a .edu and all of the clusters were purchased by the individual groups, the options we can explore have to be very cost efficient for hardware, and free for software. Now for the problem... A couple of our clusters are using the available filespace quite rapidly, and we are looking to add space. The most cost efficient approach we have found is to buy a IDE RAID box, like those available from RaidZone or PogoLinux. This allows us to use the cheap IDE systems as the offline sync, and use the scsi systems as online servers. And the questions: 1) Is there a better way to backup the systems without the need for an offline sync? 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad does it bite ? 3) Are there any recommended IDE RAID systems? We are not looking for super stellar performance, just a solid system that does it's job as an offline sync for backups. -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Aug 4 22:49:35 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 04 Aug 2003 22:49:35 -0400 Subject: updated run_mpiblast code Message-ID: <1060051774.25281.22.camel@protein.scalableinformatics.com> Hi folks: Updated and documented the run_mpiblast code. Better data from --debug switch. To see the man page, either perldoc run_mpiblast or run run_mpiblast --help Will be working on an RPM and a tarball installer in short order. It can be pulled from http://scalableinformatics.com/sge_mpiblast.html. The documentation (pod generated) can be viewed at http://scalableinformatics.com/run_mpiblast.html . 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 5 08:54:57 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 05 Aug 2003 16:54:57 +0400 Subject: mpich2-0.93 In-Reply-To: <200308041901.h74J1Tw27276@NewBlue.Scyld.com> Message-ID: hello everybody I have download MPICH2-0.93 and I have some difficulty in implementing it. That is, according to some research done I need to amend the file "machines.LINUX" so that the parallel computing can start and to choose which node to form part of the cluster. But the problem is that there is no file which name "machine.LINUX" and the file is suppose to be found in the directory .../mpich2-0.93/util/machines. Well, I use redhat9.0 - hope to hear from you very soon If there is a web site to get the necessary information please let me know. Cheers Roudy. -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Aug 5 11:47:07 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 05 Aug 2003 11:47:07 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <1060098427.30922.6.camel@roughneck> On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > On 4 Aug 2003, Nicholas Henke wrote: > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > and the price is hard to beat. The raid array that serves home > directories to their clusters and workstations is backed up nightly to a > second raid server, similarly to your system. To speed things along we > installed an extra gigabit card in the primary and backup servers and > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > GBs take just over an hour using the dedicated gbit link. Rsync would > probably be faster. Without the shortcircuit gigabit link, it used to run > four or five times longer and seriously impact NFS performance for the > rest of the systems on the LAN. > > Hope this helps. > > Regards, > > Mike Prinkey > Aeolus Research, Inc. Definately does -- can you recommend hardware for the IDE RAID, or list what you guys have used ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Tue Aug 5 15:11:58 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Tue, 5 Aug 2003 09:11:58 -1000 Subject: large filesystem & fileserver architecture issues. 
References: <1060098427.30922.6.camel@roughneck> Message-ID: <009701c35b85$714e7110$7101a8c0@Navatek.local> I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. Although they do offer a NFS box that will turn one of these arrays into a standalone. We have had great success with these units (http://neptune.navships.com/images/harddrivearrays.jpg) . We first acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. We have set it up in a RAID-5 configuration and have not yet had to replace even one of the drives (Knockin on wood). After a year we picked up the 14slot chassis and filled it with 160 maxtor drives and it has performed flawless... I think we paig about $4000 for the 14 slot chassis. you can add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver for $1500 and you got about 2TB of storage for around $7000 Mitchel Kagawa ----- Original Message ----- From: "Nicholas Henke" To: "Michael T. Prinkey" Cc: Sent: Tuesday, August 05, 2003 5:47 AM Subject: Re: large filesystem & fileserver architecture issues. > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > On 4 Aug 2003, Nicholas Henke wrote: > > > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > > and the price is hard to beat. The raid array that serves home > > directories to their clusters and workstations is backed up nightly to a > > second raid server, similarly to your system. To speed things along we > > installed an extra gigabit card in the primary and backup servers and > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > GBs take just over an hour using the dedicated gbit link. Rsync would > > probably be faster. Without the shortcircuit gigabit link, it used to run > > four or five times longer and seriously impact NFS performance for the > > rest of the systems on the LAN. > > > > Hope this helps. > > > > Regards, > > > > Mike Prinkey > > Aeolus Research, Inc. > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Tue Aug 5 18:12:21 2003 From: egan at sense.net (Egan Ford) Date: Tue, 5 Aug 2003 16:12:21 -0600 Subject: Power monitoring Message-ID: <095d01c35b9e$a4ae90d0$0664a8c0@titan> I know this was discussed recently with "kill-a-watt" as a popular choice, however I am looking for the next step up, something more on the circuit level that I can hardwire between my lab and breakers. Support for multiple circuits would be nice too as well as 110/220 support. Add a serial port for remote monitoring and I'm set. However I am looking for a cheap solution, a web cam pointing to a meter is an option. I'll even settle for analogue, I just need kwh. Thanks. 
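If the meter does end up on a serial port, turning instantaneous readings into kWh is mostly a matter of integrating over time. The sketch below only illustrates that bookkeeping: it assumes a meter on /dev/ttyS0 at 9600 8N1 that prints one plain-ASCII watt reading per line, which is an invented protocol -- substitute whatever the real hardware actually emits.

/*
 * Sketch of a kWh accumulator for a serial power meter.
 * ASSUMPTIONS (adjust for the real hardware): meter on /dev/ttyS0
 * at 9600 8N1, printing one instantaneous reading in watts, as a
 * plain ASCII number, per line.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>
#include <time.h>

int main(void)
{
    const char *dev = "/dev/ttyS0";        /* assumed device path */
    struct termios tio;
    char line[128];
    double kwh = 0.0, watts;
    time_t prev, now;
    FILE *fp;
    int fd;

    fd = open(dev, O_RDONLY | O_NOCTTY);
    if (fd < 0) { perror(dev); return 1; }

    /* 9600 8N1, canonical (line-at-a-time) input */
    memset(&tio, 0, sizeof(tio));
    tio.c_cflag = B9600 | CS8 | CLOCAL | CREAD;
    tio.c_lflag = ICANON;
    tcflush(fd, TCIFLUSH);
    tcsetattr(fd, TCSANOW, &tio);

    fp = fdopen(fd, "r");
    if (fp == NULL) { perror("fdopen"); return 1; }

    prev = time(NULL);
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "%lf", &watts) != 1)
            continue;                      /* skip anything unparsable */
        now = time(NULL);
        /* integrate watts over elapsed seconds: W*s -> kWh */
        kwh += watts * (double)(now - prev) / 3600.0 / 1000.0;
        prev = now;
        printf("%ld W=%.1f kWh=%.4f\n", (long)now, watts, kwh);
        fflush(stdout);
    }
    fclose(fp);
    return 0;
}

Redirecting the output to a file gives a running kWh log that can be sampled or graphed remotely over the serial server's network connection.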
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:35:09 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:35:09 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> Message-ID: hi ya On Tue, 5 Aug 2003, Mitchel Kagawa wrote: > I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true > NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. thought acnc.com has good stuff . :-) > Although they do offer a NFS box that will turn one of these arrays into a > standalone. We have had great success with these units > (http://neptune.navships.com/images/harddrivearrays.jpg) . We first > acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. > We have set it up in a RAID-5 configuration and have not yet had to replace > even one of the drives (Knockin on wood). After a year we picked up the > 14slot chassis and filled it with 160 maxtor drives and it has performed > flawless... I think we paig about $4000 for the 14 slot chassis. you can > add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver > for $1500 and you got about 2TB of storage for around $7000 8 drives at 250GB each is 2TB in one 1U chassis ... 250GB disks is about $250 now days.... maybe less on the online webstores backup of 2TB should be done on another 2TB systems .. 3rd 2TB machine if the data cannot be recreated save only the raw data/apps needed to regenerate the output data c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:40:27 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:40:27 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. -hw In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: hi ya On 5 Aug 2003, Nicholas Henke wrote: > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? you have basically 2 choices ... - leave the ide as an ide disks ... ( software raid ) - get a $50 ide controller ( 4 drives on it ) and 4 drives on the mb - convert the ide to look like a scsi drives ( tho not really ) - 3ware 7500-8 series for 8 "scsi" disks on it - or get a real hardware raid card for lots of $$$ - mylex, adaptec - for a list of hardware raid card that is supported by linux http://www.linux-ide.org/chipsets.html http://www.1u-raid5.net sw/hw raid5 howto's c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m0ukb at unb.ca Wed Aug 6 08:50:13 2003 From: m0ukb at unb.ca (White, Adam Murray) Date: Wed, 6 Aug 2003 09:50:13 -0300 Subject: Performance monitoring tool Message-ID: <1060174213.3f30f9859afaf@webmail.unb.ca> Hello, I am interested in acquiring a good real time cluster performance monitoring tool, which at least displays (dynamically while the program is running) each thread's cpu utilization and memory usage (graphically). Not a postmortem display. 
Free as well. Any help would be much appreciated. Regards, A. M. White ###################################################### Adam M. White University of New Brunswick Saint John http://www.unbsj.ca/sase/csas m0ukb at unb.ca ###################################################### _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Aug 6 13:21:02 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 6 Aug 2003 13:21:02 -0400 (EDT) Subject: Performance monitoring tool In-Reply-To: <1060174213.3f30f9859afaf@webmail.unb.ca> Message-ID: On Wed, 6 Aug 2003, White, Adam Murray wrote: > Hello, > > I am interested in acquiring a good real time cluster performance monitoring tool, which at > least displays (dynamically while the program is running) each thread's cpu utilization and > memory usage (graphically). Not a postmortem display. Free as well. > > Any help would be much appreciated. At this time it won't QUITE do what you like, but it is within spitting distance of it. Check out: xmlsysd and wulfstat on brahma (http://www.phy.duke.edu/brahma). xmlsysd is a daemon that runs on a cluster and obtains by a variety of means statistics of interest on the system. Some of these it parses from proc, others by the use of systems calls. It is not promiscuous (it doesn't provide e.g. a complete copy of /proc to clients that connect to it) but rather offers a digested view that can be throttled so that one or more "sets" of interesting statistics can be monitored. This is to keep it lightweight, both on the system it is monitoring and on the network and client -- it is (literally) a parallel application in its own right and it isn't a good idea for a monitor application to significantly compete for any of the resources that might bottleneck a "production" parallel application. Its "prepackaged" return sets include load avg (5,10,15 min), memory (basically the data underlying the "free" command), ethernet network usage for one or more devices, date/time/cpu information, basically the kind of data one finds digested at the top of the "top" command or made available by e.g xosview in kin in graphical windows. It also has a "pid" mode where it can monitor running processes. Here throttling and filtering is a bit trickier, as one generally does NOT want to monitor every process running on a system with a supposedly lightweight tool. I thus implemented pid selection by means of matching task name or user name, a mode that returns all "userspace" tasks that have accumulated more than some cutoff in total time (5 seconds? I can't remember), as well as a to-be-rarely-used promiscuous mode that returns everything it can find including root tasks. xmlsysd's returns are in xml, and hence are easy to parse out with any xml parser for application in anything you like. That's the good news. The other good news is that wulfstat, the provided client, lets you use most of these features in a tty/ncurses window. The bad news it that there is no GUI display with little graphs and the like. This is mixed news, really, not necessarily bad. A tty display lets you use the pgup and pgdn keys and scroll arrows to page quickly through a lot of hosts, seeing instantly the full detail (actual numbers) for each field being monitored -- you might find wulfstat to be adequate. 
If it isn't adequate, though, you'll likely need to write some sort of client application that polls the daemon at some interval (I tend to use 5 seconds as the default, but it can be set up or down as low as 1 second, depending on how many hosts one wishes to monitor, again remembering that it is supposed to be lightweight and that it is a bad idea to run it so fast that the return latency causes the loop to pile up). This should be pretty easy -- you can actually talk to the daemon with telnet, so watching it work and testing the api is not a problem. You've got wulfstat sources to play with (both tools fully GPL). The daemon returns XML, which is easy to parse out. Finally, there are a fair number of tools or libraries that you can pipe this output into to generate graphs, either on the web or some other console. One day I'll actually write such a tool myself, but wulfstat proved so adequate for most of what we use it for that I haven't been able to justify advancing the project to the top of the triage-heap of bloody and neglected projects that fill my life:-). If you do write one, feel free to do so collaboratively and donate it back to the project so we can all share, although of course the GPL wouldn't require this as far as I can see for clients not derived from wulfstat code or that you write for yourself. xmlsysd and wulfstat have been in "production" use locally for some time, but they are still probably beta level code because most people use ganglia with its web-based displays. Personally I think xmlsysd/wulfstat provide a pretty rich set of monitor options (and actually is derived from code I originally wrote and was using somewhat before the ganglia project was begun, so I can't be accused of foolishly duplicating an existing project:-). If you have any problems with them I will cheerfully fix them, and if you have any ideas for additions or improvements that wouldn't drive me mad timewise to implement, I was cheerfully add them. rgb > > Regards, > A. M. White > > ###################################################### > Adam M. White > University of New Brunswick Saint John > http://www.unbsj.ca/sase/csas > m0ukb at unb.ca > ###################################################### > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 11:45:20 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 11:45:20 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060025481.28642.81.camel@roughneck> Message-ID: On 4 Aug 2003, Nicholas Henke wrote: We have a lot of experience with IDE RAID arrays at client sites. The DOE lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) and the price is hard to beat. 
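For anyone who wants to check numbers like these on their own array, a large sequential dd run is the usual quick-and-dirty test. A minimal sketch -- the mount point and sizes below are placeholders, and the test file should be several times bigger than RAM so the page cache doesn't flatter the result:

  # sequential write, timed through the final sync so buffered data is counted
  time sh -c "dd if=/dev/zero of=/raid/ddtest bs=1024k count=4096 && sync"
  # sequential read back
  time dd if=/raid/ddtest of=/dev/null bs=1024k
  rm -f /raid/ddtest
  # MB/sec is roughly 4096 MB divided by the elapsed seconds reported by time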
The raid array that serves home directories to their clusters and workstations is backed up nightly to a second raid server, similarly to your system. To speed things along we installed an extra gigabit card in the primary and backup servers and connected the two directly. The nightly backup (cp -auf via NFS) of 410 GBs take just over an hour using the dedicated gbit link. Rsync would probably be faster. Without the shortcircuit gigabit link, it used to run four or five times longer and seriously impact NFS performance for the rest of the systems on the LAN. Hope this helps. Regards, Mike Prinkey Aeolus Research, Inc. > Hey all -- here is our situation. > > We currently have several clusters that are configured with either IBM > x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays > hanging off of them. Each server + array is good for around 600GB after > RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 > of multiple arrays ( which seems to work & perform quite nicely ). Each > of the servers then exports the filesystem via NFS, and is mounted on > the nodes. The clusters range from 24 to 128 nodes. For backups we > maintain an offline server + array that we use to rsync the data > nightly, then use our amanda server and tape robot to backup. We use an > offline sync, as we need a level 0 dump every 2 weeks, and doing a level > 0 dump of 600GB just trashes the performance on a live server. As we are > a .edu and all of the clusters were purchased by the individual groups, > the options we can explore have to be very cost efficient for hardware, > and free for software. > > Now for the problem... > A couple of our clusters are using the available filespace quite > rapidly, and we are looking to add space. The most cost efficient > approach we have found is to buy a IDE RAID box, like those available > from RaidZone or PogoLinux. This allows us to use the cheap IDE systems > as the offline sync, and use the scsi systems as online servers. > > And the questions: > > 1) Is there a better way to backup the systems without the need for an > offline sync? > > 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad > does it bite ? > > 3) Are there any recommended IDE RAID systems? We are not looking for > super stellar performance, just a solid system that does it's job as an > offline sync for backups. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 12:34:03 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 12:34:03 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: On 5 Aug 2003, Nicholas Henke wrote: > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > I started building these arrays when 20 GBs was a big drive and hardware ide raid controllers were very expensive. So old habits die hard. Most of my experience has been with Software RAID in Linux. We use Promise Ultra66/100/133 controller cards, Maxtor 80 - 200 GB 5400-rpm drives, and Intel-chipset motherboards. I use the Promise cards, again because they were what was available and supported in Linux in the late 90s. 
They are limited to two IDE channels per cards, but I have used 3 cards in addition to the on-board IDE in large arrays before. Some people buy the IDE Raid cards that have 4 or 8 IDE channels and then use Software RAID instead. The conventional wisdom is that you should only put one drive on each IDE channel to maximize performance. I have built arrays with single drive per channel and two drives per channel and find that is not really true for ATA100 and faster controllers. Two of these drives cannot saturate a 100 or 133 MB/s channel. Typically, we put eight drives in an array. I have been using a 4U rack enclosure that has 8 exposed 5.25 bays. This works well because mounting the drives in a 5.25 bay gives a nice air gap for cooling. Stacking 3 or more drives tightly together heats the middle ones up quite a bit. I also usually use 5400-RPM drives to keep the heat production down. I only use Intel chipset motherboards. Normally just single CPU P4. One of the boards with 1 or 2 onboard gigabit controllers would be a nice choice. 1 GB of RAM is more than enough, but do use ECC. Also, if you use the newest kernels, the onboard IDE controllers are fast enough to be used in the array. For an 8-drive array, I will normally use 1 promise addin card and the two on-board channels. Important Miscellany: - Power Supply. Don't skimp. 400W+ from a good vendor - IDE cables <=24" long. I tried to use the 36" IDE cables once and it nearly drove me nuts with drive corruption and random errors. The 24" ones work very well and usually give you enough length to route to 8 drives in an enclosure. Once Serial ATA gets cheaper, this will no longer be an issue. - UPS. In general, you can NEVER allow a power failure to take down the raid server. There is at least a 50% chance of low-level drive corruption on an 8-drive array if it loses power. (Don't ask about the time the cleaning crew unplugged the array from the USP!) We use a smart UPS and UPS monitoring software (upsmon) to unmount the array and raidstop it if the power goes out for more than 30 secs. I am also tempted to not even connect the power switch on the front panel. Reseting a crashed system is OK, but powering it off doesn't give the hard drives a chance to flush their buffers to disk. With 8+ spinning drives, there is a good chance at least one of them will be corrupted. - Bonnie and burn-in. There are many problems that can crop up when you build the array. IRQ issues, etc. It is paramount that you throughly abuse the array with something like bonnie to make sure that everything is working. I typically mkraid which starts the array synching, mke2fs on the raid device, and then mount the filesystem and run bonnie on it all while it is still synching. This is pretty hard on the whole system and if there is a problem, you will notice quickly. Once it is done resyncing, I usually run bonnie overnight to burn it in and verify that performance is reasonable. - Fixing things. If you do have a power failure and the raid doesn't come back up, it is usually do to a hard drive problem. The only way to fix it is to run a low-level utility (Maxtor Powermax) on the drive. Maybe someone know how to do something similary within Linux. If so, I would love to hear about it. Again, our approach is not necessarily exhaustively researched. This is just "what we do." So, take it for what it's worth. Best, Mike Prinkey Aeolus Research, Inc. 
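For reference, the build-and-burn-in sequence described above boils down to roughly the following. This is a sketch, assuming raidtools with an /etc/raidtab already written for /dev/md0 and the classic bonnie; device names, mount point, and sizes are illustrative only:

  mkraid /dev/md0              # starts the initial resync
  cat /proc/mdstat             # watch resync progress
  mke2fs /dev/md0              # make the filesystem while it is still syncing
  mount /dev/md0 /raid
  bonnie -d /raid -s 2047      # hammer it during the resync; problems show up fast
  # once the resync completes, run bonnie again overnight as the burn-in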
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Mon Aug 4 09:50:09 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Mon, 04 Aug 2003 08:50:09 -0500 Subject: Cisco switches for lam mpi In-Reply-To: References: Message-ID: <3F2E6491.1020802@tamu.edu> I should have commented earlier, but I didn't think I had time... My experience with the Cisco 4006 was that as an aggregation switch it was OK for 10/100 or GBE. It did fine for normal "enterprise switching. The 4006's I've used had only older Supervisor Modules and ran CAT-OS, rather than IOS like the 4506 I'm testing now. For higher performance, while CPU utilization stays low, the switch falls off at higher loads. Caveat: I did not test these devices in a cluster environment; the thought never crossed my mind. I'd be using a 6509 if I had to use a Cisco, but I'd probably be shopping for HP ProCurves, Foundry's, Riverstones, or NEC Bluefires, based on what I've seen and done lately. I tested the 4006 in normal enterprise mode, and loaded it for high-perf network modes. If you ever need QoS do NOT use a 4006. Or a 4506. They can't handle it too well. But I digress. I'm gonna try to get a couple of ProCurves in and test 'em against a LAN tester made by Anritsu (MD1230/1231) for small packet capability (RFC-2544). That's been a killer for a lot of switches I've looked at. gerry Felix Rauch wrote: > On Tue, 29 Jul 2003, Jack Douglas wrote: > >>We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst >>4003 Chassis with 48 1000Base-t ports. >> >>We are running LAM MPI over gigabit, but we seem to be experiencing >>bottlenecks within the switch >> >>Typically, using the cisco, we only see CPU utilisation of around 30-40% > > [...] > > I'm not a Cisco expert, but... > > We once got a Cisco switch from our networking people that we had to > return immediately because it delivered such a bad performance. It was > a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only > handle 12 ports at full speed. Above that, the performance brake down > completely. > > For some benchmark results see, e.g.: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf > > As a comparison, the quite nice results of a CentreCom 742i: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf > > Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved > anyway since spring 2001 when I did the above tests. Besides, the > situation for Gigabit Ethernet could be different. > > As we described on our workshop paper at CAC03 you can not trust the > data sheets of switches anyway: > http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 > > Conclusion: If you need a very high performing switch, you have to > evaluate/benchmark it yourself. 
> > - Felix > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Aug 6 08:07:45 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 06 Aug 2003 07:07:45 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> References: <1060098427.30922.6.camel@roughneck> Message-ID: <3F30EF91.6080606@tamu.edu> We just implemented an IDE RAID system for some meteorology data/work. We're pretty happy with the results so far. Our hardwre complement is: SuperMicro X5DAE Motherboard dual Xeon 2.8GHz processors 2 GB Kingston Registered ECC RAM 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters 10 Maxtor 250 GB 7200 RPM disks 1 Maxtor 60 GB drive for system work 1 long multi-drop disk power cable... SuperMicro case (nomenclature escapes me, however, it has 1 disk bays and fits the X5DAE MoBo Cheapest PCI video card I could find (no integrated video on MoBo) Add-on Intel GBE SC fiber adapter Drawbacks: 1. I should have checked for integrated video for simplicity 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with ALL the patches 3. Make sure you order the rack mount parts when you order the case; it only appeared they were included... 4. Questions have been raised about the E-1000 integrated GBE copper NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M switch and GBE will be on fiber like God intended data to be passed (No, I don't trust most terminations for GBE on copper!) It's up and working. Burning in for the last 2 weeks with no problems, it's going to the Texas GigaPoP today where it'll be live on Internet2. HTH, Gerry Nicholas Henke wrote: > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > >>On 4 Aug 2003, Nicholas Henke wrote: >> >>We have a lot of experience with IDE RAID arrays at client sites. The DOE >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) >>and the price is hard to beat. The raid array that serves home >>directories to their clusters and workstations is backed up nightly to a >>second raid server, similarly to your system. To speed things along we >>installed an extra gigabit card in the primary and backup servers and >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 >>GBs take just over an hour using the dedicated gbit link. Rsync would >>probably be faster. Without the shortcircuit gigabit link, it used to run >>four or five times longer and seriously impact NFS performance for the >>rest of the systems on the LAN. >> >>Hope this helps. >> >>Regards, >> >>Mike Prinkey >>Aeolus Research, Inc. > > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? 
> > Nic -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Douglas.L.Farley at nasa.gov Wed Aug 6 08:35:10 2003 From: Douglas.L.Farley at nasa.gov (Doug Farley) Date: Wed, 06 Aug 2003 08:35:10 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> References: <1060098427.30922.6.camel@roughneck> Message-ID: <5.0.2.1.2.20030806081148.00a94be8@pop.larc.nasa.gov> I noticed with acnc's 14 unit raid they used an IDE-SCSI U3 something or another, anyone know what type of hardware they used to convert the drives for this array? Just direct IDE-SCSI adaptors (which I've not seen cheaper than $80) on each drive and then connecting to something like an adaptec Raid card? Does anyone have any experience with doing this (with off the shelf parts) to create a semi-cheep raid (maybe 10 x $250 for 250G disk, + 10 x $80 IDE-SCSI converter + $800 expensive adaptec 2200 esq card )? Those costs are higher (~$420/disk ) than doing 10 disks on a 3ware 7500-12 (~$320/disk) (costs excluding host system), so is whatever gained really worth it? Doug At 09:11 AM 8/5/2003 -1000, you wrote: >I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true >NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. >Although they do offer a NFS box that will turn one of these arrays into a >standalone. We have had great success with these units >(http://neptune.navships.com/images/harddrivearrays.jpg) . We first >acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. >We have set it up in a RAID-5 configuration and have not yet had to replace >even one of the drives (Knockin on wood). After a year we picked up the >14slot chassis and filled it with 160 maxtor drives and it has performed >flawless... I think we paig about $4000 for the 14 slot chassis. you can >add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver >for $1500 and you got about 2TB of storage for around $7000 > >Mitchel Kagawa > >----- Original Message ----- >From: "Nicholas Henke" >To: "Michael T. Prinkey" >Cc: >Sent: Tuesday, August 05, 2003 5:47 AM >Subject: Re: large filesystem & fileserver architecture issues. > > > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > On 4 Aug 2003, Nicholas Henke wrote: > > > > > > We have a lot of experience with IDE RAID arrays at client sites. The >DOE > > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for >them. > > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec >write) > > > and the price is hard to beat. The raid array that serves home > > > directories to their clusters and workstations is backed up nightly to a > > > second raid server, similarly to your system. To speed things along we > > > installed an extra gigabit card in the primary and backup servers and > > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > > GBs take just over an hour using the dedicated gbit link. Rsync would > > > probably be faster. 
Without the shortcircuit gigabit link, it used to >run > > > four or five times longer and seriously impact NFS performance for the > > > rest of the systems on the LAN. > > > > > > Hope this helps. > > > > > > Regards, > > > > > > Mike Prinkey > > > Aeolus Research, Inc. > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf ============================== Doug Farley Data Analysis and Imaging Branch Systems Engineering Competency NASA Langley Research Center < D.L.FARLEY at LaRC.NASA.GOV > < Phone +1 757 864-8141 > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Aug 6 15:09:59 2003 From: ctierney at hpti.com (Craig Tierney) Date: 06 Aug 2003 13:09:59 -0600 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> References: <1060098427.30922.6.camel@roughneck> <3F30EF91.6080606@tamu.edu> Message-ID: <1060196998.8961.17.camel@woody> On Wed, 2003-08-06 at 06:07, Gerry Creager N5JXS wrote: > We just implemented an IDE RAID system for some meteorology data/work. > We're pretty happy with the results so far. Our hardwre complement is: > > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > Hardware choices look good. How did you configure it? Are there 1 or 2 filesystems? Raid 0, 1, 5? Do you have any performance numbers on the setup (perferably large file, dd type tests)? Thanks, Craig > Drawbacks: > 1. I should have checked for integrated video for simplicity > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches > 3. Make sure you order the rack mount parts when you order the case; it > only appeared they were included... > 4. Questions have been raised about the E-1000 integrated GBE copper > NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M > switch and GBE will be on fiber like God intended data to be passed (No, > I don't trust most terminations for GBE on copper!) > > It's up and working. Burning in for the last 2 weeks with no problems, > it's going to the Texas GigaPoP today where it'll be live on Internet2. > > HTH, Gerry > > Nicholas Henke wrote: > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > >>On 4 Aug 2003, Nicholas Henke wrote: > >> > >>We have a lot of experience with IDE RAID arrays at client sites. 
The DOE > >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > >>and the price is hard to beat. The raid array that serves home > >>directories to their clusters and workstations is backed up nightly to a > >>second raid server, similarly to your system. To speed things along we > >>installed an extra gigabit card in the primary and backup servers and > >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 > >>GBs take just over an hour using the dedicated gbit link. Rsync would > >>probably be faster. Without the shortcircuit gigabit link, it used to run > >>four or five times longer and seriously impact NFS performance for the > >>rest of the systems on the LAN. > >> > >>Hope this helps. > >> > >>Regards, > >> > >>Mike Prinkey > >>Aeolus Research, Inc. > > > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Aug 6 16:55:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 6 Aug 2003 16:55:09 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> Message-ID: > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > > Drawbacks: > 1. I should have checked for integrated video for simplicity I did something similar a little while back: a tyan thunder e7500 board, just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 in jbod mode, 8x200G WD JB disks and a ~500W PS. I don't see any reason for adding extra ram or putting in multiple, higher-powered CPUs for a fileserver. > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches I'll be doing more boxes, probably with something like 8x250 SATA disks, with a pair of promise tx4 cards. open-source drivers for these cards recently became available, btw. there was a very interesting talk at OLS about doing raid intelligently over a network... 
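the conventional do-it-yourself flavour of that idea is just Linux software raid stacked on network block devices. a rough sketch, not what the OLS talk describes -- hostnames, port numbers and nbd device names are placeholders, and nbd device naming varies with kernel/nbd-tools versions:

  # on each storage brick: export a local partition over TCP (old-style nbd-server syntax)
  nbd-server 2000 /dev/hda3

  # on the head node: import the bricks and build raid5 across them with mdadm
  nbd-client brick1 2000 /dev/nbd0
  nbd-client brick2 2000 /dev/nbd1
  nbd-client brick3 2000 /dev/nbd2
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/nbd0 /dev/nbd1 /dev/nbd2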
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at yakko.cimav.edu.mx Thu Aug 7 12:05:45 2003 From: luis.licon at yakko.cimav.edu.mx (Luis Fernando Licon Padilla) Date: Thu, 07 Aug 2003 10:05:45 -0600 Subject: test Message-ID: <3F3278D9.5000709@yakko.cimav.edu.mx> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From John.Hearns at micromuse.com Thu Aug 7 09:12:55 2003 From: John.Hearns at micromuse.com (John Hearns) Date: Thu, 07 Aug 2003 14:12:55 +0100 Subject: AMD core maths library Message-ID: <3F325057.4080801@micromuse.com> Sorry if this is old news to everyone. I saw a snippet in Linux Magazine (UK/German type) on the AMD Core Math Library for Opterons. https://wwwsecure.amd.com/gb-uk/Processors/DevelopWithAMD/0,,30_2252_2282,00.html Says it is initially released in FORTAN, with BLAS, LAPACK and FFTs. g77 under Linux and Windows. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Aug 7 09:54:29 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 07 Aug 2003 08:54:29 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <3F325A15.80901@tamu.edu> Mark Hahn wrote: >>SuperMicro X5DAE Motherboard >>dual Xeon 2.8GHz processors >>2 GB Kingston Registered ECC RAM >>2 HighPoint RocketRAID 404 4-channel IDE RAID adapters >>10 Maxtor 250 GB 7200 RPM disks >>1 Maxtor 60 GB drive for system work >>1 long multi-drop disk power cable... >>SuperMicro case (nomenclature escapes me, however, it has 1 disk bays >>and fits the X5DAE MoBo >>Cheapest PCI video card I could find (no integrated video on MoBo) >>Add-on Intel GBE SC fiber adapter >> >>Drawbacks: >>1. I should have checked for integrated video for simplicity > > > I did something similar a little while back: a tyan thunder e7500 board, > just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 > in jbod mode, 8x200G WD JB disks and a ~500W PS. > > I don't see any reason for adding extra ram or putting in multiple, > higher-powered CPUs for a fileserver. This one will A) be on the Unidata weather distribution network for general weather data AND the newer real-time radar feeds; B) be extracting some of that data for graphics; C) be doing NNTP for Unidata (one, exactly, newsgroup) for a research project; D) reside on the I2 Logistical Backbone... It's a busy box. >>2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with >>ALL the patches > > I'll be doing more boxes, probably with something like 8x250 SATA disks, > with a pair of promise tx4 cards. open-source drivers for these cards > recently became available, btw. > > there was a very interesting talk at OLS about doing raid intelligently > over a network... Check out loki.cs.utk.edu (I think: It's certainly a project called 'loki' and run by Micah Beck at utk.edu) about the logistical backbone. I didn't go with Promise cards because of one of my grad students, who's obviously better funded than me... 
He's looked at Promise, HighPoint and at least one other card, and had comparisons, and strongly recommended HighPoint as a Price/Performance leader. The HighPoints were less expensive and currently boast the same performance as the tx4's. Everyone's getting into the SATA game; I didn't go that way because I wanted to get to the 2 TB point and couldn't reasonably do it today with SATA; maybe later. I didn't want to take the time to hack the drivers HighPoint had available, since i'm overloaded these days. -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Wed Aug 6 19:55:07 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed, 6 Aug 2003 19:55:07 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Mark Hahn wrote: > > there was a very interesting talk at OLS about doing raid intelligently > over a network... > I have considered trying this using network block devices, but I haven't had the opportunity to try it. Is this what you are talking about or something different? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 14:15:39 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 14:15:39 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Michael T. Prinkey wrote: > On Wed, 6 Aug 2003, Mark Hahn wrote: > > > > there was a very interesting talk at OLS about doing raid intelligently > > over a network... > > > > I have considered trying this using network block devices, but I haven't > had the opportunity to try it. Is this what you are talking about or > something different? thre are similarities: http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf but it's really a development beyond NBD or DRDB. hmm, I'm not sure that brief pdf is either complete or does the idea justice. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 15:15:01 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 15:15:01 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > I read the abstract last evening and got a taste for it. That is really a > remarkable idea to use the ethernet checksum for data integrity of stored > data. Thanks for the heads-up. for me, the crux of the idea is: - if you want big storage, $/GB drives you to IDE. - IDE is not amazingly fast, reliable or scalable. - building storage bricks out of IDE makes a lot of sense, since they can now be quite dense, low-overhead, etc. - ethernet is a wonderfully hot-pluggable interconnect for this kind of thing. - doing raid over a multicast-capable network is pretty cool. 
- using eth's checksumming is pretty cool. - doing it this way (all open-source, including software raid) means the system is much more transparent - you are not dependent on some closed-source vendor tools to control/monitor/upgrade your storage. Ben's approach (along with Lustre, for instance) seems very sweet for HPC type storage needs. one thing I do ponder, though, is whether it really makes sense to hide raid so firmly under the block layer. it's conceptually tidy, to be sure, and works well in practice. but suppose: - to create a filesystem, you hand some arbitrary collection of block-device extents to the mkfs tool. you also let it know which extents happen to reside on the same disk, bus, host, UPS, geographic location, etc. - you can tell the FS that your default policy should be for reliability - that raid5 across separate disks is OK, for instance. or maybe you can tell it that a particular file should be raid10 instead. or that a file should be raid1 across each geographic site. or that updates to a file should be logged. or that it should transparently compress older files. - the FS might do other HSM-like things, such as incorporating knowlege of what's on your tape/DVD/cdrom's. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Aug 7 14:33:23 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 7 Aug 2003 14:33:23 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > > > > I have considered trying this using network block devices, but I haven't > > had the opportunity to try it. Is this what you are talking about or > > something different? > > thre are similarities: > > http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf > > but it's really a development beyond NBD or DRDB. hmm, I'm not sure > that brief pdf is either complete or does the idea justice. > I read the abstract last evening and got a taste for it. That is really a remarkable idea to use the ethernet checksum for data integrity of stored data. Thanks for the heads-up. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Aug 7 15:55:15 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 7 Aug 2003 12:55:15 -0700 (PDT) Subject: (Scyld) Nodes going down unexpectedly Message-ID: We have a 10-processor cluster and are currently running a weather model on 4 of the processors. When I try to up the number, it works for a while, then the "beostatus" window will show one node's information not changing for a little while before it shows the node status as "down". Each node is dual-processor and I have noticed (but not verified) that this becomes an issue when both processors on a node are in use. After the node status changes to "down", I cannot restart it through the console tools on the root node. However, I know that the node is still alive and on the network because I can ping it successfully. This problem requires me to actually restart the node by hand, which is a bit of an issue since we're on opposite sides of the building. What's going on here and what can I do to mitigate/fix this? 
Tim Whitcomb twhitcomb at apl.washington.edu Applied Physics Lab University of Washington _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 7 17:51:10 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 7 Aug 2003 14:51:10 -0700 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > - IDE is not amazingly fast, reliable or scalable. That's about like saying "commondity servers are not fast, reliable, or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." More facts, less religion. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Thu Aug 7 18:04:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Thu, 7 Aug 2003 17:04:25 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> References: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 7 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > > > - IDE is not amazingly fast, reliable or scalable. > > That's about like saying "commondity servers are not fast, reliable, > or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." > > More facts, less religion. Since when has the value of facts outweighed religion on *THIS* list?! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes. thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes. 
thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 11 22:44:45 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 12 Aug 2003 10:44:45 +0800 (CST) Subject: PBSPro with 1024 nodes :-O (oh!) Message-ID: <20030812024445.80371.qmail@web16812.mail.tpe.yahoo.com> Looks like the problems with OpenPBS in large clusters were all fixed in PBSPro, ASU has a 1024 node cluster (http://www.pbspro.com/press_030811.html). Also, heard from PBS developers that the next release of PBSPro (5.4) will add fault tolerance in the master node, very similar to the shadow master concept in Gridengine. Sounds to me PBSPro is very much better than OpenPBS. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Aug 12 20:59:17 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 12 Aug 2003 20:59:17 -0400 (EDT) Subject: $900,000 RFP for climate simulation machine at UC Irvine (fwd) Message-ID: -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 ---------- Forwarded message ---------- Date: Tue, 12 Aug 2003 16:54:44 -0700 From: Charlie Zender To: Donald Becker Subject: $900,000 RFP for climate simulation machine at UC Irvine Dear Donald, Ooops. 
Forgot the announcement itself. Here it is. Please disseminate! Thanks, Charlie Cut here ====================================================================== Dear High Performance Computing Vendor, The University of California at Irvine is pleased to announce the immediate availability of US$900,000 towards the purchase of an Earth System Modeling Facility (ESMF). Following a competitive bid process open to all interested vendors, the ESMF contract will be awarded to the proposal with the most competitive response to our Request for Proposals (RFP). All necessary details about the ESMF and the RFP process are available from the ESMF homepage: http://www.ess.uci.edu/esmf Bids are due August 22, 2003. Please visit the ESMF homepage for more details and contact Mr. Ralph Kupcha with any questions. All further contact contact with potential vendors will take place on the ESMF Potential Vendor Mail List. You may subscribe to this list by visiting https://maillists.uci.edu/mailman/listinfo/esmfvnd Please pass this Announcement of Opportunity on to any interested colleagues. Sincerely, Ralph Kupcha Senior Buyer, Procurement Services, UCI _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Thu Aug 14 05:04:57 2003 From: jhearns at micromuse.com (John Hearns) Date: Thu, 14 Aug 2003 10:04:57 +0100 Subject: Slashdot thread on supercomputers Message-ID: <3F3B50B9.4090405@micromuse.com> Everyone has probably seen the thread on Slashdot. Here are links to the two relevant stories. http://www.eetimes.com/story/OEG20030811S0018 http://www.eetimes.com/story/OEG20030812S0011 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Thu Aug 14 15:21:47 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Fri, 15 Aug 2003 00:11:47 +0450 Subject: MPICH Message-ID: <1060890107.c2a01a60farschad@myrealbox.com> Hi, I am a new user to this mailing list. And also I am very new to Beowulf clusters. So I will have to many questions, please be patient :^) At the moment, I want to run a sample program using MPI. The program is in F90 and I use PGF90 to compile it. I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is the alternative command for lamboot!! Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jconnor at atmos.colostate.edu Thu Aug 14 18:55:20 2003 From: jconnor at atmos.colostate.edu (Jason Connor) Date: 14 Aug 2003 16:55:20 -0600 Subject: MPICH In-Reply-To: <1060890107.c2a01a60farschad@myrealbox.com> References: <1060890107.c2a01a60farschad@myrealbox.com> Message-ID: <1060901719.6160.11.camel@gentoo.atmos.colostate.edu> Hi Farschad, Here are only some possible answers to your questions. Like all things, there is more than one way to do these things. 
On Thu, 2003-08-14 at 13:21, Farschad Torabi wrote: > Hi, > I am a new user to this mailing list. > And also I am very new to Beowulf clusters. > So I will have to many questions, please be patient :^) > > At the moment, I want to run a sample program using > MPI. The program is in F90 and I use PGF90 to compile it. > > I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? using mpich: /bin/mpirun -np <# of nodes to run on> \ -machinefile /util/machines/machines.LINUX \ the -machinefile doesn't need need to be explicit, as long as you have the file mentioned above filled with the names of your cluster nodes. mpirun --help is always a good reference =) > > Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is > the alternative command for lamboot!! There isn't one. Just have whatever shell your using with mpich (rsh or ssh) setup so that you don't need a password to login to the nodes. > > Thank you in advance > Farschad Torabi > I hope this helps. In case you care, I like lam better. =) Jason Connor Colorado State University Prof. Scott Denning's BioCycle Research Group jconnor at atmos.colostate.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 14 21:52:06 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 15 Aug 2003 11:52:06 +1000 Subject: Scalable PBS Message-ID: <200308151152.09499.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, Just joined the list, so apologies if this is already well known. I noticed a recent message in the archive about OpenPBS and problems with scalability, and I think it's worth noting that there is an alternative (and actively developed) fork of OpenPBS called "Scalable PBS" available from: http://www.supercluster.org/projects/pbs/ Amongst other features it has (quoting the website): Better Scalability - Significantly improved server to MOM communication model, the ability to handle larger clusters, larger jobs, larger messages, etc. - Scales up to 2K nodes vs ~300 nodes for standard OpenPBS. Improved Usability by incorporating more extensive logging, as well as, more human readable logging(ie no more 'error 15038 on command 42'). We're using SPBS here at VPAC on our IBM cluster and it's a lot better than the last OpenPBS release (2.3.16, from 2001). They forked off from 2.3.12 rather than the last OpenPBS because it had a more open license. The folks behind the project have worked very quickly with us to fix bugs we've been finding in it, typically when I found a bug they had fixed it within a day or so, usually overnight from my perspective in Oz. :-) If you are considering using it I'd suggest using the current snapshot release from: http://www.supercluster.org/downloads/spbs/temp/ as that irons out a couple of bugs that might bite. For the less adventurous there is a new release SOpenPBS-2.3.12p4 due out in the near future that will include the fixes from the current snapshot. 
cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu J4wal1ph00ExP8w/5HgVCek= =Nyjb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Thu Aug 14 13:28:21 2003 From: josip at lanl.gov (Josip Loncaric) Date: Thu, 14 Aug 2003 11:28:21 -0600 Subject: Two AMD Opteron clusters for LANL Message-ID: <3F3BC6B5.5040706@lanl.gov> This October, LANL will be getting large AMD Opteron model 244 clusters ("Lightning" consisting of 1408 dual-CPU machines and "Orange" consisting of 256 dual-CPU machines, both built by Linux Networx): http://www.itworld.com/Comp/1437/030814supercomp/ http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~73744,00.html Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kapurs at seas.upenn.edu Fri Aug 15 11:41:39 2003 From: kapurs at seas.upenn.edu (kapurs at seas.upenn.edu) Date: Fri, 15 Aug 2003 11:41:39 -0400 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <1060962099.3f3cff33d8b6c@webmail.seas.upenn.edu> Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Matthew_Wygant at dell.com Fri Aug 15 17:54:05 2003 From: Matthew_Wygant at dell.com (Matthew_Wygant at dell.com) Date: Fri, 15 Aug 2003 16:54:05 -0500 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <6CB36426C6B9D541A8B1D2022FEA7FC10273F510@ausx2kmpc108.aus.amer.dell.com> The 530 appears to include both a SCSI U160 and 2 ATA100 IDE channels. The ATA100 defaults to 'auto' in the BIOS, so I would imagine the node should pick it up. -matt -----Original Message----- From: kapurs at seas.upenn.edu [mailto:kapurs at seas.upenn.edu] Sent: Friday, August 15, 2003 10:42 AM To: beowulf at beowulf.org Subject: Hard Drive Upgrade(Internal or External) Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. 
thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zender at uci.edu Fri Aug 15 18:14:58 2003 From: zender at uci.edu (Charlie Zender) Date: Fri, 15 Aug 2003 15:14:58 -0700 Subject: Bid deadline extended for UC Irvine climate computer Message-ID: Hi Donald, Response from members on the beowulf list has been so positive that we are extending our bid deadline in order to give your list members who want to bid a fair chance to prepare competitive bids. Would you please allow posting of this notice of extension so that those vendors who thought they may not have enough time to submit bids become aware of the two week extension? I promise not to bother you again :) One thought: We are not the only Institution buying medium size "super-computers" that Beowulf vendors might like to know about. It might be a good idea for the whole Beowulf community to create a separate list for RFPs. Such a list would help buyers and Beowulf vendors find eachother. Thanks! Charlie -- Charlie Zender, zender at uci dot edu, (949) 824-2987, Department of Earth System Science, University of California, Irvine CA 92697-3100 -------------------------------------------------------------------- Dear HPC Vendors, We are extending by two weeks the deadline for submission of bids in response to the $900,000 Earth System Modeling Facility RFP: http://www.ess.uci.edu/esmf The new bid deadline is Friday, September 5. All other deadlines and the expected timeline are also shifted by two weeks, and these changes are reflected on the recently updated web page and conference summary. Consequently, the deadline to send bid-related questions to Ralph Kupcha is Friday, August 29. We hope that this extension provides some additional breathing room to improve any parts of your bid that you might have rushed to finish. At the same time, we are now ready to accept any completed proposals and look forward to reading your ideas on how best to meet our coupled climate modeling needs. Sincerely, Ralph Kupcha _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 00:03:57 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 16 Aug 2003 12:03:57 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308151152.09499.csamuel@vpac.org> Message-ID: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> How big is your cluster? Did you use Gridengine before -- how does SPBS compare to SGE? Andrew. --- Chris Samuel ????> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > Just joined the list, so apologies if this is > already well known. 
> > I noticed a recent message in the archive about > OpenPBS and problems with > scalability, and I think it's worth noting that > there is an alternative (and > actively developed) fork of OpenPBS called "Scalable > PBS" available from: > > http://www.supercluster.org/projects/pbs/ > > Amongst other features it has (quoting the website): > > Better Scalability > - Significantly improved server to MOM > communication model, the ability to > handle larger clusters, larger jobs, larger > messages, etc. > - Scales up to 2K nodes vs ~300 nodes for > standard OpenPBS. > > Improved Usability by incorporating more extensive > logging, as well as, more > human readable logging(ie no more 'error 15038 on > command 42'). > > We're using SPBS here at VPAC on our IBM cluster and > it's a lot better than > the last OpenPBS release (2.3.16, from 2001). They > forked off from 2.3.12 > rather than the last OpenPBS because it had a more > open license. > > The folks behind the project have worked very > quickly with us to fix bugs > we've been finding in it, typically when I found a > bug they had fixed it > within a day or so, usually overnight from my > perspective in Oz. :-) > > If you are considering using it I'd suggest using > the current snapshot release > from: > > http://www.supercluster.org/downloads/spbs/temp/ > > as that irons out a couple of bugs that might bite. > > For the less adventurous there is a new release > SOpenPBS-2.3.12p4 due out in > the near future that will include the fixes from the > current snapshot. > > cheers, > Chris > - -- > Chris Samuel -- VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing > Bldg 91, 110 Victoria Street, Carlton South, > VIC 3053, Australia - http://www.vpac.org/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu > J4wal1ph00ExP8w/5HgVCek= > =Nyjb > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 16 00:51:40 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs In-Reply-To: <200308151905.h7FJ5uw13967@NewBlue.Scyld.com> Message-ID: Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! 
http://www.servihoo.com The Portal of Mauritius
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From farschad at myrealbox.com Sat Aug 16 08:47:55 2003
From: farschad at myrealbox.com (Farschad Torabi)
Date: Sat, 16 Aug 2003 17:37:55 +0450
Subject: Beowulf digest, Vol 1 #1412 - 5 msgs
Message-ID: <1061039275.c61518e0farschad@myrealbox.com>

Dear Jason Connor and Roudy,

I think that my question covers Roudy's questions too ;^)

First of all Roudy, the new version of MPICH is available on the net, i.e. mpich-1.2.5; you can download it.

As Jason Connor advised me, I ran the following command:

    /bin/mpirun -np 1 -machinefile machs -arch machines.arc a.out

The contents of machs are:

    node1
    node1

and the contents of machines.arc (architecture file):

    node1.parallel.net
    node1.parallel.net
    node1.parallel.net

(Roudy, I think that you have to use your file like this! The names of the machines are written in this file; in your case, let's say -arch mpd.hosts.)

The program runs well on -np 1 machine, but when I wanted to define two processes on a single machine (i.e. -np 2) it gives me the message:

    "Could not find enough architecture for machines LINUX"

The question is: can we define more than ONE process on a SINGLE machine??

Thanks

-----Original Message-----
From: "RoUdY"
To: beowulf at scyld.com, beowulf at beowulf.org
Date: Sat, 16 Aug 2003 08:51:40 +0400
Subject: Re: Beowulf digest, Vol 1 #1412 - 5 msgs

Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy
--------------------------------------------------
Get your free email address from Servihoo.com!
http://www.servihoo.com The Portal of Mauritius
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rodmur at maybe.org Sat Aug 16 10:29:17 2003
From: rodmur at maybe.org (Dale Harris)
Date: Sat, 16 Aug 2003 07:29:17 -0700
Subject: Scalable PBS
In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com>
References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com>
Message-ID: <20030816142917.GA24928@maybe.org>

On Sat, Aug 16, 2003 at 12:03:57PM +0800, Andrew Wang elucidated:
> How big is your cluster?
>
> Did you use Gridengine before -- how does SPBS compare
> to SGE?
>
> Andrew.
>

At a quick glance, it already wins points with me because it uses GNU autoconf instead of aimk to build.

--
Dale Harris
rodmur at maybe.org /.-)

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From rodmur at maybe.org Sat Aug 16 12:07:13 2003
From: rodmur at maybe.org (Dale Harris)
Date: Sat, 16 Aug 2003 09:07:13 -0700
Subject: Scalable PBS
In-Reply-To: <20030816142917.GA24928@maybe.org>
References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> <20030816142917.GA24928@maybe.org>
Message-ID: <20030816160713.GB24928@maybe.org>

On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale Harris elucidated:
>
> At a quick glance, it already wins points with me because it uses GNU
> autoconf instead of aimk to build.
>

However, the fact that it requires tcl/tk does not. Whatever happened to the concept of making a simple tool that just does its job well? I don't see why I need a GUI for a job scheduler. Let the emacs people make some frontend for it.

Dale

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From andrewxwang at yahoo.com.tw Sat Aug 16 23:12:21 2003
From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=)
Date: Sun, 17 Aug 2003 11:12:21 +0800 (CST)
Subject: Scalable PBS
In-Reply-To: <20030816160713.GB24928@maybe.org>
Message-ID: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com>

For SGE, I simply download the binary package and then do the full install. I don't have to build the source, so it doesn't matter if it uses aimk or autoconf.

I looked at SPBS a while ago; I think if you don't need to build the GUI, then you don't need tcl/tk, and you just need to use the command line for managing the cluster.

Andrew.

--- Dale Harris wrote:
> On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale
> However, the fact that it requires tcl/tk does not.
> Whatever happened to
> the concept of making a simple tool that just does
> its job well. I
> don't see why I need a GUI for a job scheduler. Let
> the emacs people
> make some frontend for it.
> > Dale > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dmcollins79 at hotmail.com Sun Aug 17 07:14:24 2003 From: dmcollins79 at hotmail.com (Timothy M Collins) Date: Sun, 17 Aug 2003 12:14:24 +0100 Subject: Request for parallel applications to test on beowulf cluster. Message-ID: Hi, I have built a beowulf (Redhat8 with PVM&LAM) Looking for parallel applications for different size and complexity to test fault tolerance. If anybody has one or knows where I can find one/some, please let me know. Kind regards Collins _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:52:56 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:52:56 +1000 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <200308181152.57812.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 16 Aug 2003 02:03 pm, Andrew Wang wrote: > How big is your cluster? http://www.vpac.org/content/services_and_support/facility/linux_cluster.php (If it looks a little sparse, that's because someone's in the process of updating it) > Did you use Gridengine before -- how does SPBS compare > to SGE? Nope, it's always been running OpenPBS prior to migrating to SPBS. - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDF4O2KABBYQAh8RAncrAJoDWbSivr52PpPy/jyNkqdVFqLLCwCfVK8S 604i8kwR1wNA+7J5oWMPxBg= =Znzi -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:55:24 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:55:24 +1000 Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816142917.GA24928@maybe.org> <20030816160713.GB24928@maybe.org> Message-ID: <200308181155.25278.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 02:07 am, Dale Harris wrote: > However, the fact that it requires tcl/tk does not. Whatever happen to > the concept of making a simple tool that just does it's job well. I > don't see why I need a GUI for a job scheduler. Let the emacs people > make some frontend for it. 1) I don't believe it requires tk/tcl 2) The tk/tcl isn't for a GUI, it's for one of the example schedulers. 
3) That was inherited from OpenPBS 4) There is a GUI (plain old X) for monitoring PBS, xpbsmon, but I'd ignore it if I were you.. cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDIMO2KABBYQAh8RAnYiAJ9TBbBiGNRSJTP122dhqr8fXtQF9ACfatF7 XL5HFH/3hMPqm1K0FuCJlc8= =+U9N -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:57:25 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:57:25 +1000 Subject: Scalable PBS In-Reply-To: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> References: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <200308181157.26287.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 01:12 pm, Andrew Wang wrote: > I looked at SPBS a while ago, I think if you don't > need to build the GUI, then you don't need tcl/tk, and > you just need to use the command line for managing the > cluster. The tk/tcl is for one of the example schedulers (there are 3, one written in C, one in tk/tcl and one in BASL). Viz: --set-sched=TYPE sets the scheduler type. If TYPE is "c" the scheduler will be written in C "tcl" the server will use a Tcl based scheduler "basl" will use the rule based scheduler "no" then their will be no scheduling done (the "c" scheduler is the default) - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDKFO2KABBYQAh8RAtIZAJwN0D0dts5DyU3tSN4eLsucYn6DsQCgiB7q wVSIraBXrPWoODE2LbglW14= =4Etb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 05:26:17 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 11:26:17 +0200 Subject: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Hi Beowulfers, Problem: I want to distribute large files over a cluster. To raise performance I decided to copy the file to the local HD of any node in the cluster. Did someone find a multicast solution for that or maybe something with snowball principle? Till now I've take a look at msync (multicast rsync). Does someone have experiences with JETfs ? My idea was to write some scripts which copy files via rsync with snowball, but there are some heavy problems. e.g. What happens if one node (in the middle) is down. How does the next snowball generation know when to start copying (the last ones have finished copying)? Any ideas ? Thanks in advance Ren? 
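[A minimal sketch of the snowball idea in shell, assuming passwordless ssh, rsync on every node, the same path everywhere, and hypothetical host names node01..node08. The set of nodes holding the file roughly doubles each generation; a node whose copy fails is only reported and skipped, which sidesteps rather than solves the failure question raised above:]

    #!/bin/sh
    # Snowball copy: every node that already has the file pushes it to one
    # node that does not, so the number of holders roughly doubles per round.
    FILE=/scratch/bigfile                                   # hypothetical path
    HAVE="localhost"                                        # nodes holding the file
    TODO="node01 node02 node03 node04 node05 node06 node07 node08"
    while [ -n "$TODO" ]; do
        NEXT=""
        for src in $HAVE; do
            set -- $TODO
            [ $# -eq 0 ] && break
            dst=$1; shift; TODO="$*"
            # the current holder pushes the file to the next empty node
            if ssh "$src" rsync -a "$FILE" "$dst:$FILE"; then
                NEXT="$NEXT $dst"
            else
                echo "copy $src -> $dst failed; skipping $dst" >&2
            fi
        done
        HAVE="$HAVE$NEXT"
    done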
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 18 09:12:52 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 18 Aug 2003 15:12:52 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of > any node in the cluster. Quick solution: Dolly [1] ;-) Longer description: I once wrote a tool called "Dolly" to clone whole hard-disk drives, partitions, or large files to many nodes in a cluster. It does so by sending the files concurrently around the cluster in a "TCP chain". In a switched network, this solution is often faster then IP multicast becauce Dolly can use the proven TCP congestion control and error correction, whereas high-speed reliable multicast is something difficult. > Till now I've take a look at msync (multicast rsync). Another tool is "udpcast". > What happens if one node (in the middle) is down. Dolly, can't handle that (it's a working prototype), but Atsushi Manabe extended Dolly into Dolly++, which supposedly can handle node failures (see link in [1]). We use Dolly regularly to clone our small 16-node cluster and the local support group uses Dolly to clone the larger 128-node cluster. Because that cluster has two Fast Ethernet networks, we can clone whole disks with about 20 MByte/s to all nodes in the cluster. If you want to clone files instead of partitions, just specify your file name in the config file instead of the device file. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mike at etek.chalmers.se Mon Aug 18 08:56:33 2003 From: mike at etek.chalmers.se (Mikael Fredriksson) Date: Mon, 18 Aug 2003 14:56:33 +0200 Subject: mulitcast copy or snowball copy References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <3F40CD01.89E6AB62@etek.chalmers.se> Rene Storm wrote: > > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > Any ideas ? Jepp, there is a distribution system for large files mainly for the Internet, but it can probbably be of use for you. It's a fast way to distribute a large file from one host to several others, at the same time. 
Check out: http://bitconjurer.org/BitTorrent/index.html MF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 10:51:27 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 10:51:27 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. > To raise performance I decided to copy the file to the local HD of any > node in the cluster. > > Did someone find a multicast solution for that or maybe something with > snowball principle? There are several multicast file distribution protocols, but they all share the same potential flaw: they use multicast. That means that they will work great in a few specific installations, generally small clusters on a single Ethernet switch. But as you grow, multicast becomes more of a problem. Here is a strong indicator for using multicast A shared media or repeater-based network (e.g. traditional Ethernet) Here are a few of the contra-indications for using multicast Larger clusters Non-Ethernet networks "Smart" Ethernet switches which try to filter packets Random communication traffic while copying Heavy non-multicast traffic while copying Multiple multicast streams NICs with mediocre, broken or slow to configure multicast filters Drivers not tuned for rapid multicast filter changes Or, in summary, "using the cluster for something besides a multicast demo. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. > My idea was to write some scripts which copy files via rsync with snowball, If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. A geometrically cascading copy can work very well. It very effectively uses current networks (switched Ethernet, Myrinet, SCI, Quadrics, Infiniband), and can make use of the sendfile() system call. 
For a system such as Scyld, use a zero-base geometric cascade: move the work off of the master as the first step. The master generates the work list and immediately shifts the process creation work off to the first compute node. The master then only monitors for completion. You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erik at aarg.net Mon Aug 18 10:14:25 2003 From: erik at aarg.net (Erik Arneson) Date: Mon, 18 Aug 2003 07:14:25 -0700 Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <20030818141424.GA16386@aarg.net> On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > Did someone find a multicast solution for that or maybe something with snowball principle? I am really new to the Beowulf thing, so I am not sure if this solution is a good one or not. But have you taken a look at the various network filesystems? OpenAFS has a configurable client-side cache, and if the files are needed only for reading this ends up being a very quick and easy way to distribute changes throughout a number of nodes. (However, I have noticed that network filesystems are not often mentioned in conjunction with Beowulf clusters, and I would really love to learn why. Performance? Latency? Complexity?) -- ;; Erik Arneson SD, Ashland Lodge No. 23 ;; ;; GPG Key ID: 2048R/8B4CBC9C CoTH, Siskiyou Chapter No. 21 ;; ;; ;; -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 481 bytes Desc: not available URL: From farschad at myrealbox.com Mon Aug 18 12:17:14 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Mon, 18 Aug 2003 21:07:14 +0450 Subject: MPICH again Message-ID: <1061224634.8d1769c0farschad@myrealbox.com> Hi All, I still have some problems running MPICH on my machine :^( I've installed MPICH and PGF90 on my PC and I am able to compile parallel codes using MPI with mpif90 command. But the problem arise when I want to run the executable file on a Bowulf cluster. As Jason Connor told me, I use the following command /bin/mpirun -machinefile machs -np 2 a.out But it prompts me that there are not enough architecture on LINUX. In this case it is like when I run the executable file (i.e. a.out) manually without using mpirun. what do you think about this?? 
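[For reference, the two machine-file formats being discussed look roughly as follows. This is only a sketch: option names differ between MPICH releases, and node1.parallel.net / node2.parallel.net are simply the host names from the example above.]

    # MPICH-1 (ch_p4): the machines file is one host name per line.  To place
    # two processes on the same box, list the host twice (some builds also
    # accept a host:count form).
    $ cat machs
    node1.parallel.net
    node1.parallel.net
    $ mpirun -np 2 -machinefile machs ./a.out    # -arch is usually unnecessary here

    # MPICH2 (mpd-based): mpd.hosts is also just one host name per line.
    $ cat mpd.hosts
    node1.parallel.net
    node2.parallel.net
    $ mpdboot -n 2 -f mpd.hosts    # start the daemon ring; check your version's options
    $ mpiexec -n 2 ./a.out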
Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 11:34:16 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 17:34:16 +0200 Subject: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Hi Donald, > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB each. (Overall 30 GB) And the cluster is 128++ nodes. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. Rene: Thats right, but what if I ignore dropped packets and accept the corrupt files ? I would be able to rsync them later on. First Multicast to create files, Second step is to compare with rsync. I've tried this and it isn't really slow, if you're doing the rsync via snowball. If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. Rene: That's the problem with all the things you do, first they are for your own and then everybody wants them ;o) > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. Rene: Good idea You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. Rene: But how do I get this status back to my "master", e.g command from master: node16 copy to node17? I don't want do de-centralize my job, like fire and forget. 
Cya, Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 12:50:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 12:50:57 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB > each. (Overall 30 GB) > And the cluster is 128++ nodes. Those are important parameters. What network type are you using? If Ethernet, what switches and topology? (My guess is that you are using "smart" switches, likely connected with a chassis backplane.) > > Here is an example: ...the longer > the multicast filter list ... the more packets dropped. > Rene: Thats right, but what if I ignore dropped packets and accept the > corrupt files ? I would be able to rsync them later on. This is costly. "Open loop" multicast protocols work by having the receiver track the missing blocks, and requesting (or interpolating) them later. Here you are discarding that information and doing much extra work on both the sending and receiving side by later locating the missing blocks. An alternative is closed-loop multicast, with positive acknowledgment before proceeding more than one window. > First Multicast to create files, Second step is to compare with rsync. > I've tried this and it isn't really slow, if you're doing the > rsync via snowball. This is verifying/filling with a neighbor instead of the original sender. Except here you don't know when you are both missing the same blocks. > If you are doing this for yourself, the solution is easy. ... > Rene: That's the problem with all the things you do, first they are for > your own and then everybody wants them ;o) If your end goal is to publish papers, do the hack. If your end goal is make works useful for other, you have to start with a wider view. >> [Do] *not* implement a program that copies a file >> to every available node. Instead use a system where you first get a >> list of available ("up") nodes, and then copy the files to that node >> list. When the copy completes continue to use that node list rather >> then letting jobs use newly-generated "up" lists. > > Rene: Good idea This approach applies to a wide range of cluster tasks. A similar idea is that you don't care as much about which nodes are currently up as you care about which nodes have remained up since you last checked. [[ Ideally you could ask "which nodes will be up when this program completes", but there are all sorts of temporal and halting issues there. ]] >> You can implement low-overhead fault checking by counting down job >> issues and job completion. As the first machine falls idle, check that >> the final machine to assign work is still running. As the next-to-last >> job completes, check that the one machine still working is up. > > Rene: But how do I get this status back to my "master", e.g command from > master: node16 copy to node17? We have a positive completion indication as part of the Job/Process Management subsystem. If you consider the problem, the final acknowledgment must flow from the last worker to the process that is checking for job completion. You might as well put that process on the cluster master. 
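[A minimal sketch of that bookkeeping, with hypothetical node names and a flat fan-out for brevity (the same counting works for a cascaded copy): the dispatching script on the master keeps one child process per node and collects each child's exit status, so nothing is fire-and-forget.]

    #!/bin/sh
    # Master-side dispatch: one background child per node, then wait on each
    # child and report per-node success or failure.
    FILE=/scratch/bigfile                  # hypothetical path
    NODES="node01 node02 node03 node04"    # hypothetical "up" list, fetched beforehand
    pids=""
    for n in $NODES; do
        rsync -a "$FILE" "$n:$FILE" &      # the copy could equally be ssh + anything else
        pids="$pids $!:$n"
    done
    fail=0
    for p in $pids; do
        pid=${p%%:*}; node=${p#*:}
        if wait "$pid"; then
            echo "$node: ok"
        else
            echo "$node: FAILED" >&2
            fail=`expr $fail + 1`
        fi
    done
    echo "$fail node(s) failed"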
The natural Unix-style implementation is having the controlling machine hold the parent of the process tree implementing the work, even if the work is divided elsewhere. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 13:31:17 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 13:31:17 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <20030818141424.GA16386@aarg.net> Message-ID: On Mon, 18 Aug 2003, Erik Arneson wrote: > On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > > Hi Beowulfers, > > > > Problem: > > I want to distribute large files over a cluster. > > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > > > Did someone find a multicast solution for that or maybe something with snowball principle? > > I am really new to the Beowulf thing, so I am not sure if this solution is a > good one or not. But have you taken a look at the various network > filesystems? OpenAFS has a configurable client-side cache, and if the files > are needed only for reading this ends up being a very quick and easy way to > distribute changes throughout a number of nodes. This is a good example of why Grid/wide-area tools should not be confused with local cluster approaches. The time scale, performance and complexity issues are much different. AFS uses TCP/IP to transfer whole files from a server. With multiple servers the configuration is static or slow changing. > (However, I have noticed that network filesystems are not often mentioned in > conjunction with Beowulf clusters, and I would really love to learn why. > Performance? Latency? Complexity?) It's because file systems are critically important to many applications. There is no universal cluster file system, and thus no single solution. The best approach is not tie the cluster management, membership, or process control to the file system in any way. Instead the file system should be selection based on the application's need for consistency, performance and reliability. For instance, NFS is great for small, read-only input files. But using NFS for large files, or when any files will be written or updated, results in both performance and consistency problems. When working from a large read-only database, explicitly pre-staging (copying) the database to the compute nodes is usually better than relying on an underlying FS. It's easier, more predictable and more explicit than per-directory tuning FS cache parameters. As as example of why predictability is very important, imagine what happens to an adaptive algorithm when a cached parameter file expires, or a daemon does a bunch of work. That machine suddenly is slower, and that part of the problem now looks "harder". So the work is reshuffled, only to be shuffled back during the next time step. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lange at informatik.Uni-Koeln.DE Mon Aug 18 14:21:59 2003 From: lange at informatik.Uni-Koeln.DE (Thomas Lange) Date: Mon, 18 Aug 2003 20:21:59 +0200 Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: <16193.6471.238244.224191@informatik.uni-koeln.de> Hi, I would try rgang, a nice tools which uses a tree structure for copying files or executing commands on a large list of nodes. It's written in python but there's also a compiled binary. It's very flexible and fast. Search for rgang in google to find the download page. To allow scaling to kiloclusters, the new rgang can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses rsh/ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies. Product Name: rgang Product Version: 2.5 ("rgang" cvs rev. 1.103) Date (mm/dd/yyyy): 06/23/2003 ORIGIN ====== Author Ron Rechenmacher Fermi National Accelerator Laboratory - Mail Station 234 P.O Box 500 Batavia, IL 60510 Internet: rgang-support at fnal.gov -- regards Thomas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchskin at comcast.net Mon Aug 18 13:40:59 2003 From: mitchskin at comcast.net (Mitchell Skinner) Date: 18 Aug 2003 10:40:59 -0700 Subject: AW: mulitcast copy or snowball copy In-Reply-To: References: Message-ID: <1061228458.5291.32.camel@zeitgeist> On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > This is costly. "Open loop" multicast protocols work by having the > receiver track the missing blocks, and requesting (or interpolating) > them later. Here you are discarding that information and doing much > extra work on both the sending and receiving side by later locating the > missing blocks. Some possible google terms include: reliable multicast, forward error correction There's an ietf working group on reliable multicast that wasn't making a whole lot of progress the last time I checked. At that time, I recall there being some acknowledgment-based implementations as well as one forward error correction-based implementation using reed-solomon codes, from an academic in Italy whose name I forgot. It's been a little while, but when I looked at the code for that FEC-based reliable multicast program (rmdp?) I think it could only handle pretty small files. My understanding is that FEC-based approaches should scale better in terms of the number of receiving nodes, but the algorithms can be very time/space intensive. There's a patented algorithm from Digital Fountain that's supposed to be pretty efficient (google tornado codes, michael luby, digital fountain) but I'm not aware that they have a cluster-oriented product. My impression of them was that they were pretty WAN-oriented. If I was less lazy I'd give some links instead of google terms, but hopefully that's some food for thought. 
Mitch _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 16:00:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 16:00:04 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <1061228458.5291.32.camel@zeitgeist> Message-ID: On 18 Aug 2003, Mitchell Skinner wrote: > On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > > This is costly. "Open loop" multicast protocols work by having the > > receiver track the missing blocks, and requesting (or interpolating) > > them later. Here you are discarding that information and doing much > > extra work on both the sending and receiving side by later locating the > > missing blocks. .. > There's an ietf working group on reliable multicast that wasn't making a > whole lot of progress the last time I checked. It's a hard problem, and when they agree on a protocol it likely won't apply to clusters. The packet loss characteristic and cost trade-off is much different on a WAN than with a local Ethernet switch on a cluster. On a WAN every packet is costly to transport, so it's worth having both end stations doing extensive computations. On a cluster we might talk about doing more computation to avoid communication, but that's only for a few applications. In reality we prefer to do minimal work. Thus we prefer OS-bypass for application communication, and kernel-only for file system I/O. Notice the attention given to zero copy, TCP offload, TOE/TSO and sendfile(). Multicast and packet FEC add exactly what people are trying to avoid, extra copying, complexity and work. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 17:27:56 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 23:27:56 +0200 Subject: AW: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Ok, A geometrically cascading structure gives me some more disadvantages. If you are using an additional high performance network, eg myrinet or infiniband you won't have problems with the switch bandwidth. If you are using low cost Ethernet/Gigabit network topology with 2 or more hups between the nodes (like FFN), the last "generation" of the snowball could be a heavy bottleneck. It seems, there are too many variables for too many kinds of clusters. A big cluster farm often got a "idle" network, but only one, while a MPI cluster could have a network for the message passing and one for commands and copying. You could use this service-network to copy our files without using full bandwidth of this network. But this would cost something cluster users don't have: time. 
Rene

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at keyresearch.com Mon Aug 18 17:37:27 2003
From: lindahl at keyresearch.com (Greg Lindahl)
Date: Mon, 18 Aug 2003 14:37:27 -0700
Subject: big memory opteron
In-Reply-To: <1061228458.5291.32.camel@zeitgeist>
References: <1061228458.5291.32.camel@zeitgeist>
Message-ID: <20030818213727.GB2131@greglaptop.internal.keyresearch.com>

I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB DIMMs installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes.

Any clues?

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From farooqkamal_76 at yahoo.com Mon Aug 18 18:38:21 2003
From: farooqkamal_76 at yahoo.com (Farooq Kamal)
Date: Mon, 18 Aug 2003 15:38:21 -0700 (PDT)
Subject: Newbie
Message-ID: <20030818223821.11770.qmail@web21209.mail.yahoo.com>

Hi Everyone,

It's my first email to this group. What I want to know is whether Beowulf is transparent to the applications running on it. That is, suppose I run an Apache server on the master node: will the cluster manage the load balancing and process migration itself, or must every application that is intended to run on Beowulf be written from scratch to do so?

And finally, if Beowulf can't do this, is there any other cluster implementation that has the above qualities?

Regards
Farooq Kamal
SZABIST - Karachi
Pakistan

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From csamuel at vpac.org Mon Aug 18 22:41:49 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Tue, 19 Aug 2003 12:41:49 +1000
Subject: Newbie
In-Reply-To: <20030818223821.11770.qmail@web21209.mail.yahoo.com>
References: <20030818223821.11770.qmail@web21209.mail.yahoo.com>
Message-ID: <200308191241.50419.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 19 Aug 2003 08:38 am, Farooq Kamal wrote:

> And finally, if Beowulf can't do this, is there any other
> cluster implementation that has the above qualities?

I think what you're looking for is OpenMOSIX.

http://www.openmosix.org/

There's an introduction to it at the Intel website at:

http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial%3a%3axeon_openmosix&cntType=IDS_EDITORIAL&catCode=BMB

Excuse the large URL!

good luck!
Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QY5tO2KABBYQAh8RAqKrAJ9SY5wfCvvL35hLPubrEa8/xFuYsgCdFHYi 4wDadQBbfYpz06hX3YRkwRI= =QIb3 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 22:43:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 22:43:57 -0400 (EDT) Subject: AW: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > A geometrically cascading structure gives me some more disadvantages. > If you are using an additional high performance network, eg myrinet or > infiniband you won't have problems with the switch bandwidth. > > If you are using low cost Ethernet/Gigabit network topology with 2 or > more hups between the nodes (like FFN), the last "generation" of the > snowball could be a heavy bottleneck. No one uses Ethernet repeaters on a cluster. 32 port Fast Ethernet switches are under $5/port. Even for Gigabit Ethernet, 8 port switches can be found for $20/port. An unusual topology might be better utilized by mapping the copy topology to the physical, but that's not the usual case. The typical case is an essentially flat topology, or one close enough that treating it as flat avoids complexity. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 18 23:21:16 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 19 Aug 2003 11:21:16 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308181152.57812.csamuel@vpac.org> Message-ID: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> --- Chris Samuel ???? >http://www.vpac.org/content/services_and_support/facility/linux_cluster.php Interesting ;-> May be you can take a look at the PBS addons like mpiexec, maui scheduler. > Nope, it's always been running OpenPBS prior to > migrating to SPBS. SGE is sponsored by Sun, and is opensource, I am currently using it. http://gridengine.sunsource.net/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 23:47:40 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 13:47:40 +1000 Subject: Scalable PBS In-Reply-To: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> References: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> Message-ID: <200308191347.42057.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 01:21 pm, Andrew Wang wrote: > May be you can take a look at the PBS addons like > mpiexec, maui scheduler. Already there. :-) We've got some users using mpiexec (though it does mean that you can nolonger restart a mom and have an MPI job keep going like you could with MPICH's mpirun) and we swapped to the MAUI scheduler yesterday (not without problems). > > Nope, it's always been running OpenPBS prior to > > migrating to SPBS. > > SGE is sponsored by Sun, and is opensource, I am > currently using it. > > http://gridengine.sunsource.net/ What's your impression of it ? Does it integrate with commercial molecular modelling packages like MSI ? cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QZ3cO2KABBYQAh8RAnBbAJ9MbVoDWNp0pjp6CHANpDZe9K2i0QCfSbE9 jlJDiWkEkM2a1uY+qCETprU= =9w8a -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bradshaw at mcs.anl.gov Tue Aug 19 00:42:52 2003 From: bradshaw at mcs.anl.gov (Rick Bradshaw) Date: Mon, 18 Aug 2003 23:42:52 -0500 Subject: big memory opteron In-Reply-To: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 18 Aug 2003 14:37:27 -0700") References: <1061228458.5291.32.camel@zeitgeist> <20030818213727.GB2131@greglaptop.internal.keyresearch.com> Message-ID: <87k79agsvn.fsf@skywalker-lin.mcs.anl.gov> Greg, This seems to be a huge bug that has been in the Bios for over a year now. I have only seen this on the AGP motherboards though. Unfortunetly they still perform much better than the none AGP boards that do recognise all the memory. Rick Greg Lindahl writes: > I'm attempting to put together a big memory 2-cpu Opteron box, without > success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of > memory. Now that's a pretty strange number, since if I was out of chip > selects, it should see exactly 4 GBytes. > > Any clues? 
> > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Tue Aug 19 08:42:21 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 19 Aug 2003 14:42:21 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Mon, 18 Aug 2003, Donald Becker wrote: > On Mon, 18 Aug 2003, Rene Storm wrote: [...] > > Rene: That's the problem with all the things you do, first they are for > > your own and then everybody wants them ;o) > > If your end goal is to publish papers, do the hack. If you want to write a paper you might also want to consider reading the following papers as related work: @article{ CCPE2002, author = "Felix Rauch and Christian Kurmann and Thomas M. Stricker", title = "{Optimizing the Distribution of Large Data Sets in Theory and Practice}", journal = "Concurrency and Computation: Practice and Experience", year = 2002, volume = 14, number = 3, pages = "165--181", month = apr } % frisbee-usenix03.pdf % Cloning tool, with multicast data distribution, compression techniques etc. @inproceedings{ Frisbee-Usenix2003, author = "Mike Hibler and Leigh Stoller and Jay Lepreau and Robert Ricci and Chad Barb", title = "{Fast, Scalable Disk Imaging with Frisbee}", booktitle = "Proceedings of the USENIX Annual Technical Conference 2003", year = 2003, month = jun, organization = "The USENIX Association" } Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Tue Aug 19 10:15:33 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Tue, 19 Aug 2003 09:15:33 -0500 Subject: big memory opteron Message-ID: <99F2150714F93F448942F9A9F112634C07BE62CD@txexmtae.amd.com> Greg, You might want to try upgrading to the lates BIOS, what type of board do you have? -Lehi -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: Monday, August 18, 2003 4:37 PM To: beowulf at beowulf.org Subject: big memory opteron I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Tue Aug 19 12:53:59 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Tue, 19 Aug 2003 17:53:59 +0100 (BST) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: We've tried both multicast and snowball for data distribution on our cluster. We have a 60Gig dataset which we have to distribute to 1000 nodes. We started off using snowball copies. They work, but care is needed in your choice of tools for the file-transfers. rsync works, but can have problems with large (> 2Gig) files if you use rsh as the transport mechanism. (this is an rsh bug on some redhat versions rather than an rsync bug). rsync over ssh gets around that problem, but of course has the added encryption overhead. You should also avoid the incremental update mode of rsync (which is the default). We've found that it will silently corrupt your files if you rsync across different architectures (eg alpha-->ia32). It also has problems with large files. The only usable multicast code we've found that actually works is udpcast. http://udpcast.linux.lu/ There are plenty of other multicast codes to choose from out on the web, and most of them fall over horribly as soon as you cross more than one switch or have more than 10-20 hosts. We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used it to sucessfully distribute our 60gig dataset over large numbers of nodes simultaneously. In practice, on gigabit, we find that disk write speed is the limiting factor rather than the network. Lawrence Livermore use udpcast to install OS images on the MCR cluster, and I believe they side-step the disk performance issue by writing data to a ramdisk as an intermediate step. Obviously this only makes sense if your dataset < size of memory. Our current file distribution strategy is to use a combination of rsync and updcast. We do a dummy rsync to find out what files need updating, tar them up, pipe the tarball through udpcast and then untar the files and the client. The main performance killer we've found for udpcast is cheap switches. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Tue Aug 19 13:48:24 2003 From: csmith at lnxi.com (Curtis Smith) Date: Tue, 19 Aug 2003 11:48:24 -0600 Subject: AW: mulitcast copy or snowball copy References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: <072a01c3667a$16624c90$a423a8c0@blueberry> You might want to look into the Clusterworx product from Linux Networx. It has been used to boot and image clusters over 1100 nodes in size using multicast, and supports image sizes over 4GB. Multiple images can be served by a single server using ethernet. 
Each channel can use 100% of the network bandwidth (12.5MB per second on Fast Ethernet) or can be throttled to a specific rate. We typically use a transmission rate of 10MB per second on Fast Ethernet (30 seconds for a 300MB image), allowing DHCP traffic to get through. The multicast server can also be throttled to ensure that its doesn't overdrive the switch or hub (if you are using cheap ones) which in many cases can account for up to 95% of packet loss. If your switch is fast and is IGMP enabled, you will generally experience little to no packet loss. The technology is based on UDP and multicast and works with LinuxBios and Etherboot, and was used to image the MCR cluster many times prior to its deployment at LLNL. MCR could go from powered-off bare metal to running in about 7 minutes (most of which was disk formatting). Curtis Smith Principal Software Engineer Linux Networx (www.lnxi.com) ----- Original Message ----- From: "Guy Coates" To: Sent: Tuesday, August 19, 2003 10:53 AM Subject: Re:AW: mulitcast copy or snowball copy > > We've tried both multicast and snowball for data distribution on our > cluster. We have a 60Gig dataset which we have to distribute to 1000 > nodes. > > We started off using snowball copies. They work, but care is needed in > your choice of tools for the file-transfers. rsync works, but can have > problems with large (> 2Gig) files if you use rsh as the transport > mechanism. (this is an rsh bug on some redhat versions rather than an > rsync bug). > > rsync over ssh gets around that problem, but of course has the added > encryption overhead. > > You should also avoid the incremental update mode of rsync (which is the > default). We've found that it will silently corrupt your files if you > rsync across different architectures (eg alpha-->ia32). It also has > problems with large files. > > > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. > > In practice, on gigabit, we find that disk write speed is the limiting > factor rather than the network. Lawrence Livermore use udpcast to install > OS images on the MCR cluster, and I believe they side-step the disk > performance issue by writing data to a ramdisk as an intermediate step. > Obviously this only makes sense if your dataset < size of memory. > > Our current file distribution strategy is to use a combination of rsync > and updcast. We do a dummy rsync to find out what files need updating, tar > them up, pipe the tarball through udpcast and then untar the files and the > client. > > The main performance killer we've found for udpcast is cheap switches. 
> > Cheers, > > Guy Coates > > -- > Guy Coates, Informatics System Group > The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK > Tel: +44 (0)1223 834244 ex 7199 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john152 at libero.it Wed Aug 20 07:26:43 2003 From: john152 at libero.it (john152 at libero.it) Date: Wed, 20 Aug 2003 13:26:43 +0200 Subject: Detection performance? Message-ID: Hi all, does anyone know about the performance of Mii-diag using ioctl calls? Using Mii-diag, what could be the average delay between the link-status change ( phisically ) and the detection of this event. I'm using a 3Com 905 Tornado PC card; is there a different delay for each PC card in changing the status register? How long could this delay be in your experience? I'd like to have a delay minor than 1 ms between the time in which i phisically disconnect the cable and the time in which I have the detection (in example with a printf on video, ...) In your experience, is it reasonable? Normally do I have to wait for a greater delay? Thanks in advance for your kind answer and observations. Giovanni di Giacomo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Aug 20 08:30:58 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 20 Aug 2003 14:30:58 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Tue, 19 Aug 2003, Guy Coates wrote: > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. That's interesting, since I tried udpcast once (just a few tests) on our Cabletron SmartSwitchRouter with Gigabit Ethernet without disk accesses and I got about 350 Mbps, while Dolly ran with approx. 500 Mbps on Machines with 1 GHz processors. I even used Dolly once (already many years ago, with 400 MHz machines) to clone two 24-node clusters at the same time, they were connected to two different switches and had a router in between. The throughput for the nodes was about 6.9 MByte/s over Fast Ethernet for every of the nodes. > The main performance killer we've found for udpcast is cheap switches. True. I tried it once with a cheap and simple ATI 24-port Fast Ethernet switch. Udpcast run with only about 1 MByte/s since the switch decided to multicast everything with only 10 Mbps (one machine that wasn't a member of the multicast group was connected with only 10 Mbps). Dolly on the other hand worked perfect with full wire speed on that switch. 
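For a disk-free throughput check of the kind Felix describes (network only, no writes), udpcast's pipe mode can be pointed at /dev/zero and /dev/null; the sizes and option names below are illustrative, and stdin/stdout pipe behaviour is an assumption about the udpcast release in use:

    # Sender: push 1 GB of zeroes through the multicast channel.
    dd if=/dev/zero bs=1M count=1000 | udp-sender --nokbd

    # Each receiver: time the transfer and discard the payload, so the
    # network and switch, not the disks, are what get measured.
    time udp-receiver > /dev/null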
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Aug 20 09:39:16 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 20 Aug 2003 15:39:16 +0200 (CEST) Subject: Detection performance? In-Reply-To: Message-ID: On Wed, 20 Aug 2003, wrote: > Using Mii-diag, what could be the average delay > between the link-status change ( phisically ) > and the detection of this event. Depends on the card capabilities and the driver. Most drivers poll for change, some use an interrupt. > I'm using a 3Com 905 Tornado PC card; is there > a different delay for each PC card in changing the > status register? I don't understand the question... > I'd like to have a delay minor than 1 ms > between the time in which i phisically > disconnect the cable and the time in which > I have the detection (in example with a printf on video, ...) The 3c59x driver polls every 60 seconds for media status when using autonegotiation (default). People from the HA and bonding projects have modified this to allow very fine polling of the media registers, however this has a big disadvantage: the CPU spends a lot of time waiting for completion of in/out operations - the finer the poll, the more CPU lost. The time taken to talk to th MII does not depend on the CPU speed, but on th PCI speed, so the faster the CPU, the more instruction cycles are lost to I/O. > In your experience, is it reasonable? No, because the network card should transfer data, not be a watchdog. There is one other solution, but there is no code for it yet. At least the Tornado cards allow generating an interrupt whenever the media changes. This would alleviate the need to continually poll the media registers and would give an indication very soon after the event happened. This was on my to-do list for a long time, but it was never done and probably won't be done soon. > Normally do I have to wait for a greater delay? If by "normally" you mean "the 359x driver distributed with the kernel" or "the 3c59x driver from Scyld", then yes. > Thanks in advance for your kind answer and observations. This isn't really beowulf related. Please use vortex at scyld.com for discussing the 3c59x driver. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Wed Aug 20 10:50:30 2003 From: jhearns at micromuse.com (John Hearns) Date: Wed, 20 Aug 2003 15:50:30 +0100 Subject: Detection performance? In-Reply-To: References: Message-ID: <3F438AB6.6090507@micromuse.com> That's an interesting question. Can you tell us what your application is, and why it needs fast response? First thought I had would be to SNMP trap the port status on the switch, rather than the card. 
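The user-space ceiling Bogdan describes in the link-detection thread can be seen directly by polling the PHY from a script and timestamping transitions: the detection delay is bounded by the poll interval, and every poll costs an ioctl (PCI reads), so a sub-millisecond target is not realistic without driver support. The tool name and its output format here are assumptions (mii-tool from net-tools, or Donald Becker's mii-diag; both need root):

    # Poll the link roughly every 100 ms and log state changes.
    prev=""
    while true; do
        cur=$(mii-tool eth0 2>/dev/null)      # e.g. "eth0: ... link ok"
        if [ "$cur" != "$prev" ]; then
            echo "$(date +%H:%M:%S.%N)  $cur"
            prev="$cur"
        fi
        sleep 0.1                             # GNU sleep accepts fractions
    done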
But I must admit I have no idea of the latency there, but would I would expect it to be much more than 1ms. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Aug 20 12:33:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 20 Aug 2003 12:33:49 -0400 Subject: clubmask 0.5 released Message-ID: <1061397229.16487.45.camel@roughneck> Name : Clubmask Version : 0.5 Release : 1 Group : Cluster Resource Management and Scheduling Vendor : Liniac Project, University of Pennsylvania License : GPL-2 URL : http://clubmask.sourceforge.net What is Clubmask ---------------- Clubmask is a resource manager designed to allow Bproc based clusters enjoy the full scheduling power and configuration of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment. Clubmask also provides builtin support for a supermon2ganglia translator that allows a standard Ganlgia web backend to contact supermon and get XML data that will disply through the Ganglia web interface. Clubmask is currently running on around 10 clusters, varying in size from 8 to 128 nodes, and has been tested up to 5000 jobs. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Aug 22 00:39:04 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 22 Aug 2003 12:39:04 +0800 (CST) Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> Message-ID: <20030822043904.18171.qmail@web16811.mail.tpe.yahoo.com> Using the 32-bit x86 glinux binary package, it works on my machine. SGE gets the load information and the system/hardware information correctly: > qhost HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS ------------------------------------------------------------------------------- global - - - - - - - opteron1 glinux 2 0.00 997.0M 47.8M 1.0G 4.3M Andrew. --- Mikhail Kuzminsky ???? > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - > about binary SGE distribution in 32-bit mode, > or about compilation from the source for x86-64 > mode, > under SuSE or RedHat distribution etc. > > Yours > Mikhail Kuzminsky > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Sat Aug 23 03:48:58 2003 From: jhearns at micromuse.com (John Hearns) Date: 23 Aug 2003 08:48:58 +0100 Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> References: <200308201609.UAA08558@nocserv.free.net> Message-ID: <1061624938.1182.57.camel@harwood> On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information >From - Return-Path: <> Received: from localhost by clarice with LMTP for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from mta.micromuse.com (mta.micromuse.com [194.131.185.92]) by mailstore.micromuse.co.uk (Switch-2.2.6/Switch-2.2.4) with ESMTP id h7N7pXZ27346 for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from marstons.services.quay.plus.net (marstons.services.quay.plus.net [212.159.14.223]) by mta.micromuse.com (Switch-2.2.6/Switch-2.2.6) with SMTP id h7N7pWY27479 for ; Sat, 23 Aug 2003 08:51:32 +0100 Message-Id: <200308230751.h7N7pWY27479 at mta.micromuse.com> Received: (qmail 19110 invoked for bounce); 23 Aug 2003 07:51:26 -0000 Date: 23 Aug 2003 07:51:26 -0000 From: MAILER-DAEMON at marstons.services.quay.plus.net To: jhearns at micromuse.com Subject: failure notice X-Perlmx-Spam: Gauge=XXXIIIIIIIII, Probability=39%, Report="FAILURE_NOTICE_1, MAILER_DAEMON, NO_MX_FOR_FROM, NO_REAL_NAME, SPAM_PHRASE_00_01" X-Evolution-Source: imap://jhearns at mta.micromuse.com/ Mime-Version: 1.0 Hi. This is the qmail-send program at marstons.services.quay.plus.net. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. : Sorry, I couldn't find any host named bewoulf.org. (#5.1.2) --- Below this line is a copy of the message. Return-Path: Received: (qmail 19106 invoked by uid 10001); 23 Aug 2003 07:51:26 -0000 Received: from dockyard.plus.com (HELO .) (212.159.87.168) by marstons.services.quay.plus.net with SMTP; 23 Aug 2003 07:51:26 -0000 Subject: Re: SGE on AMD Opteron ? From: John Hearns To: bewoulf at bewoulf.org In-Reply-To: <200308201609.UAA08558 at nocserv.free.net> References: <200308201609.UAA08558 at nocserv.free.net> Content-Type: text/plain Organization: Micromuse Message-Id: <1061624843.1183.52.camel at harwood> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 23 Aug 2003 08:47:23 +0100 Content-Transfer-Encoding: 7bit On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - I'm working with this. More news when I get it. Also, and I know that all I have to do is Google and do some reading, but does andone on the list have experience with lm_sensors on Opteron? Specifically HDAMA motherboards. A quick Google just turned up a post by Mikhail in June on this very subject... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From saville at comcast.net Sat Aug 23 17:38:49 2003 From: saville at comcast.net (Gregg Germain) Date: Sat, 23 Aug 2003 17:38:49 -0400 Subject: Help! 
Endless RARP requests Message-ID: <3F47DEE9.F00C5213@comcast.net> Hi, I have installed the Scyle basic edition I got from Linux Central (RH 6.2). I've done the installation and I selected the range of IP addresses they suggest for the slave nodes. ifconfig shows that eth1 is operating. I connect a slave node to the Master node by connecting the Slave's eth0 card to the Master's eth1 card. I created a slave boot floppy, and boot the slave. It boots ok but starts sending RARP requests that never get satisfied. It sits there forever making more requests (well eventually it reboots itself and tries again but then there's endless RARP requests). Can any one give me a hint? Do I have to go through a hub to connect that first slave to the master? I know I'll have to have a hub for the second slave, but I thought I could make a direct connection for the first one. Any help would be greatly appreciated. thanks Gregg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From adm35 at georgetown.edu Sun Aug 24 14:36:02 2003 From: adm35 at georgetown.edu (adm35 at georgetown.edu) Date: Sun, 24 Aug 2003 14:36:02 -0400 Subject: Help! Endless RARP requests Message-ID: <1967012801.1280119670@georgetown.edu> You'll either need a hub, switch or crossover cable. Arnie Miles Systems Administrator: Advanced Research Computing Adjunct Faculty: Computer Science 202.687.9379 168 Reiss Science Building http://www.georgetown.edu/users/adm35 http://www.guppi.arc.georgetown.edu ----- Original Message ----- From: Gregg Germain Date: Saturday, August 23, 2003 5:38 pm Subject: Help! Endless RARP requests > Hi, > > I have installed the Scyle basic edition I got from Linux Central (RH > 6.2). > > I've done the installation and I selected the range of IP addresses > they suggest for the slave nodes. ifconfig shows that eth1 is > operating. > I connect a slave node to the Master node by connecting the Slave's > eth0 card to the Master's eth1 card. > > I created a slave boot floppy, and boot the slave. It boots ok but > starts sending RARP requests that never get satisfied. It sits there > forever making more requests (well eventually it reboots itself and > tries again but then there's endless RARP requests). > > Can any one give me a hint? > > Do I have to go through a hub to connect that first slave to the > master? I know I'll have to have a hub for the second slave, but I > thought I could make a direct connection for the first one. > > Any help would be greatly appreciated. 
> > thanks > > Gregg > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Aug 25 03:29:29 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 25 Aug 2003 09:29:29 +0200 Subject: PCI-X/133 NICs on PCI-X/100 In-Reply-To: <200308221815.WAA27091@nocserv.free.net> References: <200308221815.WAA27091@nocserv.free.net> Message-ID: <200308250929.29082.joachim@ccrl-nece.de> Mikhail Kuzminsky: > Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards > work w/PCI-X/100 slots on Opteron-based mobos (most of > them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro > mobos) - i.e. how high is the probability that they are > incompatible ? Very low. But why don't you ask the vendor directly? Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 08:31:34 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 08:31:34 -0400 Subject: help with e1000 upgrade Message-ID: <3F4A01A6.4090608@nsc1.net> G'day, I am upgrading the larger of my clusters to 1G ethernet. All nodes are TYAN motherboards (including the head node), and have on-board 1G. I've been using the default e1000 driver on my head node for the past year. It's version 4.1.7. However, when I try to boot my slave nodes, they appear to "hang" after initializing the NIC. I tried upgrading to the newer 5.1.3 driver. The head node is up and working. I made a boot floppy and tried booting, but once again, it hung right after the line displaying the e1000 and its IRQ. In the slave node BIOS, I have turned off the eepro100 and turned on the e1000. Any help would be appreciated. Cheers, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 11:33:26 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 11:33:26 -0400 Subject: help with e1000 upgrade In-Reply-To: <3F4A163E.B3301A38@accessgate.net> References: <3F4A01A6.4090608@nsc1.net> <3F4A163E.B3301A38@accessgate.net> Message-ID: <3F4A2C46.6040007@nsc1.net> Doug Shubert wrote: >Hello Wade, > >Wade Hampton wrote: > > > >>G'day, >> >>I am upgrading the larger of my clusters to 1G ethernet. All nodes are >> >> > >Are the on-board NIC's Intel ? > Intel >>I tried upgrading to the newer 5.1.3 driver. The head node >>is up and working. I made a boot floppy and tried booting, >>but once again, it hung right after the line displaying the >>e1000 and its IRQ. >> >> >> > >Are you using Cat5e or Cat6 cabling? > >We have found that Cat6 works more reliably >on auto sense 10/100/1000 NIC's and switches. > So far, CAT5E (3-6 foot cables). >>In the slave node BIOS, I have turned off the eepro100 >>and turned on the e1000. 
>> >> >We are using the E1000 driver in kernel 2.4.21 and it works flawlessly. > I've been using it in the Scyld 2.4.17 kernel from my head node for nearly a year without any issues. The master has the same motherboard and chips, only more disks, etc. The issue seems to be with booting my slave nodes from the Scyld boot disc. Thanks, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 07:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 15:16:42 +0400 Subject: mpich2-0.93 In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: hello Please help me, i really need help Because i can run mpd on the localhost but not in a ring of PC's When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " I get the answer Permission to node1 denied Permission to node 2 denied.................. Hope to hear from u very soon -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Tue Aug 26 10:33:05 2003 From: angel at wolf.com (Angel Rivera) Date: Tue, 26 Aug 2003 14:33:05 GMT Subject: Change Management Control Message-ID: <20030826143305.24318.qmail@houston.wolf.com> I am looking for information/sites and a formal best practice change control for clusters. Can someone point me in the right direction?
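On the Linpack question a couple of messages up: the usual route on a Beowulf is HPL (High Performance Linpack) from netlib.org, built against MPI and a tuned BLAS, with the result reported in GFLOPS. A hedged outline; the architecture name, template file, BLAS choice and process count are all placeholders:

    # Unpack HPL and start from the closest Make.<arch> template.
    tar xzf hpl.tgz && cd hpl
    cp setup/Make.Linux_PII_CBLAS Make.Linux_P4    # edit MPI and BLAS paths inside
    make arch=Linux_P4

    # Tune bin/Linux_P4/HPL.dat (problem size N, block size NB, P x Q grid),
    # then run one process per CPU across the cluster.
    cd bin/Linux_P4 && mpirun -np 8 ./xhpl

Problem size N should be chosen so the matrix nearly fills the cluster's total memory without swapping; that single parameter dominates the reported number.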
thx -ar _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nordquist at geosci.uchicago.edu Tue Aug 26 16:50:34 2003 From: nordquist at geosci.uchicago.edu (Russell Nordquist) Date: Tue, 26 Aug 2003 15:50:34 -0500 (CDT) Subject: mpich2-0.93 In-Reply-To: Message-ID: It sounds like you haven't setup password-less communication between your nodes. Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to setup password-less rsh (usaully .rhosts) or ssh. You can tell which one mpirun (or change it) is using by the value of RSHCOMMAND (at least in the 1.2 version) in mpirun. russell On Tue, 26 Aug 2003 at 15:16, RoUdY wrote: > hello > Please help me, i really need help > Because i can run mpd on the localhost but not in a ring > of PC's > When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " > I get the answer > Permission to node1 denied > Permission to node 2 denied.................. > > Hope to hear from u very soon > -------------------------------------------------- > Get your free email address from Servihoo.com! > http://www.servihoo.com > The Portal of Mauritius > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > - - - - - - - - - - - - Russell Nordquist UNIX Systems Administrator Geophysical Sciences Computing http://geosci.uchicago.edu/computing NSIT, University of Chicago - - - - - - - - - - - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Aug 26 20:34:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 27 Aug 2003 10:34:38 +1000 Subject: mpich2-0.93 In-Reply-To: References: Message-ID: <200308271034.39916.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 27 Aug 2003 06:50 am, Russell Nordquist wrote: > Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to > setup password-less rsh (usaully .rhosts) or ssh. You can tell which one > mpirun (or change it) is using by the value of RSHCOMMAND (at least in > the 1.2 version) in mpirun. Standard security blah - rsh is evil, ssh is your friend. :-) It is possible to not install rsh, rlogin and rcp and replace them with symbolic links to ssh, slogin and scp. This should work for most cases, but of course, test, test and test again. We were fortunate, although we had installed the r-series clients on our cluster the daemons weren't enabled in inetd, so we knew we couldn't break anything by removing them (as they'd never have worked in the first place). So far not found anything that has a problem because of this - although don't nuke users .rhosts files as some other programs, like PBS, call ruserok() to validate connections! 
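A minimal sketch of the password-less ssh setup Russell and Chris are pointing at, assuming ssh is what mpirun ends up calling (the RSHCOMMAND value Russell mentions); the key type, paths and node names are illustrative:

    # On the head node, as the user who will run MPI jobs:
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa          # empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

    # If home directories are NFS-mounted on the nodes, that is already enough;
    # otherwise copy ~/.ssh out to each node (node1, node2, ... are placeholders):
    for n in node1 node2 node3; do scp -r ~/.ssh $n: ; done

    # The test: this must print the remote hostname without prompting.
    ssh node1 hostname

Host-key prompts on first contact will still stall a batch job, so either pre-populate ~/.ssh/known_hosts or accept each node's key once by hand.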
cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/S/yeO2KABBYQAh8RAr/QAKCNHOz5hxIejvGOW34KZsRW74u0NwCeOONj C49BRL6ceXRIHHNhl1mqHss= =BM9q -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Wed Aug 27 17:32:38 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed, 27 Aug 2003 17:32:38 -0400 (EDT) Subject: A question about Bewoulf software: Message-ID: Hi, These days, our lab are planning to built up a Beowulf cluster, which uses Intel Xeon Processors or Pentium 4, and Intel Pro Gigabit (10/100/100) ethernet card. We wonder if we choose commerical software, such as scyld, which version will support Xeon Processor or Pentium 4 respectively? And which version will support Intel Pro Gigabit Ethernet card? If we try buliding by ourself, which version of software we should choose? Thanks a lot for your kind suggestion. I am looking forward to hearing from you. Thanks again. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Aug 27 19:41:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 27 Aug 2003 19:41:17 -0400 (EDT) Subject: A question about Bewoulf software: In-Reply-To: Message-ID: On Wed, 27 Aug 2003, Ao Jiang wrote: > These days, our lab are planning to built up > a Beowulf cluster, which uses Intel Xeon Processors > or Pentium 4, and Intel Pro Gigabit (10/100/100) > ethernet card. > We wonder if we choose commerical software, such as > scyld, which version will support Xeon Processor or > Pentium 4 respectively? Most Linux distributions will "support" the Pentium 4 and Xeon. The question is if the kernel is compiled to take advantage of the newer processor features. The Scyld distribution now has about a dozen different kernels to match the processor types and UP/SMP on the master and compute nodes. Typically only two to four of the kernels are installed, based which checkboxes are slected during installation. We always install a safe, featureless i386 uniprocessor BTW, you might think that the processor family is the most important optimization, but there is an even bigger difference between uniprocessor and SMP kernels. > If we try buliding by ourself, which version of software > we should choose? You pretty much have two choices: be library version compatible with a consumer/workstation distribution (Red Hat, SuSE, Debian), or use a meta-distribution such as GenToo or Debian and compile everything yourself. > And which version will support Intel Pro Gigabit Ethernet card? Every few weeks Intel comes out with a new card version with a new PCI ID. The e1000 driver is one of the five or so drivers that we are constantly updating to support just-introduced chips. 
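A few commands that make those choices concrete on a given node: which processor family a kernel could be tuned for, whether the running kernel is an SMP build, and which e1000 variant is actually fitted. Tool availability (ethtool in particular) varies by distribution, so treat this as a sketch:

    # Processor family and features:
    grep -E 'model name|flags' /proc/cpuinfo | sort -u

    # Logical CPU count, and whether the running kernel is an SMP build
    # (Red Hat / Scyld style kernel names usually carry an "smp" suffix):
    grep -c ^processor /proc/cpuinfo
    uname -r

    # The NIC's PCI vendor:device ID, and the driver/version that claimed it:
    lspci -n | grep -i 8086
    ethtool -i eth0        # if ethtool is installed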
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Wed Aug 27 20:39:21 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 03:39:21 +0300 Subject: gigabit switches for 32-64 nodes Message-ID: <200308280339.21458.exa@kablonet.com.tr> hi there, are there any high performance gigabit ethernet switches for a beowulf cluster consisting of 32 to 64 nodes? what do you recommend for the interconnect of such a system? regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From widyono at cis.upenn.edu Tue Aug 26 17:33:06 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 26 Aug 2003 17:33:06 -0400 Subject: perl-bproc bindings Message-ID: <20030826213306.GA2497@central.cis.upenn.edu> Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, something more recent than spring of 2001? Failing that, anyone have a chance to flesh out the missing information in Dan's work (e.g. C constant() function which doesn't seem to exist, error handling, etc.)? I have it "just working" for "just users", and barely at that. Error handling consists of returning -128 minus negated error code if there's an error. I've already Googled and checked these archives (perl bproc binding). Everything points back to Dan's work. Thanks, Dan W. -- -- Daniel Widyono http://www.cis.upenn.edu/~widyono -- Liniac Project, CIS Dept., SEAS, University of Pennsylvania -- Mail: CIS Dept, 302 Levine 3330 Walnut St Philadelphia, PA 19104 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 01:49:42 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 01:49:42 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <20030826213306.GA2497@central.cis.upenn.edu> Message-ID: On Tue, 26 Aug 2003, Daniel Widyono wrote: > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > something more recent than spring of 2001? 
There are updated bindings, and a small example, at ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Thu Aug 28 05:08:13 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Wed, 27 Aug 2003 23:08:13 -1000 Subject: gigabit switches for 32-64 nodes References: <200308280339.21458.exa@kablonet.com.tr> Message-ID: <000601c36d43$ea127360$0a02a8c0@kitsu2> We use a Foundry Fastiron II Plus with 64 non-blocking copper gigabit ports. A little on the pricy side but it works very well. ~Mitchel ----- Original Message ----- From: "Eray Ozkural" To: Sent: Wednesday, August 27, 2003 2:39 PM Subject: gigabit switches for 32-64 nodes > hi there, > > are there any high performance gigabit ethernet switches for a beowulf cluster > consisting of 32 to 64 nodes? what do you recommend for the interconnect of > such a system? > > regards, > > -- > Eray Ozkural (exa) > Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org > www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza > GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 08:57:15 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 28 Aug 2003 14:57:15 +0200 (CEST) Subject: 32bit slots and riser cards Message-ID: Dear beowulfers, In planning for some new cluster nodes, I hit a small problem. I want: - a modern mainboard for dual-Xeon (preferred) or dual-Athlon - 1U or 2U rackmounted case - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for installing in the case The problem is that all mainboards that I looked at position the 32bit PCI slot(s) near the edge of the mainboard and I cannot see how the riser card can be installed into them so that the card still fits in the case; the Myrinet card does not fit (keyed differently) into the 64bit PCI slots or 64bit risers. Is there some solution to this problem or do I have to go back to midi-tower cases ? 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Aug 28 10:24:19 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 28 Aug 2003 10:24:19 -0400 Subject: perl-bproc bindings In-Reply-To: References: Message-ID: <1062080659.7565.0.camel@roughneck> On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > something more recent than spring of 2001? > > There are updated bindings, and a small example, at > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz Any chance you guys have updated python bindings as well? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:04:52 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:04:52 -0400 Subject: Intel acquiring Pallas Message-ID: <3F4E1A14.8030900@bellsouth.net> Good morning! I though I would post this for those who haven't seen it yet: http://www.theregister.co.uk/content/4/32522.html "Intel has signed on to acquire German software maker Pallas, hoping the company's performance tools can give it an edge in the compute cluster arena." Enjoy! Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 11:30:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 11:30:30 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E1A14.8030900@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Good morning! > > I though I would post this for those who haven't seen it yet: > > http://www.theregister.co.uk/content/4/32522.html > > "Intel has signed on to acquire German software maker Pallas, > hoping the company's performance tools can give it an edge in > the compute cluster arena." Interesting. I'm trying to understand where and how this will help them -- more often than not it is a Bad Thing when hardware mfrs start dabbling in something higher than firmware or compilers -- Apple (and Next in its day) stands at one end of that path. It's especially curious given that Intel is already overwhelmingly dominant in the compute cluster arena (with only AMD a meaningful cluster competitor, and with apple and the PPC perhas a distant third). Not to mention the fact that if they REALLY wanted to get an edge in the compute cluster arena, they'd acquire somebody like Dolphin or Myricom. Monitoring is lovely and even important for application tuning, but it is an application layer on TOP of both systems software and the network. Or perhaps they are buying them so they can instrument their compilers? rgb > > Enjoy! 
> > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:50:41 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:50:41 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E24D1.9010301@bellsouth.net> Robert G. Brown wrote: >On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > >>Good morning! >> >> I though I would post this for those who haven't seen it yet: >> >>http://www.theregister.co.uk/content/4/32522.html >> >>"Intel has signed on to acquire German software maker Pallas, >>hoping the company's performance tools can give it an edge in >>the compute cluster arena." >> >> > >Interesting. I'm trying to understand where and how this will help them >-- more often than not it is a Bad Thing when hardware mfrs start >dabbling in something higher than firmware or compilers -- Apple (and >Next in its day) stands at one end of that path. > >It's especially curious given that Intel is already overwhelmingly >dominant in the compute cluster arena (with only AMD a meaningful >cluster competitor, and with apple and the PPC perhas a distant third). >Not to mention the fact that if they REALLY wanted to get an edge in the >compute cluster arena, they'd acquire somebody like Dolphin or Myricom. > >Monitoring is lovely and even important for application tuning, but it >is an application layer on TOP of both systems software and the network. >Or perhaps they are buying them so they can instrument their compilers? > > rgb > Bob, Very interesting observation. I wonder if Intel doesn't have something else up their sleeve? Could they be trying to get back into Supercomputer game (not likely, but didn't they get some DoD money recently?). Could they be helping with networking stuff (Intel has been discussing the next generation networking stuff lately). Maybe some sort of TCP Offload Engine? Maybe something with their new bus ( PCI Express?) They have also created CSA (Communication Streaming Architecture) in their new chipset to bypass the PCI bottleneck. Of course they could also be after the Pallas parallel debuggers to integrate into their compilers (like you mentioned) or perhaps to help with debugging threaded code in the hyperthreaded chips. Not that you mention it, this is a somewhat interesting development. I wonder what they're up to? >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 12:05:52 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu, 28 Aug 2003 12:05:52 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? My guess is something like this, given what pallas does, but if this is the case, they may be preparing to attempt a task that has brought strong programmers to their knees repeatedly in the past -- create a true parallel compiler. A compiler where the thread library transparently hides a network-based cluster, complete with migration and load balancing. So the same code, written on top of a threading library, could compile and run transparently on a single processor or a multiprocessor or a distributed cluster. Or something. Hell, they're one of the few entities that can afford to tackle such a blue-sky project, and just perhaps it is time for the project to be tackled. At least they can attack it from both ends at once -- writing the compiler at the same time they hack the hardware around. But they're going to have create a hardware-level virtual interface for a variety of IPC mechanism's for this to work, I think, in order to instrument it locally and globally with no particular penalty either way. Or, of course, buy SCI and start putting the chipset on their motherboards as a standard feature on a custom bus. Myricom wouldn't like that (or Dolphin if they went the other way), but it would make a hell of a clustering motherboard. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Aug 28 12:22:01 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 28 Aug 2003 11:22:01 -0500 (CDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Bob, > > Very interesting observation. I wonder if Intel doesn't have something > else up their sleeve? Could they be trying to get back into Supercomputer > game (not likely, but didn't they get some DoD money recently?). Could > they be helping with networking stuff (Intel has been discussing the next > generation networking stuff lately). Maybe some sort of TCP Offload > Engine? Maybe something with their new bus ( PCI Express?) They have also > created CSA (Communication Streaming Architecture) in their new chipset > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? > Intel's already dropped Infiniband, and they have also recently gotten very quiet about using PCI Express as a node interconnect. In fact, this use of PCI Express has recently been switched to one of their "non-Goals" for the technology. 
I'd guess that Intel does not care about this market. This is fine by me. I'd rather have the Myricom's and Dolphin's that live or die by their products to ensure the products are getting the attention they deserve. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 12:49:25 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 12:49:25 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062089290.4363.22.camel@localhost.localdomain> Hmmm... Not to throw water on hopes for parallelizing compilers and Intel supported parallel debuggers, but my guess is that Intel's move is much less revolutionary (but perhaps still important). Pallas's main HPC product is Vampir/Vampirtrace. These are performance analysis tools. As such they would only be peripherally useful for compiler design (perhaps to measure the effects of certain changes). Even for this purpose, Vampir/Vampirtrace doesn't provide the amount of detail that Intel's own V-Tune product does. For debugging, Pallas resells Etnus Totalview. For compiler options Pallas has the Intel compilers as well as PGI. As far as I can tell, Pallas doesn't do any significant independent development for these systems. So, what does the Pallas performance analysis product do that is important? Vampir/Vampirtrace allows the collection and display of data from a large number of programs running in parallel. Doing this well is not trivial. Time differences between machines must be taken into account. The tools must be able to handle a potentially huge amount of trace data (running a profiler on a 1000 process system is a much different animal from instrumenting a single process job). And, finally once all this data is collected it has to be presented in some way which can actually be of use. VA/VT is among the best available tools for this purpose. So, why would Intel want to acquire Pallas? First, they have a good product which can be sold at a high price. Combined with some Intel marketing they "should" be able to make money on the product. Second, Vampirtrace has the capability of using processor performance counters. By pushing the capabilities of VA/VT to work on Intel processors it promotes "lock-in" to Intel processors. In this way a developer using the Intel compilers, V-Tune for single process analysis, and Vampir for parallel profiling, wouldn't be likely to move to an AMD or Power platform. Is this a good thing? For the most part probably so. Intel should be able to help improve the Vampir software. Making it work even better on Intel processors doesn't really hurt things if you're using another system and might make things really nice for those of us on Intel hardware. Hopefully it's development on other systems won't languish. But, on the basis of this acquisition, I wouldn't hold my breath for parallel compilers or a full fledged Intel return to the HPC market. Vann Presearch, Inc. On Thu, 2003-08-28 at 12:05, Robert G. Brown wrote: > On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > > to bypass the PCI bottleneck. 
Of course they could also be after the Pallas > > parallel debuggers to integrate into their compilers (like you mentioned) > > or perhaps to help with debugging threaded code in the hyperthreaded chips. > > Not that you mention it, this is a somewhat interesting development. > > I wonder what they're up to? > > My guess is something like this, given what pallas does, but if this is > the case, they may be preparing to attempt a task that has brought > strong programmers to their knees repeatedly in the past -- create a > true parallel compiler. A compiler where the thread library > transparently hides a network-based cluster, complete with migration and > load balancing. So the same code, written on top of a threading > library, could compile and run transparently on a single processor or a > multiprocessor or a distributed cluster. Or something. > > Hell, they're one of the few entities that can afford to tackle such a > blue-sky project, and just perhaps it is time for the project to be > tackled. At least they can attack it from both ends at once -- writing > the compiler at the same time they hack the hardware around. But > they're going to have create a hardware-level virtual interface for a > variety of IPC mechanism's for this to work, I think, in order to > instrument it locally and globally with no particular penalty either > way. Or, of course, buy SCI and start putting the chipset on their > motherboards as a standard feature on a custom bus. Myricom wouldn't > like that (or Dolphin if they went the other way), but it would make a > hell of a clustering motherboard. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Aug 28 13:53:32 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 28 Aug 2003 10:53:32 -0700 Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> References: Message-ID: <5.2.0.9.2.20030828104843.03073620@mailhost4.jpl.nasa.gov> At 11:50 AM 8/28/2003 -0400, Jeffrey B. Layton wrote: >Robert G. Brown wrote: > >> >>Interesting. I'm trying to understand where and how this will help them >>-- more often than not it is a Bad Thing when hardware mfrs start >>dabbling in something higher than firmware or compilers -- Apple (and >>Next in its day) stands at one end of that path. >> >>It's especially curious given that Intel is already overwhelmingly >>dominant in the compute cluster arena (with only AMD a meaningful >>cluster competitor, and with apple and the PPC perhas a distant third). >>Not to mention the fact that if they REALLY wanted to get an edge in the >>compute cluster arena, they'd acquire somebody like Dolphin or Myricom. >> >> rgb > >Bob, > > Very interesting observation. I wonder if Intel doesn't have something >else up their sleeve? Could they be trying to get back into Supercomputer >game (not likely, but didn't they get some DoD money recently?). 
Could >they be helping with networking stuff (Intel has been discussing the next >generation networking stuff lately). Maybe some sort of TCP Offload >Engine? Maybe something with their new bus ( PCI Express?) They have also >created CSA (Communication Streaming Architecture) in their new chipset >to bypass the PCI bottleneck. Of course they could also be after the Pallas >parallel debuggers to integrate into their compilers (like you mentioned) >or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. >I wonder what they're up to? > >Jeff Intel is making a big push into wireless and RF technology. A recent article ( I don't recall where exactly,but one of the trade rags..) mentioned that the mass market (consumer) don't seem to need much more processor crunch (at least until Windows XXXP comes out, then you'll need all that power just to apply the patches), but that they saw a big market opportunity in integrated wireless networking. Simultaneously, the generalized tanking of the telecom industry has meant that they can hire very skilled RF engineers for reasonable wages without having to compete against speculative piles of options, etc. (I suspect that there are some skilled RF engineers who are now older and wiser and less speculative, too!) We're talking about RF chip designers, as well as PWB layout, circuit designers, and antenna folks. It wouldn't surprise me that Intel is looking at other areas than traditional CPU and processor support kinds of roles. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at cert.ucr.edu Thu Aug 28 14:19:28 2003 From: glen at cert.ucr.edu (Glen Kaukola) Date: Thu, 28 Aug 2003 11:19:28 -0700 Subject: thrashing Message-ID: <3F4E47B0.3000805@cert.ucr.edu> Hi there, So for our newest simulations, we're working with a different domain, where each of our grid cells are much smaller, and so we're expecting the runs to take about 4 times longer. But actually they're taking around 40 times longer. I'm thinking this may have something to do with not having enough memory. The problem with this theory is that I'm not really sure how to tell if my machines are thrashing. On a desktop machine I can tell no problem, as the disk starts going crazy and the system pretty much grinds to a halt. But on a machine up in my server room on which I don't have any gui and where it's too loud to hear any disk activity, I'm really not sure how to tell whether it's thrashing or not. I mean, I can look at top, and free, and sar and everything doesn't look much different than when the other simulations were running, except for maybe 'sar -W', which is a little bit higher. Anyway, if someone could help me out with a way to determine without a doubt if my machines are thrashing or not, then I'd greatly appriciate it. 
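A small watch along the lines of the vmstat advice in the replies that follow, printing any interval in which pages are actually moving to or from swap. The awk locates the si/so columns from vmstat's own header, since their position differs between procps versions; the first sample is an average since boot, so ignore it:

    vmstat 1 | awk '
        $1 == "r" { for (i = 1; i <= NF; i++) { if ($i == "si") si = i; if ($i == "so") so = i }; next }
        si && ($si + $so) > 0 { print "swapping:", $0 }'

Sustained non-zero si/so while the job runs is the thrashing signature; occasional blips are normal.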
Thanks for your time, Glen Kaukola _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 14:35:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 14:35:49 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <1062080659.7565.0.camel@roughneck> Message-ID: On 28 Aug 2003, Nicholas Henke wrote: > On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > > something more recent than spring of 2001? > > > > There are updated bindings, and a small example, at > > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz > > Any chance you guys have updated python bindings as well? 0.9-8 is the current version -- which are you using? The last bugfix was logged in October of 2003. The next planned refresh has added bindings for the Beostat statistics library, Beomap job mapping and BBQ job scheduling systems. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 14:39:02 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 14:39:02 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <3F4E4C46.1010300@bellsouth.net> Use vmstat. Try something like, vmstat 1 10 (1 second delay, 10 repeats). Look in the columns labeled, swap si so that will give you the information you want. Good Luck! Jeff > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do > with not having enough memory. The problem with this theory is that > I'm not really sure how to tell if my machines are thrashing. On a > desktop machine I can tell no problem, as the disk starts going crazy > and the system pretty much grinds to a halt. But on a machine up in > my server room on which I don't have any gui and where it's too loud > to hear any disk activity, I'm really not sure how to tell whether > it's thrashing or not. I mean, I can look at top, and free, and sar > and everything doesn't look much different than when the other > simulations were running, except for maybe 'sar -W', which is a little > bit higher. Anyway, if someone could help me out with a way to > determine without a doubt if my machines are thrashing or not, then > I'd greatly appriciate it. 
> > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Aug 28 15:08:53 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 28 Aug 2003 15:08:53 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <1062097733.8882.120.camel@protein.scalableinformatics.com> Hi Glen: Several methods. 1) vmstat vmstat 1 and look at the so/si columns, not to mention the r/b/w. 2) swapon -s to see the swap usage 3) top has an ok summary of the vm info 4) cat /proc/meminfo can give a crude picture of the memory system. On Thu, 2003-08-28 at 14:19, Glen Kaukola wrote: > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do with > not having enough memory. The problem with this theory is that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in my server > room on which I don't have any gui and where it's too loud to hear any > disk activity, I'm really not sure how to tell whether it's thrashing or > not. I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine without a > doubt if my machines are thrashing or not, then I'd greatly appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 15:43:08 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 12:43:08 -0700 Subject: 32bit slots and riser cards In-Reply-To: References: Message-ID: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > or 64bit risers. Is there some solution to this problem or do I have to go > back to midi-tower cases ? Doesn't that mean that the Myrinet card is 5 volts, and you only have 3.3 volt PCI slots? It's such an old Myrinet card that I don't remember the details of when PCI got a 3.3 volt option. 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:24:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:24:47 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E44D8.7090200@wildopensource.com> Message-ID: On Thu, 28 Aug 2003, Stephen Gaudet wrote: > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > cost of about $400.00, plus or minus a few dollars per system. > Therefore, due to this fixed cost, MOST people looking at a cluster > won't touch Itanium2. Steve, Are you suggesting RH has put together a package that is NOT GPL in any way that would significantly affect the 64 bit market? The kernel, the compiler, and damn near every package is GPL, much of it from Gnu itself. Am I crazy here? So I'm having a hard time seeing why one would HAVE to pay them $400/system for anything except perhaps proprietary non-GPL "advanced server" packages that almost certainly wouldn't be important to HPC cluster builders (and which they would have had to damn near develop in a sealed room to avoid incorporating GPL stuff in it anywhere). > Some white box resellers are looking at taking RH Advanced Server and > stripping it down and offering on their ia64 clusters. However, if > their not working with code lawyers, and paying very close attention to > copy right laws, they could end up with law suits down the road. If Red Hat isn't careful and not working very carefully with code lawyers, I think the reverse is a lot more likely, as Richard Stallman is known to take the Gnu Public License (free as in air at the source level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't "own" a hell of a lot of code in what they sell; the bulk of what they HAVE written is GPL derived and hence GPL by inheritance alone. The Open Source community would stomp anything out of line with hobnailed boots and club it until it stopped twitching... So although many a business may cheerfully pay $400/seat for advanced server because it is a cost and support model they are comfortable with, I don't see what there is to stop anyone from taking an advanced server copy (which necessarily either comes with src rpm's or makes them publically available somewhere), doing an rpmbuild on all the src rpm's (as if anyone would care that you went through an independent rebuild vs just used the distribution rpm's) and putting it on 1000 systems, or giving the sources to a friend, or even reselling a repackaging of the whole thing (as long as they don't call them Red Hat and as long as they omit any really proprietary non-GPL work). I even thought there were some people on the list who were using at least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm wrong...:-( rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:23:37 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:23:37 -0700 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <20030828202337.GA1964@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 11:19:28AM -0700, Glen Kaukola wrote: > I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, A clear sign of thrashing is that the program should be getting a lot less than 100% of the cpu, because it's waiting for blocks from the disk. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:31:20 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:31:20 -0700 Subject: Change Management Control In-Reply-To: <20030826143305.24318.qmail@houston.wolf.com> References: <20030826143305.24318.qmail@houston.wolf.com> Message-ID: <20030828203120.GB1964@greglaptop.internal.keyresearch.com> > I am looking for information/sites and a formal best practice change > control for clusters. Can someone point me in the right direction? thx -ar Most clusters are a lot more informal, and don't have any kind of change control. I suspect your best bet would be to look at people involved in LISA: Large Installation Systems Administration. These guys are mostly commercial, and we (the HPC cluster community) don't talk to them much, even though we should. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:42:19 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:42:19 -0400 (EDT) Subject: thrashing In-Reply-To: <1062097733.8882.120.camel@protein.scalableinformatics.com> Message-ID: On 28 Aug 2003, Joseph Landman wrote: > Hi Glen: > > Several methods. > > 1) vmstat > > vmstat 1 > > and look at the so/si columns, not to mention the r/b/w. > > 2) swapon -s > > to see the swap usage > > 3) top > > has an ok summary of the vm info > > 4) cat /proc/meminfo > > can give a crude picture of the memory system. and if you want to watch pretty much all of this information in parallel (on all the systems at once) xmlsysd provides output fields with the information available in both vmstat and free (cat /proc/meminfo), so you can actually watch for swapping or paging or leaks on lots of systems at once in wulfstat. It easily handles updates with a 5 second granularity and can often manage 1 second (depending on your network and number of nodes and so forth). It's on the brahma website or linked under my own. 
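If you just want a quick look right now, before installing anything permanent, a throwaway loop over the node list does a cruder version of the same thing (assuming passwordless ssh and a plain text file of hostnames -- both placeholders here):

  # print the current vmstat sample from every node; nonzero si/so means it is swapping
  for n in $(cat ~/nodes.txt); do
      printf '%-12s ' "$n"
      ssh "$n" 'vmstat 1 2 | tail -1'
  done

Crude, but it answers the "is anything swapping right now" question in a few seconds.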
I don't really provide a direct monitor of disk activity (partly out of irritation at custom-parsing the multidelimited "disk_io" field in /proc/stat), but if you were really interested in it I could probably bite the bullet and add a "disk" display that would work for up to four disks in a few hours of work. I'd guess that ganglia could also manage this sort of monitoring as well, but I don't use it (as I wrote my package before they started theirs by a year or three) so I don't know for sure. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:49:06 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:49:06 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > > > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > > or 64bit risers. Is there some solution to this problem or do I have to go > > back to midi-tower cases ? > > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots? It's such an old Myrinet card that I don't > remember the details of when PCI got a 3.3 volt option. I think that this is right, Greg -- the keying is related to voltage. If your actual PCI slots are keyed correctly, they should be able to manage either voltage (IIRC), but you may have to replace the risers. We've had trouble getting risers that didn't key correctly or work correctly for one kind of card or the other (or one motherboard or another) in the past. It sounds like this might be your problem if you're referring to replacing the cases and not the motherboard itself. Look around and see if you find better/different risers -- there are a fair number of different kinds of risers out there, at least for 2U. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 17:06:40 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 14:06:40 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The cost of the os, either of a blessed one, or a roll your own one hasn't been a significant factor in our reluctance to use Itanium II. The lack of commodity mainboards. The steep price of the cpu's. and lack of a clear view into intels product lifecycle for itaniumII. have been issues. Itanium II 1.3ghz 3mb cpu's have only recently arrived at ~$1400ea. opteron 244s are less than half that and that's before we put the rest of the system around it. we have some off-the-shelf compaq itanium boxes to evaluate but at around $8000 ea that sort of a non-starter. joelja On Thu, 28 Aug 2003, Robert G. 
Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > > cost of about $400.00, plus or minus a few dollars per system. > > Therefore, due to this fixed cost, MOST people looking at a cluster > > won't touch Itanium2. > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > > Some white box resellers are looking at taking RH Advanced Server and > > stripping it down and offering on their ia64 clusters. However, if > > their not working with code lawyers, and paying very close attention to > > copy right laws, they could end up with law suits down the road. > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Aug 28 17:16:38 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 28 Aug 2003 14:16:38 -0700 (PDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Bogdan Costescu wrote: > In planning for some new cluster nodes, I hit a small problem. I want: > - a modern mainboard for dual-Xeon (preferred) or dual-Athlon > - 1U or 2U rackmounted case > - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for > installing in the case the 2U chassis should be trivial to solve for either 32bit or 64bit pci slots for 1U chassis... you need to pick "right motherboard" that works with the chassis ... and pci cards ( you cannot do a mix and match with any motherboard ) if you want performance out of your pci card, you will have to use 64bit pci slots or 32bit pci slot - but the riser card should be one piece instead of the whacky non-conforming wires between the "2 sections of the pci riser" 32 and 64 bit pci riser cards (not cheap but lot better than most others) http://www.adexelec.com c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Thu Aug 28 17:22:13 2003 From: johnb at quadrics.com (John Brookes) Date: Thu, 28 Aug 2003 22:22:13 +0100 Subject: thrashing Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E2C1@stegosaurus.bristol.quadrics.com> The test I would probably suggest to someone whose machine I had no access to is 'swapoff -a'. It's not big and it's not clever, but largely removes the need for value judgements: if it bombs in an OOM style, you were most probably thrashing. Just a thought. Cheers, John Brookes Quadrics > -----Original Message----- > From: Glen Kaukola [mailto:glen at mail.cert.ucr.edu] > Sent: 28 August 2003 19:19 > To: beowulf at beowulf.org > Subject: thrashing > > > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something > to do with > not having enough memory. The problem with this theory is > that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in > my server > room on which I don't have any gui and where it's too loud to > hear any > disk activity, I'm really not sure how to tell whether it's > thrashing or > not. 
I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine > without a > doubt if my machines are thrashing or not, then I'd greatly > appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Thu Aug 28 13:56:53 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 20:56:53 +0300 Subject: Filesystem In-Reply-To: References: Message-ID: <200308282056.54106.exa@kablonet.com.tr> On Saturday 02 August 2003 05:45, Alvin Oga wrote: > i think ext3 is better than reiserfs > > i think ext3 is not any better than ext2 in terms > of somebody hitting pwer/reset w/o proper shutdown > - i always allow it to run e2fsck when it does > an unclean shutdown ... > > - yes ext3 will timeout and continue and restore from > backups but ... am paranoid about the underlying ext2 > getting corrupted by random power off and resets > I basically think ext3 and ext2 are a joke and we use XFS on the nodes with no performance problem. Excellent reliability! Regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Thu Aug 28 16:55:51 2003 From: sp at scali.com (Steffen Persvold) Date: Thu, 28 Aug 2003 22:55:51 +0200 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E6C57.9030406@scali.com> Robert G. Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. 
> > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > Robert, AFAIK, there is no "proprietary non-GPL" work in RedHat's Enterprise Linux line. I think the price is so high because of the support level you're buing. All the source for RHEL, either 32bit or 64bit is available on their ftp sites for download. And as long as they do that I don't think they're violating GPL, but I might be wrong (as I'm not a lawyers, but I'm sure RH has plenty of them). And actually, according to their web site, the cheapest (most suitable cluster) release for ITP2; RHEL WS (workstation) is $792, AS (advanced server) is $1992 for standard edition and $2988 for premium edition. Regards, -- Steffen Persvold ,,, mailto: sp at scali.com Senior Software Engineer (o-o) http://www.scali.com -----------------------------oOO-(_)-OOo----------------------------- Scali AS, PObox 150, Oppsal, N-0619 Oslo, Norway, Tel: +4792484511 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 18:11:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 15:11:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E6C57.9030406@scali.com> Message-ID: Stephen... anyone who wants can grab the entire srpms dir for AS and build it. The only way they'll end up with a lawsuit is if they represent the result as official suppoprt redhat linux AS... If you like you can pick it up from the RH mirrors including mine. > > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > >>Some white box resellers are looking at taking RH Advanced Server and > >>stripping it down and offering on their ia64 clusters. However, if > >>their not working with code lawyers, and paying very close attention to > >>copy right laws, they could end up with law suits down the road. 
> > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:25:42 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:25:42 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Alvin Oga wrote: > the 2U chassis should be trivial to solve for either 32bit or 64bit pci > slots Well, maybe trivial for you who do this for a living :-) > for 1U chassis... you need to pick "right motherboard" that works with the > chassis ... and pci cards > ( you cannot do a mix and match with any motherboard ) Sure, but I was looking for example at the Intel offerings which pair dual Xeon mainboards with 1U/2U cases that are certified to work together. > if you want performance out of your pci card, I know that this 32bit/33MHz card looks slow by today's standards, but I think that it can still provide lower latency than e1000 or tg3 -driven cards, so I'd like to continue to use them. > 32 and 64 bit pci riser cards (not cheap but lot better than most others) > http://www.adexelec.com Many thanks for this address. I did try to use google before writing to the list, but I came up with all sorts of shops and nothing with good descriptions, which is what I needed most. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:44:24 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:44:24 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots ? Bingo. This is exactly what most of the people who wrote to me off-list probably missed, although I did mention that it doesn't fit because of a different keying - I should have probably mentioned this explicitly. All the 32bit slots that I've seen on these mainboards allow inserting such cards, which makes me believe that they support both 5V and 3.3V cards; but the 64bit slots are 3.3V only. I don't have much experience with rackmounted systems, which is probably evident, so I didn't know what to expect from a riser. Thanks to Alvin Oga's mention of the Adexelec site, I was able to find out that the risers exist in many different variations. For example, I was wondering if such a riser exists that would allow mounting of the card from the edge toward the middle of the mainboard, while the most common way is the other way around - I still need to find out if the case allows fixing of the card the other way around, but this is an easier problem to solve.
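Once a card does physically seat, I assume a quick lspci check is enough to confirm that it at least enumerates and what the slot reports (the device string and bus address below are only examples):

  lspci -v | grep -i -A 2 myri    # does the LANai/Myricom interface show up at all?
  lspci -vv -s 02:01.0            # the Status: line includes the 66MHz capability bit

Electrical keying is of course decided long before software gets to look at anything, but this would at least tell me whether a card that fits is actually alive on the bus.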
One other thing that turned me off was that in a system composed of only Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots on the mainboard allow inserting of the Myrinet card (but didn't try to see if it works), while the riser cards that came with the case do not, allowing only 3.3V ones - so the riser imposes additional limitations... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Thu Aug 28 19:46:29 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Thu, 28 Aug 2003 16:46:29 -0700 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: Joel- Have you actually built RH AS from scratch using their SRPMS? Or do you know anyone that has? I'm very interested in doing this but I heard there were some pretty significant obstacles along the lines of package dependencies. Glen On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > Stephen... anyone who wants can grab the entire srpms dir for AS and > build > it. The only way they'll end up with a lawsuit is if they represent > the > result as official suppoprt redhat linux AS... > > If you like you can pick it up from the RH mirrors including mine. > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: >>> >>>> Some white box resellers are looking at taking RH Advanced Server >>>> and >>>> stripping it down and offering on their ia64 clusters. However, if >>>> their not working with code lawyers, and paying very close >>>> attention to >>>> copy right laws, they could end up with law suits down the road. >>> > > -- > ----------------------------------------------------------------------- > --- > Joel Jaeggli Unix Consulting > joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 19:57:53 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 19:57:53 -0400 Subject: Intel acquiring Pallas (Redhat AS Rebuild) In-Reply-To: References: Message-ID: <1062115073.7007.2.camel@localhost.localdomain> Haven't tried it but... http://www2.uibk.ac.at/zid/software/unix/linux/rhel-rebuild.htm http://www.uibk.ac.at/zid/software/unix/linux/rhel-rebuild-l.html Vann On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. 
> > Glen > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 28 20:54:43 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 29 Aug 2003 10:54:43 +1000 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <200308291054.45059.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 29 Aug 2003 09:46 am, Glen Otero wrote: > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? - From the Rocks Cluster Distribution website: http://www.rocksclusters.org/Rocks/ [...] Rocks 2.3.2 IA64 is based on Red Hat Advanced Workstation 2.1 recompiled from Red Hat's publicly available source RPMs. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/TqRTO2KABBYQAh8RAnd4AJkBCFmq3tyb97EgHvg5x9mrsqkGGQCghGqG 9cF9eAKLTHD6lQS4kZGtg0A= =WVIz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Thu Aug 28 20:59:57 2003 From: timm at fnal.gov (Steven Timm) Date: Thu, 28 Aug 2003 19:59:57 -0500 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The ROCKS distribution at www.rocksclusters.org claims to have done so for the IA64 architecture.. I have not tested it myself. Your mileage may vary. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfalano at hotmail.com Thu Aug 28 21:29:34 2003 From: nfalano at hotmail.com (Norman Alano) Date: Fri, 29 Aug 2003 09:29:34 +0800 Subject: mpich Message-ID: greetings ! i already installed mpich... but the problem is whenever i run an application for instant the examples in the mpich the graphics wont show.... how can i configure so that i can run the application with graphic? cheers norman _________________________________________________________________ The new MSN 8: advanced junk mail protection and 2 months FREE* http://join.msn.com/?page=features/junkmail _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 00:04:27 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 21:04:27 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: I've built almost all of it with the exception of gtk and kde related stuff which was outside the scope of my interest, on a redhat 7.2 box... I wouldn't try it on a 9 host. joelja On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sgaudet at wildopensource.com Fri Aug 29 09:52:31 2003 From: sgaudet at wildopensource.com (Stephen Gaudet) Date: Fri, 29 Aug 2003 09:52:31 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4F5A9F.5000809@wildopensource.com> Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at fing.edu.uy Fri Aug 29 04:06:38 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Fri, 29 Aug 2003 11:06:38 +0300 Subject: European Commission Patentability rules Message-ID: <3F4F098E.7080202@fing.edu.uy> Dear all: I have not seen comments on the list regarding to this subject. I know that this might be considered political and off-topic but I believe that most of our (beowulf) software technology is Open/Free and that the results of further regulations might affect our work. Sorry for the noise for those of you who already knew this. Regards Ariel On September 1st the European Commission is going to vote a revised version of the European Patentability rules. The proposed revision contains a set of serious challenges to Open Source development since regulation regarding software patents will be broadly extended and might forbid independent development of innovative (Open Source and not) software-based solutions. The European Open Source community is very concerned about the upcoming new regulation and has organized a demo protest for August 27, asking Open Source supporting sites to change their home pages to let everyone know what is going on at the European Parliament. For further information please see http://swpat.ffii.org and http://petition.eurolinux.org. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Aug 29 10:17:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 29 Aug 2003 10:17:09 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Bogdan Costescu wrote: > One other thing that turned me off was that in a system composed of only > Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots > on the mainboard allow inserting of the Myrinet card (but didn't try to > see if it works), while the riser cards that came with the case do not, > allowing only 3.3V ones - so the riser imposes additional limitations... 
One last possible solution you can consider if you're using 2U cases and don't mind ugly is that MANY of the cards you might want to add nowadays are half-height cards on full height backplates. Usually the backplate is held on by two little screws. The half height cards will snap into a regular PCI slot normally (vertically) and still permit the case to close with no riser at all. The two negatives are there there are no "half height riser backplates" that I know of, so the back of each chassis will be open to the air, which may or may not screw around with cooling airflow in negative ways, and the fact that you can't "screw the cards down". Both of these can be solved (or ignored) with a teeny bit of effort, although you'll probably prefer to just get a riser that meets your needs -- there are risers with a key that fits in the AGP slot, risers with 32 bit keys, risers with 64 bit keys.. shop around. Be aware that some of the risers you can buy don't work properly (why I can't say, given that they appear to be little more than bus extenders with keys to grab power and timing/address lines). At a guess this won't help you with an old myrinet card as it is probably full height, but if you get desperate and it's not, you could likely make this work. rgb > > -- > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Aug 29 11:10:43 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 17:10:43 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Robert G. Brown wrote: > is that MANY of the cards you might want to add nowadays are half-height > cards on full height backplates. Nice try :-) It's a full-height card. And buying a taller case for each node with these Myrinet cards to allow vertical mounting would make me start looking for an 100U rack :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Fri Aug 29 10:48:55 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Fri, 29 Aug 2003 09:48:55 -0500 Subject: Intel acquiring Pallas Message-ID: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> >6. 
Given that the update packages are covered by the GPL, *nothing* > prevents a receiver of said packages to make them available for > download on the Internet. > > Red Hat can, however, potentially decide not to provide you with > future updates if you do this. This is a bit unclear in the SLA. Correct me if I'm wrong, I thought part of the GPL was that you have to give the source code to anyone that asks for it, is it not? Per section 2b: 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. http://www.gnu.org/copyleft/gpl.html?cid=6 They still keep patches on their web site, do they not? Lehi Gracia
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a customer service level agreement to limit the spread of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only recourse will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Aug 29 11:17:04 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 29 Aug 2003 11:17:04 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062170224.9421.4.camel@roughneck> On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > The links to the rhel-rebuild howto and mailing list are enought to get this done -- I just did 2.1 ES ( why bother with spending more for AS ? ). We purchased one copy of ES, and I used that to do the rebuild. Of course, it is not completely automatic, but there are only a handfull of packages that do not build without a bit of tweaking. As far as pkg dependencies go, it is _much_ easier to build on a similar system. Now for the $10K question -- are there any reasons that I ( or someone else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It of course still has the RH branding all over it, but it could be distributed being called 'Nics Fun RH clone', or something similar. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From robert at yu.org Fri Aug 29 12:09:46 2003 From: robert at yu.org (Robert K. Yu) Date: Fri, 29 Aug 2003 09:09:46 -0700 (PDT) Subject: beowulf to a good home Message-ID: <20030829160946.6897.qmail@web40904.mail.yahoo.com> Hi, I have the following: 16 machines 450 MHz dual Celeron each (i.e.
32 CPU) 128M memory each 100BaseT switch 6G drive each I would like to donate these machines and see them put to good use. Pick up from the San Francisco south bay area, or you pay for shipment. Thanks. -Robert ===== Robert K. Yu mailto:robert at yu.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Fri Aug 29 12:19:31 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 29 Aug 2003 12:19:31 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> Message-ID: On Fri, 29 Aug 2003 lehi.gracia at amd.com wrote: > > Red Hat can do *nothing* to prevent > > further distribution. IOW, nothing prevents you from buying one > > license and make the updates available to the rest of the world. > > > > Red Hat can, however, potentially decide not to provide you with > > future updates if you do this. This is a bit unclear in the SLA. > > Correct me if I'm wrong, I though part of the GPL was that you have to > give the source code to anyone that asks for it, is it not? Per section > 2b. No, section 2b states that you must propagate the license, not make the source code available to any third party. Section 3 covers distribution and redistribution. You don't have to make the source code available to an arbitrary third party, just those with the offers in 3b or 3c. For distributions Red Hat ship with the source code, they have no further obligations. > >6. Given that the update packages are covered by the GPL, *nothing* > > prevents a receiver of said packages to make them available for > > download on the Internet. For most individual packages, correct. And the following discussion covers individual packages, not the distribution as a whole. If the package contains a trademarked logo embedded with GPL code they
- Should grant the right to use a package unmodified, including the logo (The GPL doesn't explicitly cover the case of logos, but a reasonable reading is that if Red Hat itself packages up the logo you have the right of unmodified distribution.)
- May require you to remove the logo with any modification
The entire distribution is another issue. It may be protected by copyright on the collection. They may restrict distribution of packages consisting of Red Hat branding and logos, which means some level of content reassembly is necessary to distribute. Red Hat may also insist that you not misrepresent a copy as a Red Hat product. This is an area where it's difficult to generalize. They may require removing packages/elements consisting of just logos or Red Hat documentation. And third parties can use the trademark name where it's descriptive, but not misleading. Consider the difference between "Chevrolet Service Station" and "Service Station for Chevrolets" [[ Native English speakers immediately understand the difference, and think of this rule as just part of the language. But you will not find this legally-inculcated distinction as a part of the grammar.
]] -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 13:16:06 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 10:16:06 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for G5 Message-ID: <20030829171606.62071.qmail@web11408.mail.yahoo.com> Free download: http://www-3.ibm.com/software/awdtools/ccompilers/ http://www-3.ibm.com/software/awdtools/fortran/ Rayson __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 14:52:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 29 Aug 2003 11:52:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> Message-ID: On Fri, 29 Aug 2003, Glen Otero wrote: > > You can redistribute it as long as it doesn't have RH all over it and > you don't use the RH name while endorsing/promoting it. I suppose you > could say it's RH compliant and built from RH srpms. The loop hole that > RH is taking advantage of is the fact that they are compliant with the > GPL as long as they release the sources. They comply with the GPL by > releasing the sources in srpm format, and so technically do not have to > make the isos freely available. By making it slightly difficult to > build your own distro, and not offering support to those who do, RH is > coaxing people to take the path of least resistance (wrt effort) and > buy licenses. I wouldn't really consider it a loophole, it's compatible with the spirit of the gpl. it's not as convenient as some people might like... but the sources are all there and they build and work. > Glen > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. 
> Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Fri Aug 29 14:12:37 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Fri, 29 Aug 2003 11:12:37 -0700 Subject: Intel acquiring Pallas In-Reply-To: <1062170224.9421.4.camel@roughneck> Message-ID: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> On Friday, August 29, 2003, at 08:17 AM, Nicholas Henke wrote: > On Thu, 2003-08-28 at 19:46, Glen Otero wrote: >> Joel- >> >> Have you actually built RH AS from scratch using their SRPMS? Or do >> you know anyone that has? I'm very interested in doing this but I >> heard >> there were some pretty significant obstacles along the lines of >> package >> dependencies. >> > > The links to the rhel-rebuild howto and mailing list are enought to get > this done -- I just did 2.1 ES ( why bother with spending more for AS ? > ). We purchased one copy of ES, and I used that to do the rebuild. Of > course, it is not completely automatic, but there are only a handfull > of > packages that do not build without a bit of tweaking. > > As far as pkg dependencies go, it is _much_ easier to build on a > similar > system. > > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. You can redistribute it as long as it doesn't have RH all over it and you don't use the RH name while endorsing/promoting it. I suppose you could say it's RH compliant and built from RH srpms. The loop hole that RH is taking advantage of is the fact that they are compliant with the GPL as long as they release the sources. They comply with the GPL by releasing the sources in srpm format, and so technically do not have to make the isos freely available. By making it slightly difficult to build your own distro, and not offering support to those who do, RH is coaxing people to take the path of least resistance (wrt effort) and buy licenses. Glen > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. 
Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 19:01:03 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 16:01:03 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for Apple G5 In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F5@txexmtae.amd.com> Message-ID: <20030829230103.91270.qmail@web11407.mail.yahoo.com> (Sorry, didn't made it clear in my last email...) The compilers are for MacOSX. Rayson > Which one do we use for Linux, will the AIX one work? > > > Free download: > > > > http://www-3.ibm.com/software/awdtools/ccompilers/ > > http://www-3.ibm.com/software/awdtools/fortran/ > > __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 30 12:48:36 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 30 Aug 2003 20:48:36 +0400 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: hi If i don't find these to file should i create it? i know that .rhosts is hidden but when I do ls -a i cannot find it even if i use the command locate therefore if i create it what permission should i give them thanks roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Aug 30 13:53:56 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 30 Aug 2003 12:53:56 -0500 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: ; from rouds@servihoo.com on Sat, Aug 30, 2003 at 08:48:36PM +0400 References: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: <20030830125356.C3206@mikee.ath.cx> On Sat, 30 Aug 2003, RoUdY wrote: > hi > If i don't find these to file should i create it? > i know that .rhosts is hidden but when I do ls -a > i cannot find it even if i use the command locate > therefore if i create it what permission should i give > them > thanks > roudy the file ~/.rhosts should have permissions of 600 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 31 19:45:52 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 1 Sep 2003 09:45:52 +1000 Subject: Trademark caveats about building RHAS from SRPMS (was Re: Intel acquiring Pallas) In-Reply-To: <1062170224.9421.4.camel@roughneck> References: <1062170224.9421.4.camel@roughneck> Message-ID: <200309010945.53871.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 30 Aug 2003 01:17 am, Nicholas Henke wrote: > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? 
It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. Redhat have a set of rules of what you can and cannot do. Basically whilst they comply with the GPL they do restrict what you can do with their trademarks (i.e. things like Redhat and the ShadowMan logo). Two of the major things are: http://www.redhat.com/about/corporate/trademark/guidelines/page6.html C. You may not state that your product "contains Red Hat Linux X.X." This would amount to impermissible use of Red Hat's trademarks. [...] D. You must modify the files identified as REDHAT-LOGOS and ANACONDA-IMAGES so as to remove all use of images containing the "Red Hat" trademark or Red Hat's Shadow Man logo. Note that mere deletion of these files may corrupt the software. So if you want to build and redistribute from their SRPMS you will need to do extra work to make them happy. Note that RMS thinks that this use of trademark in relation to the GPL is legitimate, in an interview quoted on the "Open For Business" website he says (in regards to Mandrake): http://www.ofb.biz/modules.php?name=News&file=article&sid=260 [quote] TRB: Another interesting current issue is the concept of what might be seen as "hybrid licensing." For example, MandrakeSoft's Multi-Network Firewall is based on entirely Free Software, however the Mandrake branding itself is placed under a more restrictive license (you can't redistribute it for a fee). This give the user or consultant two choices -- use the software under the more restrictive licensing or remove the Mandrake artwork. What are your thoughts on this type or approach? RMS: I think it is legitimate. Freedom to redistribute and change software is a human right that must be protected, but the commercial use of a logo is a very different matter. Provided that removing the logo from the software is easy to do in practice, the requirement to pay for use of the logo does not stain the free status of the software itself. [/quote] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/UoiwO2KABBYQAh8RAji6AJ4smNhqZ/my4k8i787Uaqs+n4rfsACcC4yS BLtsLZDIzG8Hm0KEACBOZyo= =A0dE -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
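To make the rebuild-and-debrand recipe discussed in this thread concrete, here is a rough sketch of the loop several posters describe. It is only an illustration: the mount point and the failure log are made-up examples, and on the rpm 4.0.x shipped with the 2.1 products the equivalent of 'rpmbuild --rebuild' is 'rpm --rebuild'.

  # rebuild every source RPM from a purchased copy (paths are examples)
  for srpm in /mnt/cdrom/SRPMS/*.src.rpm; do
      rpmbuild --rebuild "$srpm" || echo "$srpm" >> ~/rebuild-failures.log
  done
  # built binaries normally land under /usr/src/redhat/RPMS/<arch>/
  # before redistributing, swap the trademarked artwork packages
  # (REDHAT-LOGOS and ANACONDA-IMAGES, per the guidelines above) for your own
  rpm -qa | grep -i -e redhat-logos -e anaconda-images

As noted earlier in the thread, a handful of packages will not build cleanly the first time, so expect to revisit whatever ends up in the failure log by hand.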
From alvin at Mail.Linux-Consulting.com Fri Aug 1 22:45:01 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Fri, 1 Aug 2003 19:45:01 -0700 (PDT) Subject: Filesystem In-Reply-To: Message-ID: hi ricardo filesystem comparasons http://www.linux-sec.net/FileSystem/#FS http://aurora.zemris.fer.hr/filesystems/ i think ext3 is better than reiserfs i think ext3 is not any better than ext2 in terms of somebody hitting pwer/reset w/o proper shutdown - i always allow it to run e2fsck when it does an unclean shutdown ... - yes ext3 will timeout and continue and restore from backups but ... am paranoid about the underlying ext2 getting corrupted by random power off and resets c ya alvin On Fri, 1 Aug 2003, Ricardo wrote: > > Hi all > > Which one is better to use, ext3 or raiserfs? > Someone have performance results comparing Ext3 with raiserfs? > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 08:48:09 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 12:48:09 GMT Subject: Filesystem In-Reply-To: References: Message-ID: <20030802124809.1437.qmail@houston.wolf.com> Steven Timm writes: > I have some anecdotal evidence that ext3 starts taking performance hit > in cases where there is a lot of files getting written and then > quickly erased. Also there's a performance penalty on burst I/O--e.g. > if you have a system doing near-continuous disk writes and reads it > will bump the load factor up. But I don't have any information to > suggest that Reiser does it better. > It depends what you are going to use the nodes for. For normal compute nodes, I don't think there is enough of a payback to change ext3. For our disk nodes, we use ext3 for system filesystems and XFS for the exported disk space (with NFS patches and tuning of couse) to get some serious performance. We are currently testing different filesystems on one of the disk nodes we just purchased and have seen a dramatic rise in performance with the above. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mukshere at rediffmail.com Sat Aug 2 10:16:43 2003 From: mukshere at rediffmail.com (mukund govind umalkar) Date: 2 Aug 2003 14:16:43 -0000 Subject: Beowulf Research Message-ID: <20030802141643.9462.qmail@webmail7.rediffmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sanjoy at chem.iitkgp.ernet.in Sat Aug 2 13:33:21 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sat, 2 Aug 2003 23:03:21 +0530 (IST) Subject: NIS In-Reply-To: <20030802124809.1437.qmail@houston.wolf.com> Message-ID: Hi, I have a cluster running Rh 7.3 with NIS server. The cluster was running fine.
But suddenly after rebooting now the clients are having problems in recognizing the NIS domain server name. while booting the clients it says: Binding to the NIS domain: [OK] Listening for an NIS domail server............[FAILED] ypwhich on clients says 'Can't communicate with ypbind' ypbind, ypserv are running fine on the server. I will appreciate if anyone can help.. Thanks. Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bari at onelabs.com Sat Aug 2 14:02:45 2003 From: bari at onelabs.com (Bari Ari) Date: Sat, 02 Aug 2003 13:02:45 -0500 Subject: SH4 & SH5 Clustering Message-ID: <3F2BFCC5.6060803@onelabs.com> It's been a few years since anyone has posted anything here on clusters using the SH-4. http://www.beowulf.org/pipermail/beowulf/1999-November/007339.html Does anyone have results or experiences of building systems using the SH-4? http://www.superh.com/products/sh4.htm http://www.superh.com/products/sh5.htm The SH-5 is finally showing up in silicon at 2.8GFLOPS, 400MHz, under 1W/cpu. The caches are small at 32KB yet have a 3.2GB/s peak internal bus, the SOC's have DDR memory and 32bit/66MHz PCI. They look attractive for low power dense clusters/blade applications that won't be hurt much by their small cache size and the 264MB/s peak PCI interface. A 1-U could contain 24 - 32 of these and require only convection cooling for the cpu's. The DDR memory would be the "hot spots" and require some forced air cooling. Bari _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:42:14 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:42:14 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. Hmmm, so many possible causes. If you say "suddenly after rebooting" and if it applies to all the clients, I'd check the following: a) The network connection of the server. All things being equal, I'd have to say this is a prime candidate. Don't forget to check the wire(s) itself -- many is the perplexing networking or service problem that turned out to be caused by somebody kicking a wire so that the plug was no longer properly seated. 
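Concretely, a quick first pass from one of the affected clients might look like this (nisserver is only a placeholder for the NIS server's hostname); it separates a dead cable or switch port from a portmap/ypserv problem:

  ping -c 3 nisserver                # raw reachability
  rpcinfo -p nisserver               # is portmap answering, and is ypserv registered with it?
  ypwhich                            # which server, if any, did this client actually bind to?
  /etc/rc.d/init.d/ypbind restart    # then watch /var/log/messages for bind errors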
Check network connectivity in other ways to -- is the switch port suddenly bad, do I need to power cycle the switch (switches sometimes "wedge" and need a cycle to rebuild their tables), and so forth. On some switches it is possible to block broadcasts -- NIS requires them, so be sure that this didn't get done by mistake. b) When you've eliminated hardware as a possible cause (and have validated perfect network connectivity) then you can look for software problems. A "sudden" problem like this is odd -- perhaps you accidentally updated with a broken RPM? Perhaps somebody trashed a table? Did somebody update iptables or ipchains or change their rules so port access is blocked that way? See if checking out these systems solves it. If not, in your next post include more detail on your network and so forth. Usually this kind of thing is solved by doggedly testing one system at a time until the culprit emerges, starting with the most likely. Don't forget, you have tools like tcpdump that will let you snoop the network packets one at a time if necessary to be sure that they are indeed arriving at the server from the clients. I recall that you can turn on ypserv with -d for debug to get a much more verbose operational mode to help debug as well. HTH, rgb > > I will appreciate if anyone can help.. > Thanks. > Sanjoy > > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sat Aug 2 14:50:03 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 2 Aug 2003 14:50:03 -0400 (EDT) Subject: Beowulf Research In-Reply-To: <20030802141643.9462.qmail@webmail7.rediffmail.com> Message-ID: On 2 Aug 2003, mukund govind umalkar wrote: > hello sir, > i am a graduate student, and i am intrested in doing research on > Beowulf clusters, so plzz send me some material and let me know > about the various papers that have presented on Beowulf. > > If possible please some useful URLs for the same There are lots of starting points, and the better sites form for all practical purposes a webring with mutual links interconnecting them so sites you don't find on one you're likely to find on another linked to it. One such starting point is: http://www.phy.duke.edu/brahma (look under e.g. resources and links and papers). Brahma will lead you do the beowulf underground, to the original/main beowulf site, and to many other well-known clustering sites and resources. To find "real" papers on clustering, check out e.g. 
;login and various other computer geek journals and magazines. Linux Magazine has an excellent clustering column by Forrest Hoffman. There are some online webzines devoted to clustering (some linked to brahma). Google is your friend here -- with google you can find out pretty much anything that is online. rgb > > thanx > Mukund > > > > ___________________________________________________ > Download the hottest & happening ringtones here! > OR SMS: Top tone to 7333 > Click here now: > http://sms.rediff.com/cgi-bin/ringtone/ringhome.pl > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sat Aug 2 18:11:17 2003 From: angel at wolf.com (Angel Rivera) Date: Sat, 02 Aug 2003 22:11:17 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030802221117.3967.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. Check to make sure your NIS server is running and talking (TCPDUMP). If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind restart" and see what error crops up. Also, try nisdomainname and see what crops up there. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Sun Aug 3 01:36:00 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Sun, 3 Aug 2003 11:06:00 +0530 (IST) Subject: NIS In-Reply-To: <20030802221117.3967.qmail@houston.wolf.com> Message-ID: On Sat, 2 Aug 2003, Angel Rivera wrote: > Check to make sure your NIS server is running and talking (TCPDUMP). > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > restart" and see what error crops up. yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: Shutting down NIS services: [FAILED] Binding to the NIS domain: [OK] Listening for an NIS domain server................... [FAILED] > Also, try nisdomainname and see what crops up there. nisdomainname gives correct domain name. We have the Sever filesystems NFS mounted on the clients. I can see now that this NFS mounting is not working for the clients. While the clients tries to mount the NFS filesystem, it gives this error: Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable to receive Thanks.. -Sanjoy -------------------------------------------------------------------- Dr. 
Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From leventeh at hotmail.com Sun Aug 3 02:14:23 2003 From: leventeh at hotmail.com (Levente Horvath) Date: Sun, 03 Aug 2003 06:14:23 +0000 Subject: MPI & linux compilers Message-ID: To whom it may concern, We have 12 PCs set up for parallel computation. All are running linux (Redhat 7.3) and MPI. We would like to compute eigenvalues and eigenvectors for large matrices. We have managed to do up to 10000x10000 matrix no problem. Our program uses Scalapack and Blacs routines. These routines require two matrix to be declared. On single precision two 10000x10000 matrix occupies 800Mb of memory which is already exceeds the 512Mb local memory of each computer in our cluster. This memory were equally distributed over the 12 computers upon computation. So, we think that in theory we shouldn't have any problem going to large matrices; as our distributed memory is quite large 12*512Mb. Now, if we try to run a larger size then the compiler mpif77 returns a "large matrix" error. We have traced the compiler and found that mpif77 is a script that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we found that there is no problem with the compilation up to a size of 15000x15000, then the compiler crashes. After tracing the compilation procedure, we found that the linker "as" cannot link some of the .o and .s files in our /tmp directory. So, we used C rather than fortran. Statically, we cannot declare more than a 1500x1500 matrix (that put in to a hello world program for MPI). We thought it might be the problem with the static allocation of memory. So, we tried to allocate this space dynamically without any success.... Our questions are: Are we doing something wrong here. Or are the compilers gcc and g77-3 responsible for such an array limit. Or are we missing the ways to allocate memory for large matrices.... This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. Unfortunately, we cannot link mpi libraries against this "ifc" compiler. It just doesn't see them. We have tried to compile ifc with the full path names of libraries using either static and dynamics libraries. In either case we had no success... We would appreciate all of your comments and suggestions. Thank you in advance.... _________________________________________________________________ ninemsn Extra Storage comes with McAfee Virus Scanning - to keep your Hotmail account and PC safe. 
Click here http://join.msn.com/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Sun Aug 3 12:17:19 2003 From: angel at wolf.com (Angel Rivera) Date: Sun, 03 Aug 2003 16:17:19 GMT Subject: NIS In-Reply-To: References: Message-ID: <20030803161719.30576.qmail@houston.wolf.com> Sanjoy Bandyopadhyay writes: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > >> Check to make sure your NIS server is running and talking (TCPDUMP). >> If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind >> restart" and see what error crops up. > > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > >> Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive It is not seeing the ypserver. have you tried rpcinfo -p _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bropers at lsu.edu Sat Aug 2 17:28:18 2003 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Sat, 2 Aug 2003 16:28:18 -0500 (CDT) Subject: NIS In-Reply-To: References: Message-ID: On Sat, 2 Aug 2003, Sanjoy Bandyopadhyay wrote: > Hi, > I have a cluster running Rh 7.3 with NIS server. The cluster was running > fine. But suddenly after rebooting now the clients are having problems in > recognizing the NIS domain server name. while booting the clients it says: > > Binding to the NIS domain: [OK] > Listening for an NIS domail server............[FAILED] > > ypwhich on clients says 'Can't communicate with ypbind' > > ypbind, ypserv are running fine on the server. > > I will appreciate if anyone can help.. > Thanks. > Sanjoy Sanjoy, You say that ypbind is running fine /on the SERVER/, what about ypbind running on the /CLIENT/? ypbind should not run on the server, it runs on the clients. -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator AIX (225) 578-6400 (F) Office of Computing Services GNU Linux brian at ropers-huilman.net High Performance Computing .^. http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q /V\ \o/ Louisiana State University (/ \) -- __o / | Baton Rouge, LA 70803-1900 ( ) --- `\<, / `\\, ^^-^^ O/ O / O/ O _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:05:32 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:05:32 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Sanjoy Bandyopadhyay wrote: > > On Sat, 2 Aug 2003, Angel Rivera wrote: > > > Check to make sure your NIS server is running and talking (TCPDUMP). > > If your node is running fine? Trying running "/etc/rc.d/init.d/ypbind > > restart" and see what error crops up. 
> > yes the NIS server is running. /etc/rc.d/init.d/ypbind restart gives this: > Shutting down NIS services: [FAILED] > Binding to the NIS domain: [OK] > Listening for an NIS domain server................... [FAILED] > > > Also, try nisdomainname and see what crops up there. > > nisdomainname gives correct domain name. > > > We have the Sever filesystems NFS mounted on the clients. I can see > now that this NFS mounting is not working for the clients. While the > clients tries to mount the NFS filesystem, it gives this error: > > Mounting NFS filesystem: mount : RPC : Port mapper failure - RPC: unable > to receive Yah. How about ping? Can you ping the server? Seriously, this looks like your problem is just a bad network connection, or conceivably a downed portmapper. If you can't ping, obviously your network is down and you need to fix it. If you can ping and ssh back and forth and the like, then make sure that portmap is running on your clients and server (an rpm that updated but installed the new one off?). In fact, do chkconfig --list and look at ALL of your network services to make sure they still make sense. Be careful here -- trojanned portmappers and other broken rpc services are a favorite way for crackers to enter your system. What you are seeing COULD be symptoms of being cracked, as trojanned portmappers not infrequently are broken (for a variety of reasons). You might prefer to back up your data and do a full reinstall of the server and a client, to check the rpm MD5 checksums, and to presume that you may have been cracked (monitoring your net traffic with TCPDUMP looking for bad guys) while you proceed. At least stay aware of the possibility. It's happened to me; it could have happened to you. rgb > > Thanks.. > -Sanjoy > > -------------------------------------------------------------------- > Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in > Assistant Professor > Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) > Department of Chemistry 91-3222-283345 (Home) > Indian Institute of Technology 91-3222-279938 (Home) > Kharagpur 721 302 Fax : 91-3222-255303 > West Bengal, India. 91-3222-282252 > http://www.chem.iitkgp.ernet.in/faculty/SB/ > -------------------------------------------------------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Sun Aug 3 13:16:36 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 3 Aug 2003 13:16:36 -0400 (EDT) Subject: NIS In-Reply-To: Message-ID: On Sat, 2 Aug 2003, Brian D. Ropers-Huilman wrote: > Sanjoy, > > You say that ypbind is running fine /on the SERVER/, what about ypbind running > on the /CLIENT/? ypbind should not run on the server, it runs on the clients. Right, but if NFS is also not running with an RPC error, it really suggests either raw networking problems or problems with the RPC subsystem, e.g. portmap. He also originally said that he had it working and then it stopped. 
If that is true it doubly points to networking or RPC. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 3 14:18:17 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 03 Aug 2003 13:18:17 -0500 Subject: MPI & linux compilers In-Reply-To: Message-ID: <5.1.1.6.2.20030803131144.02f00e50@localhost> At 06:14 AM 8/3/2003 +0000, Levente Horvath wrote: >To whom it may concern, > >We have 12 PCs set up for parallel computation. All are running linux >(Redhat 7.3) and MPI. >We would like to compute eigenvalues and eigenvectors for large matrices. > >We have managed to do up to 10000x10000 matrix no problem. Our program >uses Scalapack and Blacs >routines. These routines require two matrix to be declared. On single >precision two 10000x10000 >matrix occupies 800Mb of memory which is already exceeds the 512Mb local >memory of >each computer in our cluster. This memory were equally distributed over >the 12 computers >upon computation. So, we think that in theory we shouldn't have any >problem going >to large matrices; as our distributed memory is quite large 12*512Mb. You need to declare only the local part of the matrix that is distributed across the processes, not the entire matrix. MPI doesn't provide any support for automatically distributing the data, though libraries written using MPI can do this if the data is allocated dynamically by the library. Languages such as HPF can do this for you, but have their own limitations. Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From xavier at zeth.ciencias.uchile.cl Sun Aug 3 21:59:56 2003 From: xavier at zeth.ciencias.uchile.cl (Xavier Andrade) Date: Sun, 3 Aug 2003 21:59:56 -0400 (CLT) Subject: MPI & linux compilers In-Reply-To: Message-ID: On Sun, 3 Aug 2003, Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. 
> > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > Running "mpif77 -showme" will show you the line that mpif77 actually calls for compiling, if you want to change the compiler that mpif77 calls set the enviroment variable LAMHF77 (i.e. with `export LAMHF77=ifc` mpif77 will compile using ifc instead of f77). Xavier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sanjoy at chem.iitkgp.ernet.in Mon Aug 4 01:09:13 2003 From: sanjoy at chem.iitkgp.ernet.in (Sanjoy Bandyopadhyay) Date: Mon, 4 Aug 2003 10:39:13 +0530 (IST) Subject: NIS In-Reply-To: Message-ID: Hi, I figured out what was wrong.. the nsswitch.conf file was somehow corrupted. nis was not mentioned for passwd,group,shadow files. Now everything is under control. Thanks very much to all of you who helped with their valuable suggestions. -Sanjoy -------------------------------------------------------------------- Dr. Sanjoy Bandyopadhyay E-mail: sanjoy at chem.iitkgp.ernet.in Assistant Professor Molecular Modeling Laboratory Phone: 91-3222-283344 (Office) Department of Chemistry 91-3222-283345 (Home) Indian Institute of Technology 91-3222-279938 (Home) Kharagpur 721 302 Fax : 91-3222-255303 West Bengal, India. 91-3222-282252 http://www.chem.iitkgp.ernet.in/faculty/SB/ -------------------------------------------------------------------- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From javier.crespo at itp.es Mon Aug 4 02:53:05 2003 From: javier.crespo at itp.es (Javier Crespo) Date: Mon, 04 Aug 2003 08:53:05 +0200 Subject: MPI & linux compilers References: Message-ID: <3F2E02D1.E834011B@itp.es> Levente Horvath wrote: > To whom it may concern, > > We have 12 PCs set up for parallel computation. All are running linux > (Redhat 7.3) and MPI. > We would like to compute eigenvalues and eigenvectors for large matrices. > > We have managed to do up to 10000x10000 matrix no problem. Our program uses > Scalapack and Blacs > routines. These routines require two matrix to be declared. On single > precision two 10000x10000 > matrix occupies 800Mb of memory which is already exceeds the 512Mb local > memory of > each computer in our cluster. This memory were equally distributed over the > 12 computers > upon computation. So, we think that in theory we shouldn't have any problem > going > to large matrices; as our distributed memory is quite large 12*512Mb. > > Now, if we try to run a larger size then the compiler mpif77 returns > a "large matrix" error. 
We have traced the compiler and found that mpif77 is > a script > that calls up f77 and mpi libraries. Upon replacing the f77 with g77-3, we > found that > there is no problem with the compilation up to a size of 15000x15000, then > the > compiler crashes. After tracing the compilation procedure, we found that > the linker "as" cannot link some of the .o and .s files in our /tmp > directory. > > So, we used C rather than fortran. Statically, we cannot declare more than > a 1500x1500 matrix (that put in to a hello world program for MPI). We > thought > it might be the problem with the static allocation of memory. So, we tried > to allocate this space dynamically without any success.... > > Our questions are: Are we doing something wrong here. Or are the compilers > gcc and g77-3 > responsible for such an array limit. Or are we missing the ways to allocate > memory for large matrices.... > > This is not the end of our story. We tried "ifc" IBM fortran 90 compiler. > Unfortunately, we > cannot link mpi libraries against this "ifc" compiler. It just doesn't see > them. We have > tried to compile ifc with the full path names of libraries using either > static and dynamics libraries. > In either case we had no success... > > We would appreciate all of your comments and suggestions. > Thank you in advance.... If you want to link to mpi but compiling with "ifc" (is it really IBM? - I think it comes from intel), you first at all should have to compile that libraries with the same compiler that you are going to use for the main program, typically using the options "-fc=ifc","--f90=ifc" and "-f90linker=ifc" when configuring MPI and then installing it in you path (in a different place than the MPI libraries compiled with f77). Javier _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 4 08:02:50 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 4 Aug 2003 14:02:50 +0200 (CEST) Subject: Cisco switches for lam mpi In-Reply-To: Message-ID: On Tue, 29 Jul 2003, Jack Douglas wrote: > We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst > 4003 Chassis with 48 1000Base-t ports. > > We are running LAM MPI over gigabit, but we seem to be experiencing > bottlenecks within the switch > > Typically, using the cisco, we only see CPU utilisation of around 30-40% [...] I'm not a Cisco expert, but... We once got a Cisco switch from our networking people that we had to return immediately because it delivered such a bad performance. It was a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only handle 12 ports at full speed. Above that, the performance brake down completely. For some benchmark results see, e.g.: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf As a comparison, the quite nice results of a CentreCom 742i: http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved anyway since spring 2001 when I did the above tests. Besides, the situation for Gigabit Ethernet could be different. As we described on our workshop paper at CAC03 you can not trust the data sheets of switches anyway: http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 Conclusion: If you need a very high performing switch, you have to evaluate/benchmark it yourself. 
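A crude but telling test is to run a TCP stream benchmark across several node pairs at once and watch whether the per-pair numbers hold up; for example with iperf (hostnames are placeholders, and netperf or NetPIPE would do just as well):

  # on each receiving node
  iperf -s
  # on the matching sending nodes, started at roughly the same time
  iperf -c node01 -t 30
  iperf -c node02 -t 30
  iperf -c node03 -t 30

If the aggregate stops growing as you add pairs, the limit is the backplane or the uplinks rather than the individual ports.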
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Mon Aug 4 15:31:22 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 04 Aug 2003 15:31:22 -0400 Subject: large filesystem & fileserver architecture issues. Message-ID: <1060025481.28642.81.camel@roughneck> Hey all -- here is our situation. We currently have several clusters that are configured with either IBM x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays hanging off of them. Each server + array is good for around 600GB after RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 of multiple arrays ( which seems to work & perform quite nicely ). Each of the servers then exports the filesystem via NFS, and is mounted on the nodes. The clusters range from 24 to 128 nodes. For backups we maintain an offline server + array that we use to rsync the data nightly, then use our amanda server and tape robot to backup. We use an offline sync, as we need a level 0 dump every 2 weeks, and doing a level 0 dump of 600GB just trashes the performance on a live server. As we are a .edu and all of the clusters were purchased by the individual groups, the options we can explore have to be very cost efficient for hardware, and free for software. Now for the problem... A couple of our clusters are using the available filespace quite rapidly, and we are looking to add space. The most cost efficient approach we have found is to buy a IDE RAID box, like those available from RaidZone or PogoLinux. This allows us to use the cheap IDE systems as the offline sync, and use the scsi systems as online servers. And the questions: 1) Is there a better way to backup the systems without the need for an offline sync? 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad does it bite ? 3) Are there any recommended IDE RAID systems? We are not looking for super stellar performance, just a solid system that does it's job as an offline sync for backups. -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Mon Aug 4 22:49:35 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 04 Aug 2003 22:49:35 -0400 Subject: updated run_mpiblast code Message-ID: <1060051774.25281.22.camel@protein.scalableinformatics.com> Hi folks: Updated and documented the run_mpiblast code. Better data from --debug switch. To see the man page, either perldoc run_mpiblast or run run_mpiblast --help Will be working on an RPM and a tarball installer in short order. It can be pulled from http://scalableinformatics.com/sge_mpiblast.html. The documentation (pod generated) can be viewed at http://scalableinformatics.com/run_mpiblast.html . 
-- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 5 08:54:57 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 05 Aug 2003 16:54:57 +0400 Subject: mpich2-0.93 In-Reply-To: <200308041901.h74J1Tw27276@NewBlue.Scyld.com> Message-ID: hello everybody I have download MPICH2-0.93 and I have some difficulty in implementing it. That is, according to some research done I need to amend the file "machines.LINUX" so that the parallel computing can start and to choose which node to form part of the cluster. But the problem is that there is no file which name "machine.LINUX" and the file is suppose to be found in the directory .../mpich2-0.93/util/machines. Well, I use redhat9.0 - hope to hear from you very soon If there is a web site to get the necessary information please let me know. Cheers Roudy. -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Tue Aug 5 11:47:07 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 05 Aug 2003 11:47:07 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <1060098427.30922.6.camel@roughneck> On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > On 4 Aug 2003, Nicholas Henke wrote: > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > and the price is hard to beat. The raid array that serves home > directories to their clusters and workstations is backed up nightly to a > second raid server, similarly to your system. To speed things along we > installed an extra gigabit card in the primary and backup servers and > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > GBs take just over an hour using the dedicated gbit link. Rsync would > probably be faster. Without the shortcircuit gigabit link, it used to run > four or five times longer and seriously impact NFS performance for the > rest of the systems on the LAN. > > Hope this helps. > > Regards, > > Mike Prinkey > Aeolus Research, Inc. Definately does -- can you recommend hardware for the IDE RAID, or list what you guys have used ? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Tue Aug 5 15:11:58 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Tue, 5 Aug 2003 09:11:58 -1000 Subject: large filesystem & fileserver architecture issues. 
References: <1060098427.30922.6.camel@roughneck> Message-ID: <009701c35b85$714e7110$7101a8c0@Navatek.local> I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. Although they do offer a NFS box that will turn one of these arrays into a standalone. We have had great success with these units (http://neptune.navships.com/images/harddrivearrays.jpg) . We first acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. We have set it up in a RAID-5 configuration and have not yet had to replace even one of the drives (Knockin on wood). After a year we picked up the 14slot chassis and filled it with 160 maxtor drives and it has performed flawless... I think we paig about $4000 for the 14 slot chassis. you can add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver for $1500 and you got about 2TB of storage for around $7000 Mitchel Kagawa ----- Original Message ----- From: "Nicholas Henke" To: "Michael T. Prinkey" Cc: Sent: Tuesday, August 05, 2003 5:47 AM Subject: Re: large filesystem & fileserver architecture issues. > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > On 4 Aug 2003, Nicholas Henke wrote: > > > > We have a lot of experience with IDE RAID arrays at client sites. The DOE > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > > and the price is hard to beat. The raid array that serves home > > directories to their clusters and workstations is backed up nightly to a > > second raid server, similarly to your system. To speed things along we > > installed an extra gigabit card in the primary and backup servers and > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > GBs take just over an hour using the dedicated gbit link. Rsync would > > probably be faster. Without the shortcircuit gigabit link, it used to run > > four or five times longer and seriously impact NFS performance for the > > rest of the systems on the LAN. > > > > Hope this helps. > > > > Regards, > > > > Mike Prinkey > > Aeolus Research, Inc. > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From egan at sense.net Tue Aug 5 18:12:21 2003 From: egan at sense.net (Egan Ford) Date: Tue, 5 Aug 2003 16:12:21 -0600 Subject: Power monitoring Message-ID: <095d01c35b9e$a4ae90d0$0664a8c0@titan> I know this was discussed recently with "kill-a-watt" as a popular choice, however I am looking for the next step up, something more on the circuit level that I can hardwire between my lab and breakers. Support for multiple circuits would be nice too as well as 110/220 support. Add a serial port for remote monitoring and I'm set. However I am looking for a cheap solution, a web cam pointing to a meter is an option. I'll even settle for analogue, I just need kwh. Thanks. 
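P.S. If the meter turns out to be something dumb that just prints instantaneous watts on its serial port, the logging half of the problem is easy. The following is only an untested sketch and assumes a hypothetical meter that emits one watts reading per line, once per second, at 9600 baud on ttyS0:

  stty -F /dev/ttyS0 9600 raw
  awk '{ kwh += $1 / 3600000; printf "%d W  %.4f kWh\n", $1, kwh; fflush() }' \
      < /dev/ttyS0 | tee -a power.log

With one sample per second, each reading contributes that many watt-seconds, and 3,600,000 watt-seconds make one kWh.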
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:35:09 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:35:09 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> Message-ID: hi ya On Tue, 5 Aug 2003, Mitchel Kagawa wrote: > I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true > NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. thought acnc.com has good stuff . :-) > Although they do offer a NFS box that will turn one of these arrays into a > standalone. We have had great success with these units > (http://neptune.navships.com/images/harddrivearrays.jpg) . We first > acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. > We have set it up in a RAID-5 configuration and have not yet had to replace > even one of the drives (Knockin on wood). After a year we picked up the > 14slot chassis and filled it with 160 maxtor drives and it has performed > flawless... I think we paig about $4000 for the 14 slot chassis. you can > add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver > for $1500 and you got about 2TB of storage for around $7000 8 drives at 250GB each is 2TB in one 1U chassis ... 250GB disks is about $250 now days.... maybe less on the online webstores backup of 2TB should be done on another 2TB systems .. 3rd 2TB machine if the data cannot be recreated save only the raw data/apps needed to regenerate the output data c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Tue Aug 5 18:40:27 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Tue, 5 Aug 2003 15:40:27 -0700 (PDT) Subject: large filesystem & fileserver architecture issues. -hw In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: hi ya On 5 Aug 2003, Nicholas Henke wrote: > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? you have basically 2 choices ... - leave the ide as an ide disks ... ( software raid ) - get a $50 ide controller ( 4 drives on it ) and 4 drives on the mb - convert the ide to look like a scsi drives ( tho not really ) - 3ware 7500-8 series for 8 "scsi" disks on it - or get a real hardware raid card for lots of $$$ - mylex, adaptec - for a list of hardware raid card that is supported by linux http://www.linux-ide.org/chipsets.html http://www.1u-raid5.net sw/hw raid5 howto's c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From m0ukb at unb.ca Wed Aug 6 08:50:13 2003 From: m0ukb at unb.ca (White, Adam Murray) Date: Wed, 6 Aug 2003 09:50:13 -0300 Subject: Performance monitoring tool Message-ID: <1060174213.3f30f9859afaf@webmail.unb.ca> Hello, I am interested in acquiring a good real time cluster performance monitoring tool, which at least displays (dynamically while the program is running) each thread's cpu utilization and memory usage (graphically). Not a postmortem display. 
Free as well. Any help would be much appreciated. Regards, A. M. White ###################################################### Adam M. White University of New Brunswick Saint John http://www.unbsj.ca/sase/csas m0ukb at unb.ca ###################################################### _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Aug 6 13:21:02 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 6 Aug 2003 13:21:02 -0400 (EDT) Subject: Performance monitoring tool In-Reply-To: <1060174213.3f30f9859afaf@webmail.unb.ca> Message-ID: On Wed, 6 Aug 2003, White, Adam Murray wrote: > Hello, > > I am interested in acquiring a good real time cluster performance monitoring tool, which at > least displays (dynamically while the program is running) each thread's cpu utilization and > memory usage (graphically). Not a postmortem display. Free as well. > > Any help would be much appreciated. At this time it won't QUITE do what you like, but it is within spitting distance of it. Check out: xmlsysd and wulfstat on brahma (http://www.phy.duke.edu/brahma). xmlsysd is a daemon that runs on a cluster and obtains by a variety of means statistics of interest on the system. Some of these it parses from proc, others by the use of systems calls. It is not promiscuous (it doesn't provide e.g. a complete copy of /proc to clients that connect to it) but rather offers a digested view that can be throttled so that one or more "sets" of interesting statistics can be monitored. This is to keep it lightweight, both on the system it is monitoring and on the network and client -- it is (literally) a parallel application in its own right and it isn't a good idea for a monitor application to significantly compete for any of the resources that might bottleneck a "production" parallel application. Its "prepackaged" return sets include load avg (5,10,15 min), memory (basically the data underlying the "free" command), ethernet network usage for one or more devices, date/time/cpu information, basically the kind of data one finds digested at the top of the "top" command or made available by e.g xosview in kin in graphical windows. It also has a "pid" mode where it can monitor running processes. Here throttling and filtering is a bit trickier, as one generally does NOT want to monitor every process running on a system with a supposedly lightweight tool. I thus implemented pid selection by means of matching task name or user name, a mode that returns all "userspace" tasks that have accumulated more than some cutoff in total time (5 seconds? I can't remember), as well as a to-be-rarely-used promiscuous mode that returns everything it can find including root tasks. xmlsysd's returns are in xml, and hence are easy to parse out with any xml parser for application in anything you like. That's the good news. The other good news is that wulfstat, the provided client, lets you use most of these features in a tty/ncurses window. The bad news it that there is no GUI display with little graphs and the like. This is mixed news, really, not necessarily bad. A tty display lets you use the pgup and pgdn keys and scroll arrows to page quickly through a lot of hosts, seeing instantly the full detail (actual numbers) for each field being monitored -- you might find wulfstat to be adequate. 
If it isn't adequate, though, you'll likely need to write some sort of client application that polls the daemon at some interval (I tend to use 5 seconds as the default, but it can be set up or down as low as 1 second, depending on how many hosts one wishes to monitor, again remembering that it is supposed to be lightweight and that it is a bad idea to run it so fast that the return latency causes the loop to pile up). This should be pretty easy -- you can actually talk to the daemon with telnet, so watching it work and testing the api is not a problem. You've got wulfstat sources to play with (both tools fully GPL). The daemon returns XML, which is easy to parse out. Finally, there are a fair number of tools or libraries that you can pipe this output into to generate graphs, either on the web or some other console. One day I'll actually write such a tool myself, but wulfstat proved so adequate for most of what we use it for that I haven't been able to justify advancing the project to the top of the triage-heap of bloody and neglected projects that fill my life:-). If you do write one, feel free to do so collaboratively and donate it back to the project so we can all share, although of course the GPL wouldn't require this as far as I can see for clients not derived from wulfstat code or that you write for yourself. xmlsysd and wulfstat have been in "production" use locally for some time, but they are still probably beta level code because most people use ganglia with its web-based displays. Personally I think xmlsysd/wulfstat provide a pretty rich set of monitor options (and actually is derived from code I originally wrote and was using somewhat before the ganglia project was begun, so I can't be accused of foolishly duplicating an existing project:-). If you have any problems with them I will cheerfully fix them, and if you have any ideas for additions or improvements that wouldn't drive me mad timewise to implement, I was cheerfully add them. rgb > > Regards, > A. M. White > > ###################################################### > Adam M. White > University of New Brunswick Saint John > http://www.unbsj.ca/sase/csas > m0ukb at unb.ca > ###################################################### > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 11:45:20 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 11:45:20 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060025481.28642.81.camel@roughneck> Message-ID: On 4 Aug 2003, Nicholas Henke wrote: We have a lot of experience with IDE RAID arrays at client sites. The DOE lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) and the price is hard to beat. 
The raid array that serves home directories to their clusters and workstations is backed up nightly to a second raid server, similarly to your system. To speed things along we installed an extra gigabit card in the primary and backup servers and connected the two directly. The nightly backup (cp -auf via NFS) of 410 GBs take just over an hour using the dedicated gbit link. Rsync would probably be faster. Without the shortcircuit gigabit link, it used to run four or five times longer and seriously impact NFS performance for the rest of the systems on the LAN. Hope this helps. Regards, Mike Prinkey Aeolus Research, Inc. > Hey all -- here is our situation. > > We currently have several clusters that are configured with either IBM > x342 or Dell 2650 serves with their respective vendors SCSI RAID arrays > hanging off of them. Each server + array is good for around 600GB after > RAID 5 and formatting. The IBM's have the added ability to do a RAID 50 > of multiple arrays ( which seems to work & perform quite nicely ). Each > of the servers then exports the filesystem via NFS, and is mounted on > the nodes. The clusters range from 24 to 128 nodes. For backups we > maintain an offline server + array that we use to rsync the data > nightly, then use our amanda server and tape robot to backup. We use an > offline sync, as we need a level 0 dump every 2 weeks, and doing a level > 0 dump of 600GB just trashes the performance on a live server. As we are > a .edu and all of the clusters were purchased by the individual groups, > the options we can explore have to be very cost efficient for hardware, > and free for software. > > Now for the problem... > A couple of our clusters are using the available filespace quite > rapidly, and we are looking to add space. The most cost efficient > approach we have found is to buy a IDE RAID box, like those available > from RaidZone or PogoLinux. This allows us to use the cheap IDE systems > as the offline sync, and use the scsi systems as online servers. > > And the questions: > > 1) Is there a better way to backup the systems without the need for an > offline sync? > > 2) Does anyone have experience doing RAID 50 with Dell hardware? How bad > does it bite ? > > 3) Are there any recommended IDE RAID systems? We are not looking for > super stellar performance, just a solid system that does it's job as an > offline sync for backups. > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Tue Aug 5 12:34:03 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Tue, 5 Aug 2003 12:34:03 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> Message-ID: On 5 Aug 2003, Nicholas Henke wrote: > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? > > Nic > I started building these arrays when 20 GBs was a big drive and hardware ide raid controllers were very expensive. So old habits die hard. Most of my experience has been with Software RAID in Linux. We use Promise Ultra66/100/133 controller cards, Maxtor 80 - 200 GB 5400-rpm drives, and Intel-chipset motherboards. I use the Promise cards, again because they were what was available and supported in Linux in the late 90s. 
They are limited to two IDE channels per card, but I have used 3 cards in addition to the on-board IDE in large arrays before. Some people buy the IDE RAID cards that have 4 or 8 IDE channels and then use Software RAID instead. The conventional wisdom is that you should only put one drive on each IDE channel to maximize performance. I have built arrays with a single drive per channel and with two drives per channel, and find that is not really true for ATA100 and faster controllers. Two of these drives cannot saturate a 100 or 133 MB/s channel.

Typically, we put eight drives in an array. I have been using a 4U rack enclosure that has 8 exposed 5.25" bays. This works well because mounting the drives in a 5.25" bay gives a nice air gap for cooling. Stacking 3 or more drives tightly together heats the middle ones up quite a bit. I also usually use 5400-RPM drives to keep the heat production down.

I only use Intel chipset motherboards, normally just a single-CPU P4. One of the boards with 1 or 2 onboard gigabit controllers would be a nice choice. 1 GB of RAM is more than enough, but do use ECC. Also, if you use the newest kernels, the onboard IDE controllers are fast enough to be used in the array. For an 8-drive array, I will normally use 1 Promise add-in card and the two on-board channels.

Important Miscellany:

- Power Supply. Don't skimp. 400W+ from a good vendor.

- IDE cables <=24" long. I tried to use the 36" IDE cables once and it nearly drove me nuts with drive corruption and random errors. The 24" ones work very well and usually give you enough length to route to 8 drives in an enclosure. Once Serial ATA gets cheaper, this will no longer be an issue.

- UPS. In general, you can NEVER allow a power failure to take down the raid server. There is at least a 50% chance of low-level drive corruption on an 8-drive array if it loses power. (Don't ask about the time the cleaning crew unplugged the array from the UPS!) We use a smart UPS and UPS monitoring software (upsmon) to unmount the array and raidstop it if the power goes out for more than 30 secs. I am also tempted to not even connect the power switch on the front panel. Resetting a crashed system is OK, but powering it off doesn't give the hard drives a chance to flush their buffers to disk. With 8+ spinning drives, there is a good chance at least one of them will be corrupted.

- Bonnie and burn-in. There are many problems that can crop up when you build the array: IRQ issues, etc. It is paramount that you thoroughly abuse the array with something like bonnie to make sure that everything is working. I typically run mkraid, which starts the array synching, mke2fs on the raid device, and then mount the filesystem and run bonnie on it, all while it is still synching. This is pretty hard on the whole system, and if there is a problem you will notice quickly. Once it is done resyncing, I usually run bonnie overnight to burn it in and verify that performance is reasonable.

- Fixing things. If you do have a power failure and the raid doesn't come back up, it is usually due to a hard drive problem. The only way to fix it is to run a low-level utility (Maxtor Powermax) on the drive. Maybe someone knows how to do something similar within Linux. If so, I would love to hear about it.

Again, our approach is not necessarily exhaustively researched. This is just "what we do." So, take it for what it's worth.

Best,

Mike Prinkey
Aeolus Research, Inc.
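P.S. For anyone who wants to try the same recipe, the sequence below is roughly what the mkraid/mke2fs/bonnie step looks like with the old raidtools. Treat it as an illustrative sketch only -- the device names and the 8-disk /etc/raidtab are made-up examples, so adapt them to your own controllers before running anything:

  # /etc/raidtab -- example 8-disk software RAID-5 set, one partition per drive
  raiddev /dev/md0
      raid-level            5
      nr-raid-disks         8
      nr-spare-disks        0
      persistent-superblock 1
      chunk-size            64
      device                /dev/hde1
      raid-disk             0
      device                /dev/hdg1
      raid-disk             1
      # ...and so on for the remaining six drives

  mkraid /dev/md0                 # creates the set and starts the resync
  cat /proc/mdstat                # watch the resync progress
  mke2fs -j /dev/md0              # ext3; plain mke2fs for ext2
  mount /dev/md0 /mnt/raid
  bonnie -d /mnt/raid -s 2048     # abuse it while it is still resyncing

mdadm can do the same job these days without a raidtab, but the idea is identical: build it, then beat on it with bonnie before you trust it with real data.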
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Mon Aug 4 09:50:09 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Mon, 04 Aug 2003 08:50:09 -0500 Subject: Cisco switches for lam mpi In-Reply-To: References: Message-ID: <3F2E6491.1020802@tamu.edu> I should have commented earlier, but I didn't think I had time... My experience with the Cisco 4006 was that as an aggregation switch it was OK for 10/100 or GBE. It did fine for normal "enterprise switching. The 4006's I've used had only older Supervisor Modules and ran CAT-OS, rather than IOS like the 4506 I'm testing now. For higher performance, while CPU utilization stays low, the switch falls off at higher loads. Caveat: I did not test these devices in a cluster environment; the thought never crossed my mind. I'd be using a 6509 if I had to use a Cisco, but I'd probably be shopping for HP ProCurves, Foundry's, Riverstones, or NEC Bluefires, based on what I've seen and done lately. I tested the 4006 in normal enterprise mode, and loaded it for high-perf network modes. If you ever need QoS do NOT use a 4006. Or a 4506. They can't handle it too well. But I digress. I'm gonna try to get a couple of ProCurves in and test 'em against a LAN tester made by Anritsu (MD1230/1231) for small packet capability (RFC-2544). That's been a killer for a lot of switches I've looked at. gerry Felix Rauch wrote: > On Tue, 29 Jul 2003, Jack Douglas wrote: > >>We have just installed a 32 Node Dual Xeon Cluster, with a Cisco Cataslyst >>4003 Chassis with 48 1000Base-t ports. >> >>We are running LAM MPI over gigabit, but we seem to be experiencing >>bottlenecks within the switch >> >>Typically, using the cisco, we only see CPU utilisation of around 30-40% > > [...] > > I'm not a Cisco expert, but... > > We once got a Cisco switch from our networking people that we had to > return immediately because it delivered such a bad performance. It was > a Catalyst 2900XL with 24 Fast Ethernet ports, but it could only > handle 12 ports at full speed. Above that, the performance brake down > completely. > > For some benchmark results see, e.g.: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf > > As a comparison, the quite nice results of a CentreCom 742i: > http://www.cs.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf > > Disclaimer: Maybe the Cisco you mentioned is better, or Ciscos improved > anyway since spring 2001 when I did the above tests. Besides, the > situation for Gigabit Ethernet could be different. > > As we described on our workshop paper at CAC03 you can not trust the > data sheets of switches anyway: > http://www.cs.inf.ethz.ch/CoPs/publications/#cac03 > > Conclusion: If you need a very high performing switch, you have to > evaluate/benchmark it yourself. 
> > - Felix > -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Wed Aug 6 08:07:45 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Wed, 06 Aug 2003 07:07:45 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <1060098427.30922.6.camel@roughneck> References: <1060098427.30922.6.camel@roughneck> Message-ID: <3F30EF91.6080606@tamu.edu> We just implemented an IDE RAID system for some meteorology data/work. We're pretty happy with the results so far. Our hardwre complement is: SuperMicro X5DAE Motherboard dual Xeon 2.8GHz processors 2 GB Kingston Registered ECC RAM 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters 10 Maxtor 250 GB 7200 RPM disks 1 Maxtor 60 GB drive for system work 1 long multi-drop disk power cable... SuperMicro case (nomenclature escapes me, however, it has 1 disk bays and fits the X5DAE MoBo Cheapest PCI video card I could find (no integrated video on MoBo) Add-on Intel GBE SC fiber adapter Drawbacks: 1. I should have checked for integrated video for simplicity 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with ALL the patches 3. Make sure you order the rack mount parts when you order the case; it only appeared they were included... 4. Questions have been raised about the E-1000 integrated GBE copper NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M switch and GBE will be on fiber like God intended data to be passed (No, I don't trust most terminations for GBE on copper!) It's up and working. Burning in for the last 2 weeks with no problems, it's going to the Texas GigaPoP today where it'll be live on Internet2. HTH, Gerry Nicholas Henke wrote: > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > >>On 4 Aug 2003, Nicholas Henke wrote: >> >>We have a lot of experience with IDE RAID arrays at client sites. The DOE >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) >>and the price is hard to beat. The raid array that serves home >>directories to their clusters and workstations is backed up nightly to a >>second raid server, similarly to your system. To speed things along we >>installed an extra gigabit card in the primary and backup servers and >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 >>GBs take just over an hour using the dedicated gbit link. Rsync would >>probably be faster. Without the shortcircuit gigabit link, it used to run >>four or five times longer and seriously impact NFS performance for the >>rest of the systems on the LAN. >> >>Hope this helps. >> >>Regards, >> >>Mike Prinkey >>Aeolus Research, Inc. > > > Definately does -- can you recommend hardware for the IDE RAID, or list > what you guys have used ? 
> > Nic -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Douglas.L.Farley at nasa.gov Wed Aug 6 08:35:10 2003 From: Douglas.L.Farley at nasa.gov (Doug Farley) Date: Wed, 06 Aug 2003 08:35:10 -0400 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <009701c35b85$714e7110$7101a8c0@Navatek.local> References: <1060098427.30922.6.camel@roughneck> Message-ID: <5.0.2.1.2.20030806081148.00a94be8@pop.larc.nasa.gov> I noticed with acnc's 14 unit raid they used an IDE-SCSI U3 something or another, anyone know what type of hardware they used to convert the drives for this array? Just direct IDE-SCSI adaptors (which I've not seen cheaper than $80) on each drive and then connecting to something like an adaptec Raid card? Does anyone have any experience with doing this (with off the shelf parts) to create a semi-cheep raid (maybe 10 x $250 for 250G disk, + 10 x $80 IDE-SCSI converter + $800 expensive adaptec 2200 esq card )? Those costs are higher (~$420/disk ) than doing 10 disks on a 3ware 7500-12 (~$320/disk) (costs excluding host system), so is whatever gained really worth it? Doug At 09:11 AM 8/5/2003 -1000, you wrote: >I have 2 IDE RAID boxes from AC&C (http://www.acnc.com) They are not true >NFS boxes, rather they connect to a cheap $1500 server via a scsi-3 cable. >Although they do offer a NFS box that will turn one of these arrays into a >standalone. We have had great success with these units >(http://neptune.navships.com/images/harddrivearrays.jpg) . We first >acquired the 8 slot chassis 2 years ago and filled it with 8 IBM 120GXP's. >We have set it up in a RAID-5 configuration and have not yet had to replace >even one of the drives (Knockin on wood). After a year we picked up the >14slot chassis and filled it with 160 maxtor drives and it has performed >flawless... I think we paig about $4000 for the 14 slot chassis. you can >add 14 160 gb seagates for $129 from newegg.com and and a cheap fileserver >for $1500 and you got about 2TB of storage for around $7000 > >Mitchel Kagawa > >----- Original Message ----- >From: "Nicholas Henke" >To: "Michael T. Prinkey" >Cc: >Sent: Tuesday, August 05, 2003 5:47 AM >Subject: Re: large filesystem & fileserver architecture issues. > > > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > On 4 Aug 2003, Nicholas Henke wrote: > > > > > > We have a lot of experience with IDE RAID arrays at client sites. The >DOE > > > lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for >them. > > > The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec >write) > > > and the price is hard to beat. The raid array that serves home > > > directories to their clusters and workstations is backed up nightly to a > > > second raid server, similarly to your system. To speed things along we > > > installed an extra gigabit card in the primary and backup servers and > > > connected the two directly. The nightly backup (cp -auf via NFS) of 410 > > > GBs take just over an hour using the dedicated gbit link. Rsync would > > > probably be faster. 
Without the shortcircuit gigabit link, it used to >run > > > four or five times longer and seriously impact NFS performance for the > > > rest of the systems on the LAN. > > > > > > Hope this helps. > > > > > > Regards, > > > > > > Mike Prinkey > > > Aeolus Research, Inc. > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf ============================== Doug Farley Data Analysis and Imaging Branch Systems Engineering Competency NASA Langley Research Center < D.L.FARLEY at LaRC.NASA.GOV > < Phone +1 757 864-8141 > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Wed Aug 6 15:09:59 2003 From: ctierney at hpti.com (Craig Tierney) Date: 06 Aug 2003 13:09:59 -0600 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> References: <1060098427.30922.6.camel@roughneck> <3F30EF91.6080606@tamu.edu> Message-ID: <1060196998.8961.17.camel@woody> On Wed, 2003-08-06 at 06:07, Gerry Creager N5JXS wrote: > We just implemented an IDE RAID system for some meteorology data/work. > We're pretty happy with the results so far. Our hardwre complement is: > > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > Hardware choices look good. How did you configure it? Are there 1 or 2 filesystems? Raid 0, 1, 5? Do you have any performance numbers on the setup (perferably large file, dd type tests)? Thanks, Craig > Drawbacks: > 1. I should have checked for integrated video for simplicity > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches > 3. Make sure you order the rack mount parts when you order the case; it > only appeared they were included... > 4. Questions have been raised about the E-1000 integrated GBE copper > NIC on the Mobo; Doesn't matter: it's gonna be connected to a 100M > switch and GBE will be on fiber like God intended data to be passed (No, > I don't trust most terminations for GBE on copper!) > > It's up and working. Burning in for the last 2 weeks with no problems, > it's going to the Texas GigaPoP today where it'll be live on Internet2. > > HTH, Gerry > > Nicholas Henke wrote: > > On Tue, 2003-08-05 at 11:45, Michael T. Prinkey wrote: > > > >>On 4 Aug 2003, Nicholas Henke wrote: > >> > >>We have a lot of experience with IDE RAID arrays at client sites. 
The DOE > >>lab in Morgantown, WV has about 4 TBs of IDE RAID that we built for them. > >>The performance is quite good (840 GBs, 140 MB/sec read, 80 MB/sec write) > >>and the price is hard to beat. The raid array that serves home > >>directories to their clusters and workstations is backed up nightly to a > >>second raid server, similarly to your system. To speed things along we > >>installed an extra gigabit card in the primary and backup servers and > >>connected the two directly. The nightly backup (cp -auf via NFS) of 410 > >>GBs take just over an hour using the dedicated gbit link. Rsync would > >>probably be faster. Without the shortcircuit gigabit link, it used to run > >>four or five times longer and seriously impact NFS performance for the > >>rest of the systems on the LAN. > >> > >>Hope this helps. > >> > >>Regards, > >> > >>Mike Prinkey > >>Aeolus Research, Inc. > > > > > > Definately does -- can you recommend hardware for the IDE RAID, or list > > what you guys have used ? > > > > Nic -- Craig Tierney _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Wed Aug 6 16:55:09 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed, 6 Aug 2003 16:55:09 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: <3F30EF91.6080606@tamu.edu> Message-ID: > SuperMicro X5DAE Motherboard > dual Xeon 2.8GHz processors > 2 GB Kingston Registered ECC RAM > 2 HighPoint RocketRAID 404 4-channel IDE RAID adapters > 10 Maxtor 250 GB 7200 RPM disks > 1 Maxtor 60 GB drive for system work > 1 long multi-drop disk power cable... > SuperMicro case (nomenclature escapes me, however, it has 1 disk bays > and fits the X5DAE MoBo > Cheapest PCI video card I could find (no integrated video on MoBo) > Add-on Intel GBE SC fiber adapter > > Drawbacks: > 1. I should have checked for integrated video for simplicity I did something similar a little while back: a tyan thunder e7500 board, just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 in jbod mode, 8x200G WD JB disks and a ~500W PS. I don't see any reason for adding extra ram or putting in multiple, higher-powered CPUs for a fileserver. > 2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with > ALL the patches I'll be doing more boxes, probably with something like 8x250 SATA disks, with a pair of promise tx4 cards. open-source drivers for these cards recently became available, btw. there was a very interesting talk at OLS about doing raid intelligently over a network... 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From luis.licon at yakko.cimav.edu.mx Thu Aug 7 12:05:45 2003 From: luis.licon at yakko.cimav.edu.mx (Luis Fernando Licon Padilla) Date: Thu, 07 Aug 2003 10:05:45 -0600 Subject: test Message-ID: <3F3278D9.5000709@yakko.cimav.edu.mx> _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From John.Hearns at micromuse.com Thu Aug 7 09:12:55 2003 From: John.Hearns at micromuse.com (John Hearns) Date: Thu, 07 Aug 2003 14:12:55 +0100 Subject: AMD core maths library Message-ID: <3F325057.4080801@micromuse.com> Sorry if this is old news to everyone. I saw a snippet in Linux Magazine (UK/German type) on the AMD Core Math Library for Opterons. https://wwwsecure.amd.com/gb-uk/Processors/DevelopWithAMD/0,,30_2252_2282,00.html Says it is initially released in FORTAN, with BLAS, LAPACK and FFTs. g77 under Linux and Windows. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Thu Aug 7 09:54:29 2003 From: gerry.creager at tamu.edu (Gerry Creager N5JXS) Date: Thu, 07 Aug 2003 08:54:29 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <3F325A15.80901@tamu.edu> Mark Hahn wrote: >>SuperMicro X5DAE Motherboard >>dual Xeon 2.8GHz processors >>2 GB Kingston Registered ECC RAM >>2 HighPoint RocketRAID 404 4-channel IDE RAID adapters >>10 Maxtor 250 GB 7200 RPM disks >>1 Maxtor 60 GB drive for system work >>1 long multi-drop disk power cable... >>SuperMicro case (nomenclature escapes me, however, it has 1 disk bays >>and fits the X5DAE MoBo >>Cheapest PCI video card I could find (no integrated video on MoBo) >>Add-on Intel GBE SC fiber adapter >> >>Drawbacks: >>1. I should have checked for integrated video for simplicity > > > I did something similar a little while back: a tyan thunder e7500 board, > just one Xeon, just 1G ram, integrated video/gigabit (copper), 3ware 8500-8 > in jbod mode, 8x200G WD JB disks and a ~500W PS. > > I don't see any reason for adding extra ram or putting in multiple, > higher-powered CPUs for a fileserver. This one will A) be on the Unidata weather distribution network for general weather data AND the newer real-time radar feeds; B) be extracting some of that data for graphics; C) be doing NNTP for Unidata (one, exactly, newsgroup) for a research project; D) reside on the I2 Logistical Backbone... It's a busy box. >>2. Current HighPoint drivers for RH9 are not RAID yet; use RH7.3 with >>ALL the patches > > I'll be doing more boxes, probably with something like 8x250 SATA disks, > with a pair of promise tx4 cards. open-source drivers for these cards > recently became available, btw. > > there was a very interesting talk at OLS about doing raid intelligently > over a network... Check out loki.cs.utk.edu (I think: It's certainly a project called 'loki' and run by Micah Beck at utk.edu) about the logistical backbone. I didn't go with Promise cards because of one of my grad students, who's obviously better funded than me... 
He's looked at Promise, HighPoint and at least one other card, and had comparisons, and strongly recommended HighPoint as a Price/Performance leader. The HighPoints were less expensive and currently boast the same performance as the tx4's. Everyone's getting into the SATA game; I didn't go that way because I wanted to get to the 2 TB point and couldn't reasonably do it today with SATA; maybe later. I didn't want to take the time to hack the drivers HighPoint had available, since i'm overloaded these days. -- Gerry Creager -- gerry.creager at tamu.edu Network Engineering -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578 Page: 979.228.0173 Office: 903A Eller Bldg, TAMU, College Station, TX 77843 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Wed Aug 6 19:55:07 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed, 6 Aug 2003 19:55:07 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Mark Hahn wrote: > > there was a very interesting talk at OLS about doing raid intelligently > over a network... > I have considered trying this using network block devices, but I haven't had the opportunity to try it. Is this what you are talking about or something different? _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 14:15:39 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 14:15:39 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: On Wed, 6 Aug 2003, Michael T. Prinkey wrote: > On Wed, 6 Aug 2003, Mark Hahn wrote: > > > > there was a very interesting talk at OLS about doing raid intelligently > > over a network... > > > > I have considered trying this using network block devices, but I haven't > had the opportunity to try it. Is this what you are talking about or > something different? thre are similarities: http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf but it's really a development beyond NBD or DRDB. hmm, I'm not sure that brief pdf is either complete or does the idea justice. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at physics.mcmaster.ca Thu Aug 7 15:15:01 2003 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Thu, 7 Aug 2003 15:15:01 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > I read the abstract last evening and got a taste for it. That is really a > remarkable idea to use the ethernet checksum for data integrity of stored > data. Thanks for the heads-up. for me, the crux of the idea is: - if you want big storage, $/GB drives you to IDE. - IDE is not amazingly fast, reliable or scalable. - building storage bricks out of IDE makes a lot of sense, since they can now be quite dense, low-overhead, etc. - ethernet is a wonderfully hot-pluggable interconnect for this kind of thing. - doing raid over a multicast-capable network is pretty cool. 
- using eth's checksumming is pretty cool. - doing it this way (all open-source, including software raid) means the system is much more transparent - you are not dependent on some closed-source vendor tools to control/monitor/upgrade your storage. Ben's approach (along with Lustre, for instance) seems very sweet for HPC type storage needs. one thing I do ponder, though, is whether it really makes sense to hide raid so firmly under the block layer. it's conceptually tidy, to be sure, and works well in practice. but suppose: - to create a filesystem, you hand some arbitrary collection of block-device extents to the mkfs tool. you also let it know which extents happen to reside on the same disk, bus, host, UPS, geographic location, etc. - you can tell the FS that your default policy should be for reliability - that raid5 across separate disks is OK, for instance. or maybe you can tell it that a particular file should be raid10 instead. or that a file should be raid1 across each geographic site. or that updates to a file should be logged. or that it should transparently compress older files. - the FS might do other HSM-like things, such as incorporating knowlege of what's on your tape/DVD/cdrom's. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Thu Aug 7 14:33:23 2003 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Thu, 7 Aug 2003 14:33:23 -0400 (EDT) Subject: large filesystem & fileserver architecture issues. In-Reply-To: Message-ID: > > > > I have considered trying this using network block devices, but I haven't > > had the opportunity to try it. Is this what you are talking about or > > something different? > > thre are similarities: > > http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-LaHaise-OLS2003.pdf > > but it's really a development beyond NBD or DRDB. hmm, I'm not sure > that brief pdf is either complete or does the idea justice. > I read the abstract last evening and got a taste for it. That is really a remarkable idea to use the ethernet checksum for data integrity of stored data. Thanks for the heads-up. Mike _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From twhitcomb at apl.washington.edu Thu Aug 7 15:55:15 2003 From: twhitcomb at apl.washington.edu (Timothy R. Whitcomb) Date: Thu, 7 Aug 2003 12:55:15 -0700 (PDT) Subject: (Scyld) Nodes going down unexpectedly Message-ID: We have a 10-processor cluster and are currently running a weather model on 4 of the processors. When I try to up the number, it works for a while, then the "beostatus" window will show one node's information not changing for a little while before it shows the node status as "down". Each node is dual-processor and I have noticed (but not verified) that this becomes an issue when both processors on a node are in use. After the node status changes to "down", I cannot restart it through the console tools on the root node. However, I know that the node is still alive and on the network because I can ping it successfully. This problem requires me to actually restart the node by hand, which is a bit of an issue since we're on opposite sides of the building. What's going on here and what can I do to mitigate/fix this? 
Tim Whitcomb twhitcomb at apl.washington.edu Applied Physics Lab University of Washington _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 7 17:51:10 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 7 Aug 2003 14:51:10 -0700 Subject: large filesystem & fileserver architecture issues. In-Reply-To: References: Message-ID: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > - IDE is not amazingly fast, reliable or scalable. That's about like saying "commodity servers are not fast, reliable, or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." More facts, less religion. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Thu Aug 7 18:04:25 2003 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Thu, 7 Aug 2003 17:04:25 -0500 Subject: large filesystem & fileserver architecture issues. In-Reply-To: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> References: <20030807215110.GA2780@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 7 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 07, 2003 at 03:15:01PM -0400, Mark Hahn wrote: > > > - IDE is not amazingly fast, reliable or scalable. > > That's about like saying "commodity servers are not fast, reliable, > > or scalable, so I'm going to buy an SGI Altix instead of a Beowulf." > > More facts, less religion. Since when has the value of facts outweighed religion on *THIS* list?! _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger at ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sun Aug 10 04:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Sun, 10 Aug 2003 12:16:42 +0400 Subject: Implementing MPICH2-0.93 In-Reply-To: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: Hello Can someone tell me if they ever use this MPI version. Because I have some difficulty in implementing it. I was unable to implement the slave nodes.
thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gropp at mcs.anl.gov Sun Aug 10 14:51:40 2003 From: gropp at mcs.anl.gov (William Gropp) Date: Sun, 10 Aug 2003 13:51:40 -0500 Subject: Implementing MPICH2-0.93 In-Reply-To: References: <200308081902.h78J20w27961@NewBlue.Scyld.com> Message-ID: <5.1.1.6.2.20030810135037.016afce8@localhost> At 12:16 PM 8/10/2003 +0400, RoUdY wrote: >Hello >Can someone tell me if they ever use this MPI version. Because I have some >difficulty in implementing it. I was unable to implement the slave nodes. Questions and bug reports on MPICH2 should be sent to mpich2-maint at mcs.anl.gov . Thanks! Bill _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 11 22:44:45 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 12 Aug 2003 10:44:45 +0800 (CST) Subject: PBSPro with 1024 nodes :-O (oh!) Message-ID: <20030812024445.80371.qmail@web16812.mail.tpe.yahoo.com> Looks like the problems with OpenPBS in large clusters were all fixed in PBSPro, ASU has a 1024 node cluster (http://www.pbspro.com/press_030811.html). Also, heard from PBS developers that the next release of PBSPro (5.4) will add fault tolerance in the master node, very similar to the shadow master concept in Gridengine. Sounds to me PBSPro is very much better than OpenPBS. Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Tue Aug 12 20:59:17 2003 From: becker at scyld.com (Donald Becker) Date: Tue, 12 Aug 2003 20:59:17 -0400 (EDT) Subject: $900,000 RFP for climate simulation machine at UC Irvine (fwd) Message-ID: -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 ---------- Forwarded message ---------- Date: Tue, 12 Aug 2003 16:54:44 -0700 From: Charlie Zender To: Donald Becker Subject: $900,000 RFP for climate simulation machine at UC Irvine Dear Donald, Ooops.
Forgot the announcement itself. Here it is. Please disseminate! Thanks, Charlie Cut here ====================================================================== Dear High Performance Computing Vendor, The University of California at Irvine is pleased to announce the immediate availability of US$900,000 towards the purchase of an Earth System Modeling Facility (ESMF). Following a competitive bid process open to all interested vendors, the ESMF contract will be awarded to the proposal with the most competitive response to our Request for Proposals (RFP). All necessary details about the ESMF and the RFP process are available from the ESMF homepage: http://www.ess.uci.edu/esmf Bids are due August 22, 2003. Please visit the ESMF homepage for more details and contact Mr. Ralph Kupcha with any questions. All further contact contact with potential vendors will take place on the ESMF Potential Vendor Mail List. You may subscribe to this list by visiting https://maillists.uci.edu/mailman/listinfo/esmfvnd Please pass this Announcement of Opportunity on to any interested colleagues. Sincerely, Ralph Kupcha Senior Buyer, Procurement Services, UCI _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Thu Aug 14 05:04:57 2003 From: jhearns at micromuse.com (John Hearns) Date: Thu, 14 Aug 2003 10:04:57 +0100 Subject: Slashdot thread on supercomputers Message-ID: <3F3B50B9.4090405@micromuse.com> Everyone has probably seen the thread on Slashdot. Here are links to the two relevant stories. http://www.eetimes.com/story/OEG20030811S0018 http://www.eetimes.com/story/OEG20030812S0011 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Thu Aug 14 15:21:47 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Fri, 15 Aug 2003 00:11:47 +0450 Subject: MPICH Message-ID: <1060890107.c2a01a60farschad@myrealbox.com> Hi, I am a new user to this mailing list. And also I am very new to Beowulf clusters. So I will have to many questions, please be patient :^) At the moment, I want to run a sample program using MPI. The program is in F90 and I use PGF90 to compile it. I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is the alternative command for lamboot!! Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jconnor at atmos.colostate.edu Thu Aug 14 18:55:20 2003 From: jconnor at atmos.colostate.edu (Jason Connor) Date: 14 Aug 2003 16:55:20 -0600 Subject: MPICH In-Reply-To: <1060890107.c2a01a60farschad@myrealbox.com> References: <1060890107.c2a01a60farschad@myrealbox.com> Message-ID: <1060901719.6160.11.camel@gentoo.atmos.colostate.edu> Hi Farschad, Here are only some possible answers to your questions. Like all things, there is more than one way to do these things. 
On Thu, 2003-08-14 at 13:21, Farschad Torabi wrote: > Hi, > I am a new user to this mailing list. > And also I am very new to Beowulf clusters. > So I will have to many questions, please be patient :^) > > At the moment, I want to run a sample program using > MPI. The program is in F90 and I use PGF90 to compile it. > > I installed the MPICH and pgf90 and compiled the program successfully. Now, my question is that how can I run the output executable file on a cluster?? using mpich: /bin/mpirun -np <# of nodes to run on> \ -machinefile /util/machines/machines.LINUX \ the -machinefile doesn't need need to be explicit, as long as you have the file mentioned above filled with the names of your cluster nodes. mpirun --help is always a good reference =) > > Should I use lamboot to make the systems ready to work together?? It seems that lamboot is not for MPICH. It is for LAM and now I want to know, what is > the alternative command for lamboot!! There isn't one. Just have whatever shell your using with mpich (rsh or ssh) setup so that you don't need a password to login to the nodes. > > Thank you in advance > Farschad Torabi > I hope this helps. In case you care, I like lam better. =) Jason Connor Colorado State University Prof. Scott Denning's BioCycle Research Group jconnor at atmos.colostate.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 14 21:52:06 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 15 Aug 2003 11:52:06 +1000 Subject: Scalable PBS Message-ID: <200308151152.09499.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, Just joined the list, so apologies if this is already well known. I noticed a recent message in the archive about OpenPBS and problems with scalability, and I think it's worth noting that there is an alternative (and actively developed) fork of OpenPBS called "Scalable PBS" available from: http://www.supercluster.org/projects/pbs/ Amongst other features it has (quoting the website): Better Scalability - Significantly improved server to MOM communication model, the ability to handle larger clusters, larger jobs, larger messages, etc. - Scales up to 2K nodes vs ~300 nodes for standard OpenPBS. Improved Usability by incorporating more extensive logging, as well as, more human readable logging(ie no more 'error 15038 on command 42'). We're using SPBS here at VPAC on our IBM cluster and it's a lot better than the last OpenPBS release (2.3.16, from 2001). They forked off from 2.3.12 rather than the last OpenPBS because it had a more open license. The folks behind the project have worked very quickly with us to fix bugs we've been finding in it, typically when I found a bug they had fixed it within a day or so, usually overnight from my perspective in Oz. :-) If you are considering using it I'd suggest using the current snapshot release from: http://www.supercluster.org/downloads/spbs/temp/ as that irons out a couple of bugs that might bite. For the less adventurous there is a new release SOpenPBS-2.3.12p4 due out in the near future that will include the fixes from the current snapshot. 
cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu J4wal1ph00ExP8w/5HgVCek= =Nyjb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Thu Aug 14 13:28:21 2003 From: josip at lanl.gov (Josip Loncaric) Date: Thu, 14 Aug 2003 11:28:21 -0600 Subject: Two AMD Opteron clusters for LANL Message-ID: <3F3BC6B5.5040706@lanl.gov> This October, LANL will be getting large AMD Opteron model 244 clusters ("Lightning" consisting of 1408 dual-CPU machines and "Orange" consisting of 256 dual-CPU machines, both built by Linux Networx): http://www.itworld.com/Comp/1437/030814supercomp/ http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~73744,00.html Sincerely, Josip _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kapurs at seas.upenn.edu Fri Aug 15 11:41:39 2003 From: kapurs at seas.upenn.edu (kapurs at seas.upenn.edu) Date: Fri, 15 Aug 2003 11:41:39 -0400 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <1060962099.3f3cff33d8b6c@webmail.seas.upenn.edu> Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Matthew_Wygant at dell.com Fri Aug 15 17:54:05 2003 From: Matthew_Wygant at dell.com (Matthew_Wygant at dell.com) Date: Fri, 15 Aug 2003 16:54:05 -0500 Subject: Hard Drive Upgrade(Internal or External) Message-ID: <6CB36426C6B9D541A8B1D2022FEA7FC10273F510@ausx2kmpc108.aus.amer.dell.com> The 530 appears to include both a SCSI U160 and 2 ATA100 IDE channels. The ATA100 defaults to 'auto' in the BIOS, so I would imagine the node should pick it up. -matt -----Original Message----- From: kapurs at seas.upenn.edu [mailto:kapurs at seas.upenn.edu] Sent: Friday, August 15, 2003 10:42 AM To: beowulf at beowulf.org Subject: Hard Drive Upgrade(Internal or External) Hi- Does any one know if we can add an external or internal hard drive (EIDE, 200GB) to the Dell Precision 530 Workstation. It's running on red hat linux 7.1, has a USB 1.1 port, two 36-GB SCSI hard drives. The primary EIDE controler on system board is empty. 
thanks- -sumeet- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From zender at uci.edu Fri Aug 15 18:14:58 2003 From: zender at uci.edu (Charlie Zender) Date: Fri, 15 Aug 2003 15:14:58 -0700 Subject: Bid deadline extended for UC Irvine climate computer Message-ID: Hi Donald, Response from members on the beowulf list has been so positive that we are extending our bid deadline in order to give your list members who want to bid a fair chance to prepare competitive bids. Would you please allow posting of this notice of extension so that those vendors who thought they may not have enough time to submit bids become aware of the two week extension? I promise not to bother you again :) One thought: We are not the only Institution buying medium size "super-computers" that Beowulf vendors might like to know about. It might be a good idea for the whole Beowulf community to create a separate list for RFPs. Such a list would help buyers and Beowulf vendors find eachother. Thanks! Charlie -- Charlie Zender, zender at uci dot edu, (949) 824-2987, Department of Earth System Science, University of California, Irvine CA 92697-3100 -------------------------------------------------------------------- Dear HPC Vendors, We are extending by two weeks the deadline for submission of bids in response to the $900,000 Earth System Modeling Facility RFP: http://www.ess.uci.edu/esmf The new bid deadline is Friday, September 5. All other deadlines and the expected timeline are also shifted by two weeks, and these changes are reflected on the recently updated web page and conference summary. Consequently, the deadline to send bid-related questions to Ralph Kupcha is Friday, August 29. We hope that this extension provides some additional breathing room to improve any parts of your bid that you might have rushed to finish. At the same time, we are now ready to accept any completed proposals and look forward to reading your ideas on how best to meet our coupled climate modeling needs. Sincerely, Ralph Kupcha _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 00:03:57 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sat, 16 Aug 2003 12:03:57 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308151152.09499.csamuel@vpac.org> Message-ID: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> How big is your cluster? Did you use Gridengine before -- how does SPBS compare to SGE? Andrew. --- Chris Samuel ????> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > Just joined the list, so apologies if this is > already well known. 
> > I noticed a recent message in the archive about > OpenPBS and problems with > scalability, and I think it's worth noting that > there is an alternative (and > actively developed) fork of OpenPBS called "Scalable > PBS" available from: > > http://www.supercluster.org/projects/pbs/ > > Amongst other features it has (quoting the website): > > Better Scalability > - Significantly improved server to MOM > communication model, the ability to > handle larger clusters, larger jobs, larger > messages, etc. > - Scales up to 2K nodes vs ~300 nodes for > standard OpenPBS. > > Improved Usability by incorporating more extensive > logging, as well as, more > human readable logging(ie no more 'error 15038 on > command 42'). > > We're using SPBS here at VPAC on our IBM cluster and > it's a lot better than > the last OpenPBS release (2.3.16, from 2001). They > forked off from 2.3.12 > rather than the last OpenPBS because it had a more > open license. > > The folks behind the project have worked very > quickly with us to fix bugs > we've been finding in it, typically when I found a > bug they had fixed it > within a day or so, usually overnight from my > perspective in Oz. :-) > > If you are considering using it I'd suggest using > the current snapshot release > from: > > http://www.supercluster.org/downloads/spbs/temp/ > > as that irons out a couple of bugs that might bite. > > For the less adventurous there is a new release > SOpenPBS-2.3.12p4 due out in > the near future that will include the fixes from the > current snapshot. > > cheers, > Chris > - -- > Chris Samuel -- VPAC Systems & Network Admin > Victorian Partnership for Advanced Computing > Bldg 91, 110 Victoria Street, Carlton South, > VIC 3053, Australia - http://www.vpac.org/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2 (GNU/Linux) > > iD8DBQE/PDzGO2KABBYQAh8RAnKwAJ9OeSE508v7elkeDHL2qDehjH9LvwCfUrmu > J4wal1ph00ExP8w/5HgVCek= > =Nyjb > -----END PGP SIGNATURE----- > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 16 00:51:40 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs In-Reply-To: <200308151905.h7FJ5uw13967@NewBlue.Scyld.com> Message-ID: Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com! 
http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farschad at myrealbox.com Sat Aug 16 08:47:55 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Sat, 16 Aug 2003 17:37:55 +0450 Subject: Beowulf digest, Vol 1 #1412 - 5 msgs Message-ID: <1061039275.c61518e0farschad@myrealbox.com> Dear Jason Connor and Roudy, I think that my question covers Roudy's questions too ;^) First of all Roudy, the new version of MPICH is available on the net i.e. mpich-1.2.5; you can dl it. As Jason Connor advised me, I ran the following command: /bin/mpirun -np 1 -machinefile machs -arch machines.arc a.out the contents of machs is like this node1 node1 and the contents of machines.arc (architecture file): node1.parallel.net node1.parallel.net node1.parallel.net (Roudy I think that you have to use your file like this! the name of the machines are written in this file; in your case let say -arch mpd.hosts) the program runs well on -np 1 machine but, when I wanted to define two processes on a single machine (i.e -np 2)it messages me: "Could not find enough architecture for machines LINUX" the question is, can we define more that ONE processes on a SINGLE machine?? Thanks -----Original Message----- From: "RoUdY" To: beowulf at scyld.com, beowulf at beowulf.org Date: Sat, 16 Aug 2003 08:51:40 +0400 Subject: Re: Beowulf digest, Vol 1 #1412 - 5 msgs Hello Jason Connor, It look as if you know something about mpich, well I am using MPICH2-0.93 and in this one their no directory for 'machines.linux' instead we have mpd.hosts. But my problem is that I do not know now to configure this file despite of reading the online help. Please help me Thanks Roudy -------------------------------------------------- Get your free email address from Servihoo.com!
http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Sat Aug 16 10:29:17 2003 From: rodmur at maybe.org (Dale Harris) Date: Sat, 16 Aug 2003 07:29:17 -0700 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <20030816142917.GA24928@maybe.org> On Sat, Aug 16, 2003 at 12:03:57PM +0800, Andrew Wang elucidated: > How big is your cluster? > > Did you use Gridengine before -- how does SPBS compare > to SGE? > > Andrew. > In a quick glance, it already wins points with me because it uses GNU autoconf instead of aimk to build. -- Dale Harris rodmur at maybe.org /.-) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rodmur at maybe.org Sat Aug 16 12:07:13 2003 From: rodmur at maybe.org (Dale Harris) Date: Sat, 16 Aug 2003 09:07:13 -0700 Subject: Scalable PBS In-Reply-To: <20030816142917.GA24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> <20030816142917.GA24928@maybe.org> Message-ID: <20030816160713.GB24928@maybe.org> On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale Harris elucidated: > > > In a quick glance, it already wins points with me because it uses GNU > autoconf instead of aimk to build. > However, the fact that it requires tcl/tk does not. Whatever happen to the concept of making a simple tool that just does it's job well. I don't see why I need a GUI for a job scheduler. Let the emacs people make some frontend for it. Dale _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Sat Aug 16 23:12:21 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Sun, 17 Aug 2003 11:12:21 +0800 (CST) Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> Message-ID: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> For SGE, I simply download the binary package, and then do the full install. I don't have to build the source so it doesn't matter if it use aimk or autoconf. I looked at SPBS a while ago, I think if you don't need to build the GUI, then you don't need tcl/tk, and you just need to use the command line for managing the cluster. Andrew. --- Dale Harris ???? > On Sat, Aug 16, 2003 at 07:29:17AM -0700, Dale > However, the fact that it requires tcl/tk does not. > Whatever happen to > the concept of making a simple tool that just does > it's job well. I > don't see why I need a GUI for a job scheduler. Let > the emacs people > make some frontend for it. 
> > Dale > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dmcollins79 at hotmail.com Sun Aug 17 07:14:24 2003 From: dmcollins79 at hotmail.com (Timothy M Collins) Date: Sun, 17 Aug 2003 12:14:24 +0100 Subject: Request for parallel applications to test on beowulf cluster. Message-ID: Hi, I have built a beowulf (Redhat8 with PVM&LAM) Looking for parallel applications for different size and complexity to test fault tolerance. If anybody has one or knows where I can find one/some, please let me know. Kind regards Collins _________________________________________________________________ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:52:56 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:52:56 +1000 Subject: Scalable PBS In-Reply-To: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> References: <20030816040357.83603.qmail@web16810.mail.tpe.yahoo.com> Message-ID: <200308181152.57812.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 16 Aug 2003 02:03 pm, Andrew Wang wrote: > How big is your cluster? http://www.vpac.org/content/services_and_support/facility/linux_cluster.php (If it looks a little sparse, that's because someone's in the process of updating it) > Did you use Gridengine before -- how does SPBS compare > to SGE? Nope, it's always been running OpenPBS prior to migrating to SPBS. - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDF4O2KABBYQAh8RAncrAJoDWbSivr52PpPy/jyNkqdVFqLLCwCfVK8S 604i8kwR1wNA+7J5oWMPxBg= =Znzi -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:55:24 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:55:24 +1000 Subject: Scalable PBS In-Reply-To: <20030816160713.GB24928@maybe.org> References: <200308151152.09499.csamuel@vpac.org> <20030816142917.GA24928@maybe.org> <20030816160713.GB24928@maybe.org> Message-ID: <200308181155.25278.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 02:07 am, Dale Harris wrote: > However, the fact that it requires tcl/tk does not. Whatever happen to > the concept of making a simple tool that just does it's job well. I > don't see why I need a GUI for a job scheduler. Let the emacs people > make some frontend for it. 1) I don't believe it requires tk/tcl 2) The tk/tcl isn't for a GUI, it's for one of the example schedulers. 
3) That was inherited from OpenPBS 4) There is a GUI (plain old X) for monitoring PBS, xpbsmon, but I'd ignore it if I were you.. cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDIMO2KABBYQAh8RAnYiAJ9TBbBiGNRSJTP122dhqr8fXtQF9ACfatF7 XL5HFH/3hMPqm1K0FuCJlc8= =+U9N -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 17 21:57:25 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 18 Aug 2003 11:57:25 +1000 Subject: Scalable PBS In-Reply-To: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> References: <20030817031221.37764.qmail@web16812.mail.tpe.yahoo.com> Message-ID: <200308181157.26287.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, 17 Aug 2003 01:12 pm, Andrew Wang wrote: > I looked at SPBS a while ago, I think if you don't > need to build the GUI, then you don't need tcl/tk, and > you just need to use the command line for managing the > cluster. The tk/tcl is for one of the example schedulers (there are 3, one written in C, one in tk/tcl and one in BASL). Viz: --set-sched=TYPE sets the scheduler type. If TYPE is "c" the scheduler will be written in C "tcl" the server will use a Tcl based scheduler "basl" will use the rule based scheduler "no" then their will be no scheduling done (the "c" scheduler is the default) - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QDKFO2KABBYQAh8RAtIZAJwN0D0dts5DyU3tSN4eLsucYn6DsQCgiB7q wVSIraBXrPWoODE2LbglW14= =4Etb -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 05:26:17 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 11:26:17 +0200 Subject: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Hi Beowulfers, Problem: I want to distribute large files over a cluster. To raise performance I decided to copy the file to the local HD of any node in the cluster. Did someone find a multicast solution for that or maybe something with snowball principle? Till now I've take a look at msync (multicast rsync). Does someone have experiences with JETfs ? My idea was to write some scripts which copy files via rsync with snowball, but there are some heavy problems. e.g. What happens if one node (in the middle) is down. How does the next snowball generation know when to start copying (the last ones have finished copying)? Any ideas ? Thanks in advance Ren? 
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Mon Aug 18 09:12:52 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 18 Aug 2003 15:12:52 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of > any node in the cluster. Quick solution: Dolly [1] ;-) Longer description: I once wrote a tool called "Dolly" to clone whole hard-disk drives, partitions, or large files to many nodes in a cluster. It does so by sending the files concurrently around the cluster in a "TCP chain". In a switched network, this solution is often faster then IP multicast becauce Dolly can use the proven TCP congestion control and error correction, whereas high-speed reliable multicast is something difficult. > Till now I've take a look at msync (multicast rsync). Another tool is "udpcast". > What happens if one node (in the middle) is down. Dolly, can't handle that (it's a working prototype), but Atsushi Manabe extended Dolly into Dolly++, which supposedly can handle node failures (see link in [1]). We use Dolly regularly to clone our small 16-node cluster and the local support group uses Dolly to clone the larger 128-node cluster. Because that cluster has two Fast Ethernet networks, we can clone whole disks with about 20 MByte/s to all nodes in the cluster. If you want to clone files instead of partitions, just specify your file name in the config file instead of the device file. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mike at etek.chalmers.se Mon Aug 18 08:56:33 2003 From: mike at etek.chalmers.se (Mikael Fredriksson) Date: Mon, 18 Aug 2003 14:56:33 +0200 Subject: mulitcast copy or snowball copy References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <3F40CD01.89E6AB62@etek.chalmers.se> Rene Storm wrote: > > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > Any ideas ? Jepp, there is a distribution system for large files mainly for the Internet, but it can probbably be of use for you. It's a fast way to distribute a large file from one host to several others, at the same time. 
Check out: http://bitconjurer.org/BitTorrent/index.html MF _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 10:51:27 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 10:51:27 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. > To raise performance I decided to copy the file to the local HD of any > node in the cluster. > > Did someone find a multicast solution for that or maybe something with > snowball principle? There are several multicast file distribution protocols, but they all share the same potential flaw: they use multicast. That means that they will work great in a few specific installations, generally small clusters on a single Ethernet switch. But as you grow, multicast becomes more of a problem. Here is a strong indicator for using multicast A shared media or repeater-based network (e.g. traditional Ethernet) Here are a few of the contra-indications for using multicast Larger clusters Non-Ethernet networks "Smart" Ethernet switches which try to filter packets Random communication traffic while copying Heavy non-multicast traffic while copying Multiple multicast streams NICs with mediocre, broken or slow to configure multicast filters Drivers not tuned for rapid multicast filter changes Or, in summary, "using the cluster for something besides a multicast demo. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. > My idea was to write some scripts which copy files via rsync with snowball, If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. A geometrically cascading copy can work very well. It very effectively uses current networks (switched Ethernet, Myrinet, SCI, Quadrics, Infiniband), and can make use of the sendfile() system call. 
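[A rough sketch of such a geometrically cascading copy, in Python, purely as an illustration -- this is not Scyld's or Dolly's implementation, and it assumes passwordless ssh/scp between all nodes and the same file path everywhere. Every host that already holds the file hands it to one more host, then both serve half of the remaining list in parallel, so the number of finished copies doubles each round. The master stays the parent of everything it spawned and ends up holding the list of failed targets:

    import subprocess, sys, threading

    def remote_copy(src, dst, path):
        # run scp on 'src' so it pushes 'path' to the same path on 'dst'
        cmd = ["scp", "-q", path, "%s:%s" % (dst, path)]
        if src != "localhost":
            cmd = ["ssh", src] + cmd
        return subprocess.call(cmd)

    def cascade(holder, path, targets, failed):
        # 'holder' already has the file: give it to one more node, then let
        # holder and that node each serve half of the rest in parallel, so
        # the number of hosts holding the file doubles every round
        if not targets:
            return
        first, rest = targets[0], targets[1:]
        if remote_copy(holder, first, path) != 0:
            failed.append(first)
            cascade(holder, path, rest, failed)  # holder keeps the whole list
            return
        half = len(rest) // 2
        helper = threading.Thread(target=cascade,
                                  args=(first, path, rest[half:], failed))
        helper.start()
        cascade(holder, path, rest[:half], failed)
        helper.join()

    if __name__ == "__main__":
        # usage: cascade.py /path/to/file node001 node002 ... nodeNNN
        filename, nodes, failed = sys.argv[1], sys.argv[2:], []
        cascade("localhost", filename, nodes, failed)
        for node in failed:
            print("copy to %s failed" % node)

With 128 nodes that is about eight rounds of scp instead of 128 back-to-back copies from one server, and the node list handed to it should be the frozen "up" list described above rather than a freshly generated one.]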
For a system such as Scyld, use a zero-base geometric cascade: move the work off of the master as the first step. The master generates the work list and immediately shifts the process creation work off to the first compute node. The master then only monitors for completion. You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From erik at aarg.net Mon Aug 18 10:14:25 2003 From: erik at aarg.net (Erik Arneson) Date: Mon, 18 Aug 2003 07:14:25 -0700 Subject: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2305E24A@vertrieb.emplics.com> Message-ID: <20030818141424.GA16386@aarg.net> On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > Hi Beowulfers, > > Problem: > I want to distribute large files over a cluster. > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > Did someone find a multicast solution for that or maybe something with snowball principle? I am really new to the Beowulf thing, so I am not sure if this solution is a good one or not. But have you taken a look at the various network filesystems? OpenAFS has a configurable client-side cache, and if the files are needed only for reading this ends up being a very quick and easy way to distribute changes throughout a number of nodes. (However, I have noticed that network filesystems are not often mentioned in conjunction with Beowulf clusters, and I would really love to learn why. Performance? Latency? Complexity?) -- ;; Erik Arneson SD, Ashland Lodge No. 23 ;; ;; GPG Key ID: 2048R/8B4CBC9C CoTH, Siskiyou Chapter No. 21 ;; ;; ;; -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 481 bytes Desc: not available URL: From farschad at myrealbox.com Mon Aug 18 12:17:14 2003 From: farschad at myrealbox.com (Farschad Torabi) Date: Mon, 18 Aug 2003 21:07:14 +0450 Subject: MPICH again Message-ID: <1061224634.8d1769c0farschad@myrealbox.com> Hi All, I still have some problems running MPICH on my machine :^( I've installed MPICH and PGF90 on my PC and I am able to compile parallel codes using MPI with mpif90 command. But the problem arise when I want to run the executable file on a Bowulf cluster. As Jason Connor told me, I use the following command /bin/mpirun -machinefile machs -np 2 a.out But it prompts me that there are not enough architecture on LINUX. In this case it is like when I run the executable file (i.e. a.out) manually without using mpirun. what do you think about this?? 
Thank you in advance Farschad Torabi _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 11:34:16 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 17:34:16 +0200 Subject: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Hi Donald, > I want to distribute large files over a cluster. How large? Some people think that 1MB is large, while others consider large files to be 2GB+ (e.g. "Large File Summit"). This will have a significant impact on how you copy the file. Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB each. (Overall 30 GB) And the cluster is 128++ nodes. Here is an example: The Intel EEPro100 design configures the multicast filter with a special command appended to the transmit command queue. The command is followed by a list of the multicast addresses to accept. While the command is usually queued to avoid delaying the OS, the chip makes an effort to keep the Rx side synchronous by turning off the receiver while it's computing the new multicast filter. So the longer the multicast filter list and the more frequently it is changed, the more packets dropped. And what's the biggest performance killer with multicast? Dropped packets.. Rene: Thats right, but what if I ignore dropped packets and accept the corrupt files ? I would be able to rsync them later on. First Multicast to create files, Second step is to compare with rsync. I've tried this and it isn't really slow, if you're doing the rsync via snowball. If you are doing this for yourself, the solution is easy. Try the different approaches and stop when you find one that works for you. If you are building a system for use by others (as we do), then the problem becomes more challenging. Rene: That's the problem with all the things you do, first they are for your own and then everybody wants them ;o) > but there are some heavy problems. > e.g. > What happens if one node (in the middle) is down. Good: first consider the semantics of failure. That means both recovery and reporting the failure. My first suggesting is that *not* implement a program that copies a file to every available node. Instead use a system where you first get a list of available ("up") nodes, and then copy the files to that node list. When the copy completes continue to use that node list rather then letting jobs use newly-generated "up" lists. Rene: Good idea You can implement low-overhead fault checking by counting down job issues and job completion. As the first machine falls idle, check that the final machine to assign work is still running. As the next-to-last job completes, check that the one machine still working is up. Rene: But how do I get this status back to my "master", e.g command from master: node16 copy to node17? I don't want do de-centralize my job, like fire and forget. 
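[A small illustration of keeping that status centralised, assuming passwordless ssh, rsync, a hypothetical nodes.up file listing the targets, and placeholder names for the master host and the path. The master itself launches every transfer, stays the parent of all of them, and counts completions down as they exit; the same pattern works if each child is instead told to forward the file onward snowball-style, since the exit status still comes back up the process tree:

    import subprocess

    path = "/data/bigfile"                       # placeholder path
    nodes = [line.strip() for line in open("nodes.up") if line.strip()]

    # the master stays the parent of every transfer it starts
    jobs = [(node, subprocess.Popen(["ssh", node, "rsync", "-a",
                                     "master:" + path, path]))
            for node in nodes]

    outstanding = len(jobs)
    for node, job in jobs:
        rc = job.wait()                          # status flows back to the master
        outstanding -= 1
        state = "ok" if rc == 0 else "FAILED (exit %d)" % rc
        print("%s: %s, %d transfers still outstanding" % (node, state, outstanding))
]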
Cya, Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 12:50:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 12:50:57 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB > each. (Overall 30 GB) > And the cluster is 128++ nodes. Those are important parameters. What network type are you using? If Ethernet, what switches and topology? (My guess is that you are using "smart" switches, likely connected with a chassis backplane.) > > Here is an example: ...the longer > the multicast filter list ... the more packets dropped. > Rene: Thats right, but what if I ignore dropped packets and accept the > corrupt files ? I would be able to rsync them later on. This is costly. "Open loop" multicast protocols work by having the receiver track the missing blocks, and requesting (or interpolating) them later. Here you are discarding that information and doing much extra work on both the sending and receiving side by later locating the missing blocks. An alternative is closed-loop multicast, with positive acknowledgment before proceeding more than one window. > First Multicast to create files, Second step is to compare with rsync. > I've tried this and it isn't really slow, if you're doing the > rsync via snowball. This is verifying/filling with a neighbor instead of the original sender. Except here you don't know when you are both missing the same blocks. > If you are doing this for yourself, the solution is easy. ... > Rene: That's the problem with all the things you do, first they are for > your own and then everybody wants them ;o) If your end goal is to publish papers, do the hack. If your end goal is make works useful for other, you have to start with a wider view. >> [Do] *not* implement a program that copies a file >> to every available node. Instead use a system where you first get a >> list of available ("up") nodes, and then copy the files to that node >> list. When the copy completes continue to use that node list rather >> then letting jobs use newly-generated "up" lists. > > Rene: Good idea This approach applies to a wide range of cluster tasks. A similar idea is that you don't care as much about which nodes are currently up as you care about which nodes have remained up since you last checked. [[ Ideally you could ask "which nodes will be up when this program completes", but there are all sorts of temporal and halting issues there. ]] >> You can implement low-overhead fault checking by counting down job >> issues and job completion. As the first machine falls idle, check that >> the final machine to assign work is still running. As the next-to-last >> job completes, check that the one machine still working is up. > > Rene: But how do I get this status back to my "master", e.g command from > master: node16 copy to node17? We have a positive completion indication as part of the Job/Process Management subsystem. If you consider the problem, the final acknowledgment must flow from the last worker to the process that is checking for job completion. You might as well put that process on the cluster master. 
The natural Unix-style implementation is having the controlling machine hold the parent of the process tree implementing the work, even if the work is divided elsewhere. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 13:31:17 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 13:31:17 -0400 (EDT) Subject: mulitcast copy or snowball copy In-Reply-To: <20030818141424.GA16386@aarg.net> Message-ID: On Mon, 18 Aug 2003, Erik Arneson wrote: > On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote: > > Hi Beowulfers, > > > > Problem: > > I want to distribute large files over a cluster. > > To raise performance I decided to copy the file to the local HD of any node in the cluster. > > > > Did someone find a multicast solution for that or maybe something with snowball principle? > > I am really new to the Beowulf thing, so I am not sure if this solution is a > good one or not. But have you taken a look at the various network > filesystems? OpenAFS has a configurable client-side cache, and if the files > are needed only for reading this ends up being a very quick and easy way to > distribute changes throughout a number of nodes. This is a good example of why Grid/wide-area tools should not be confused with local cluster approaches. The time scale, performance and complexity issues are much different. AFS uses TCP/IP to transfer whole files from a server. With multiple servers the configuration is static or slow changing. > (However, I have noticed that network filesystems are not often mentioned in > conjunction with Beowulf clusters, and I would really love to learn why. > Performance? Latency? Complexity?) It's because file systems are critically important to many applications. There is no universal cluster file system, and thus no single solution. The best approach is not tie the cluster management, membership, or process control to the file system in any way. Instead the file system should be selection based on the application's need for consistency, performance and reliability. For instance, NFS is great for small, read-only input files. But using NFS for large files, or when any files will be written or updated, results in both performance and consistency problems. When working from a large read-only database, explicitly pre-staging (copying) the database to the compute nodes is usually better than relying on an underlying FS. It's easier, more predictable and more explicit than per-directory tuning FS cache parameters. As as example of why predictability is very important, imagine what happens to an adaptive algorithm when a cached parameter file expires, or a daemon does a bunch of work. That machine suddenly is slower, and that part of the problem now looks "harder". So the work is reshuffled, only to be shuffled back during the next time step. 
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lange at informatik.Uni-Koeln.DE Mon Aug 18 14:21:59 2003 From: lange at informatik.Uni-Koeln.DE (Thomas Lange) Date: Mon, 18 Aug 2003 20:21:59 +0200 Subject: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> References: <29B376A04977B944A3D87D22C495FB2301278C@vertrieb.emplics.com> Message-ID: <16193.6471.238244.224191@informatik.uni-koeln.de> Hi, I would try rgang, a nice tools which uses a tree structure for copying files or executing commands on a large list of nodes. It's written in python but there's also a compiled binary. It's very flexible and fast. Search for rgang in google to find the download page. To allow scaling to kiloclusters, the new rgang can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses rsh/ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies. Product Name: rgang Product Version: 2.5 ("rgang" cvs rev. 1.103) Date (mm/dd/yyyy): 06/23/2003 ORIGIN ====== Author Ron Rechenmacher Fermi National Accelerator Laboratory - Mail Station 234 P.O Box 500 Batavia, IL 60510 Internet: rgang-support at fnal.gov -- regards Thomas _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchskin at comcast.net Mon Aug 18 13:40:59 2003 From: mitchskin at comcast.net (Mitchell Skinner) Date: 18 Aug 2003 10:40:59 -0700 Subject: AW: mulitcast copy or snowball copy In-Reply-To: References: Message-ID: <1061228458.5291.32.camel@zeitgeist> On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > This is costly. "Open loop" multicast protocols work by having the > receiver track the missing blocks, and requesting (or interpolating) > them later. Here you are discarding that information and doing much > extra work on both the sending and receiving side by later locating the > missing blocks. Some possible google terms include: reliable multicast, forward error correction There's an ietf working group on reliable multicast that wasn't making a whole lot of progress the last time I checked. At that time, I recall there being some acknowledgment-based implementations as well as one forward error correction-based implementation using reed-solomon codes, from an academic in Italy whose name I forgot. It's been a little while, but when I looked at the code for that FEC-based reliable multicast program (rmdp?) I think it could only handle pretty small files. My understanding is that FEC-based approaches should scale better in terms of the number of receiving nodes, but the algorithms can be very time/space intensive. There's a patented algorithm from Digital Fountain that's supposed to be pretty efficient (google tornado codes, michael luby, digital fountain) but I'm not aware that they have a cluster-oriented product. My impression of them was that they were pretty WAN-oriented. If I was less lazy I'd give some links instead of google terms, but hopefully that's some food for thought. 
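[For concreteness, the smallest possible cut at the FEC idea -- a toy in Python, not any of the real codes named above: the sender appends one parity block, the bytewise XOR of a group of equal-sized data blocks, and a receiver that lost any single block in that group can rebuild it from the rest instead of asking for a retransmission. Reed-Solomon and tornado codes generalise this to many losses per group, at correspondingly higher cost:

    def add_parity(blocks):
        # sender: append one parity block, the bytewise XOR of all data blocks
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return blocks + [bytes(parity)]

    def recover(received):
        # receiver: 'received' holds the k+1 blocks in order, lost ones as None;
        # XORing everything that did arrive reproduces the one missing block
        missing = [i for i, blk in enumerate(received) if blk is None]
        if not missing:
            return received[:-1]
        assert len(missing) == 1, "one parity block repairs only one loss"
        present = [blk for blk in received if blk is not None]
        rebuilt = bytearray(len(present[0]))
        for blk in present:
            for i, b in enumerate(blk):
                rebuilt[i] ^= b
        received[missing[0]] = bytes(rebuilt)
        return received[:-1]

    if __name__ == "__main__":
        data = [b"aaaa", b"bbbb", b"cccc", b"dddd"]   # equal-sized blocks
        packets = add_parity(data)
        packets[2] = None                 # pretend the network dropped block 2
        assert recover(packets) == data   # rebuilt from the four that arrived
]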
Mitch _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 16:00:04 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 16:00:04 -0400 (EDT) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <1061228458.5291.32.camel@zeitgeist> Message-ID: On 18 Aug 2003, Mitchell Skinner wrote: > On Mon, 2003-08-18 at 09:50, Donald Becker wrote: > > This is costly. "Open loop" multicast protocols work by having the > > receiver track the missing blocks, and requesting (or interpolating) > > them later. Here you are discarding that information and doing much > > extra work on both the sending and receiving side by later locating the > > missing blocks. .. > There's an ietf working group on reliable multicast that wasn't making a > whole lot of progress the last time I checked. It's a hard problem, and when they agree on a protocol it likely won't apply to clusters. The packet loss characteristic and cost trade-off is much different on a WAN than with a local Ethernet switch on a cluster. On a WAN every packet is costly to transport, so it's worth having both end stations doing extensive computations. On a cluster we might talk about doing more computation to avoid communication, but that's only for a few applications. In reality we prefer to do minimal work. Thus we prefer OS-bypass for application communication, and kernel-only for file system I/O. Notice the attention given to zero copy, TCP offload, TOE/TSO and sendfile(). Multicast and packet FEC add exactly what people are trying to avoid, extra copying, complexity and work. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rene.storm at emplics.com Mon Aug 18 17:27:56 2003 From: rene.storm at emplics.com (Rene Storm) Date: Mon, 18 Aug 2003 23:27:56 +0200 Subject: AW: AW: mulitcast copy or snowball copy Message-ID: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Ok, A geometrically cascading structure gives me some more disadvantages. If you are using an additional high performance network, eg myrinet or infiniband you won't have problems with the switch bandwidth. If you are using low cost Ethernet/Gigabit network topology with 2 or more hups between the nodes (like FFN), the last "generation" of the snowball could be a heavy bottleneck. It seems, there are too many variables for too many kinds of clusters. A big cluster farm often got a "idle" network, but only one, while a MPI cluster could have a network for the message passing and one for commands and copying. You could use this service-network to copy our files without using full bandwidth of this network. But this would cost something cluster users don't have: time. 
Rene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Mon Aug 18 17:37:27 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Mon, 18 Aug 2003 14:37:27 -0700 Subject: big memory opteron In-Reply-To: <1061228458.5291.32.camel@zeitgeist> References: <1061228458.5291.32.camel@zeitgeist> Message-ID: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From farooqkamal_76 at yahoo.com Mon Aug 18 18:38:21 2003 From: farooqkamal_76 at yahoo.com (Farooq Kamal) Date: Mon, 18 Aug 2003 15:38:21 -0700 (PDT) Subject: Newbie Message-ID: <20030818223821.11770.qmail@web21209.mail.yahoo.com> Hi Everyone, Its my first email to this group. What I was looking for is that "is beowulf transparent to applications running". What I mean by that is suppose I run a apache server on the master node; will the cluster manage the load balancing and process migration itself? or every application that is intented to run on beowulf must be written from scracth to do so. And at last if beowulf can't do, is there anyother implematation of clusters that has these above said qualities Regards Farooq Kamal SZABIST - Karachi Pakistan __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 22:41:49 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 12:41:49 +1000 Subject: Newbie In-Reply-To: <20030818223821.11770.qmail@web21209.mail.yahoo.com> References: <20030818223821.11770.qmail@web21209.mail.yahoo.com> Message-ID: <200308191241.50419.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 08:38 am, Farooq Kamal wrote: > And at last if beowulf can't do, is there anyother > implematation of clusters that has these above said > qualities I think what you're looking for is OpenMOSIX. http://www.openmosix.org/ There's an introduction to it at the Intel website at: http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial%3a%3axeon_openmosix&cntType=IDS_EDITORIAL&catCode=BMB Excuse the large URL! good luck! 
Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QY5tO2KABBYQAh8RAqKrAJ9SY5wfCvvL35hLPubrEa8/xFuYsgCdFHYi 4wDadQBbfYpz06hX3YRkwRI= =QIb3 -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Mon Aug 18 22:43:57 2003 From: becker at scyld.com (Donald Becker) Date: Mon, 18 Aug 2003 22:43:57 -0400 (EDT) Subject: AW: AW: mulitcast copy or snowball copy In-Reply-To: <29B376A04977B944A3D87D22C495FB2301278E@vertrieb.emplics.com> Message-ID: On Mon, 18 Aug 2003, Rene Storm wrote: > A geometrically cascading structure gives me some more disadvantages. > If you are using an additional high performance network, eg myrinet or > infiniband you won't have problems with the switch bandwidth. > > If you are using low cost Ethernet/Gigabit network topology with 2 or > more hups between the nodes (like FFN), the last "generation" of the > snowball could be a heavy bottleneck. No one uses Ethernet repeaters on a cluster. 32 port Fast Ethernet switches are under $5/port. Even for Gigabit Ethernet, 8 port switches can be found for $20/port. An unusual topology might be better utilized by mapping the copy topology to the physical, but that's not the usual case. The typical case is an essentially flat topology, or one close enough that treating it as flat avoids complexity. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Mon Aug 18 23:21:16 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Tue, 19 Aug 2003 11:21:16 +0800 (CST) Subject: Scalable PBS In-Reply-To: <200308181152.57812.csamuel@vpac.org> Message-ID: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> --- Chris Samuel ???? >http://www.vpac.org/content/services_and_support/facility/linux_cluster.php Interesting ;-> May be you can take a look at the PBS addons like mpiexec, maui scheduler. > Nope, it's always been running OpenPBS prior to > migrating to SPBS. SGE is sponsored by Sun, and is opensource, I am currently using it. http://gridengine.sunsource.net/ Andrew. ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Mon Aug 18 23:47:40 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 19 Aug 2003 13:47:40 +1000 Subject: Scalable PBS In-Reply-To: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> References: <20030819032116.40876.qmail@web16806.mail.tpe.yahoo.com> Message-ID: <200308191347.42057.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, 19 Aug 2003 01:21 pm, Andrew Wang wrote: > May be you can take a look at the PBS addons like > mpiexec, maui scheduler. Already there. :-) We've got some users using mpiexec (though it does mean that you can nolonger restart a mom and have an MPI job keep going like you could with MPICH's mpirun) and we swapped to the MAUI scheduler yesterday (not without problems). > > Nope, it's always been running OpenPBS prior to > > migrating to SPBS. > > SGE is sponsored by Sun, and is opensource, I am > currently using it. > > http://gridengine.sunsource.net/ What's your impression of it ? Does it integrate with commercial molecular modelling packages like MSI ? cheers, Chris - -- Chris Samuel -- VPAC Systems & Network Admin Victorian Partnership for Advanced Computing Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia - http://www.vpac.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/QZ3cO2KABBYQAh8RAnBbAJ9MbVoDWNp0pjp6CHANpDZe9K2i0QCfSbE9 jlJDiWkEkM2a1uY+qCETprU= =9w8a -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bradshaw at mcs.anl.gov Tue Aug 19 00:42:52 2003 From: bradshaw at mcs.anl.gov (Rick Bradshaw) Date: Mon, 18 Aug 2003 23:42:52 -0500 Subject: big memory opteron In-Reply-To: <20030818213727.GB2131@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Mon, 18 Aug 2003 14:37:27 -0700") References: <1061228458.5291.32.camel@zeitgeist> <20030818213727.GB2131@greglaptop.internal.keyresearch.com> Message-ID: <87k79agsvn.fsf@skywalker-lin.mcs.anl.gov> Greg, This seems to be a huge bug that has been in the Bios for over a year now. I have only seen this on the AGP motherboards though. Unfortunetly they still perform much better than the none AGP boards that do recognise all the memory. Rick Greg Lindahl writes: > I'm attempting to put together a big memory 2-cpu Opteron box, without > success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of > memory. Now that's a pretty strange number, since if I was out of chip > selects, it should see exactly 4 GBytes. > > Any clues? 
> > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Tue Aug 19 08:42:21 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 19 Aug 2003 14:42:21 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Mon, 18 Aug 2003, Donald Becker wrote: > On Mon, 18 Aug 2003, Rene Storm wrote: [...] > > Rene: That's the problem with all the things you do, first they are for > > your own and then everybody wants them ;o) > > If your end goal is to publish papers, do the hack. If you want to write a paper you might also want to consider reading the following papers as related work: @article{ CCPE2002, author = "Felix Rauch and Christian Kurmann and Thomas M. Stricker", title = "{Optimizing the Distribution of Large Data Sets in Theory and Practice}", journal = "Concurrency and Computation: Practice and Experience", year = 2002, volume = 14, number = 3, pages = "165--181", month = apr } % frisbee-usenix03.pdf % Cloning tool, with multicast data distribution, compression techniques etc. @inproceedings{ Frisbee-Usenix2003, author = "Mike Hibler and Leigh Stoller and Jay Lepreau and Robert Ricci and Chad Barb", title = "{Fast, Scalable Disk Imaging with Frisbee}", booktitle = "Proceedings of the USENIX Annual Technical Conference 2003", year = 2003, month = jun, organization = "The USENIX Association" } Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Tue Aug 19 10:15:33 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Tue, 19 Aug 2003 09:15:33 -0500 Subject: big memory opteron Message-ID: <99F2150714F93F448942F9A9F112634C07BE62CD@txexmtae.amd.com> Greg, You might want to try upgrading to the lates BIOS, what type of board do you have? -Lehi -----Original Message----- From: Greg Lindahl [mailto:lindahl at keyresearch.com] Sent: Monday, August 18, 2003 4:37 PM To: beowulf at beowulf.org Subject: big memory opteron I'm attempting to put together a big memory 2-cpu Opteron box, without success. With 8 1GB dimms installed, the BIOS sees about 4.8 gbytes of memory. Now that's a pretty strange number, since if I was out of chip selects, it should see exactly 4 GBytes. Any clues? 
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gmpc at sanger.ac.uk Tue Aug 19 12:53:59 2003 From: gmpc at sanger.ac.uk (Guy Coates) Date: Tue, 19 Aug 2003 17:53:59 +0100 (BST) Subject: AW: mulitcast copy or snowball copy In-Reply-To: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: We've tried both multicast and snowball for data distribution on our cluster. We have a 60Gig dataset which we have to distribute to 1000 nodes. We started off using snowball copies. They work, but care is needed in your choice of tools for the file-transfers. rsync works, but can have problems with large (> 2Gig) files if you use rsh as the transport mechanism. (this is an rsh bug on some redhat versions rather than an rsync bug). rsync over ssh gets around that problem, but of course has the added encryption overhead. You should also avoid the incremental update mode of rsync (which is the default). We've found that it will silently corrupt your files if you rsync across different architectures (eg alpha-->ia32). It also has problems with large files. The only usable multicast code we've found that actually works is udpcast. http://udpcast.linux.lu/ There are plenty of other multicast codes to choose from out on the web, and most of them fall over horribly as soon as you cross more than one switch or have more than 10-20 hosts. We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used it to sucessfully distribute our 60gig dataset over large numbers of nodes simultaneously. In practice, on gigabit, we find that disk write speed is the limiting factor rather than the network. Lawrence Livermore use udpcast to install OS images on the MCR cluster, and I believe they side-step the disk performance issue by writing data to a ramdisk as an intermediate step. Obviously this only makes sense if your dataset < size of memory. Our current file distribution strategy is to use a combination of rsync and updcast. We do a dummy rsync to find out what files need updating, tar them up, pipe the tarball through udpcast and then untar the files and the client. The main performance killer we've found for udpcast is cheap switches. Cheers, Guy Coates -- Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csmith at lnxi.com Tue Aug 19 13:48:24 2003 From: csmith at lnxi.com (Curtis Smith) Date: Tue, 19 Aug 2003 11:48:24 -0600 Subject: AW: mulitcast copy or snowball copy References: <200308181902.h7IJ2Aw18118@NewBlue.Scyld.com> Message-ID: <072a01c3667a$16624c90$a423a8c0@blueberry> You might want to look into the Clusterworx product from Linux Networx. It has been used to boot and image clusters over 1100 nodes in size using multicast, and supports image sizes over 4GB. Multiple images can be served by a single server using ethernet. 
Each channel can use 100% of the network bandwidth (12.5MB per second on Fast Ethernet) or can be throttled to a specific rate. We typically use a transmission rate of 10MB per second on Fast Ethernet (30 seconds for a 300MB image), allowing DHCP traffic to get through. The multicast server can also be throttled to ensure that its doesn't overdrive the switch or hub (if you are using cheap ones) which in many cases can account for up to 95% of packet loss. If your switch is fast and is IGMP enabled, you will generally experience little to no packet loss. The technology is based on UDP and multicast and works with LinuxBios and Etherboot, and was used to image the MCR cluster many times prior to its deployment at LLNL. MCR could go from powered-off bare metal to running in about 7 minutes (most of which was disk formatting). Curtis Smith Principal Software Engineer Linux Networx (www.lnxi.com) ----- Original Message ----- From: "Guy Coates" To: Sent: Tuesday, August 19, 2003 10:53 AM Subject: Re:AW: mulitcast copy or snowball copy > > We've tried both multicast and snowball for data distribution on our > cluster. We have a 60Gig dataset which we have to distribute to 1000 > nodes. > > We started off using snowball copies. They work, but care is needed in > your choice of tools for the file-transfers. rsync works, but can have > problems with large (> 2Gig) files if you use rsh as the transport > mechanism. (this is an rsh bug on some redhat versions rather than an > rsync bug). > > rsync over ssh gets around that problem, but of course has the added > encryption overhead. > > You should also avoid the incremental update mode of rsync (which is the > default). We've found that it will silently corrupt your files if you > rsync across different architectures (eg alpha-->ia32). It also has > problems with large files. > > > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. > > In practice, on gigabit, we find that disk write speed is the limiting > factor rather than the network. Lawrence Livermore use udpcast to install > OS images on the MCR cluster, and I believe they side-step the disk > performance issue by writing data to a ramdisk as an intermediate step. > Obviously this only makes sense if your dataset < size of memory. > > Our current file distribution strategy is to use a combination of rsync > and updcast. We do a dummy rsync to find out what files need updating, tar > them up, pipe the tarball through udpcast and then untar the files and the > client. > > The main performance killer we've found for udpcast is cheap switches. 
> > Cheers, > > Guy Coates > > -- > Guy Coates, Informatics System Group > The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK > Tel: +44 (0)1223 834244 ex 7199 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From john152 at libero.it Wed Aug 20 07:26:43 2003 From: john152 at libero.it (john152 at libero.it) Date: Wed, 20 Aug 2003 13:26:43 +0200 Subject: Detection performance? Message-ID: Hi all, does anyone know about the performance of Mii-diag using ioctl calls? Using Mii-diag, what could be the average delay between the link-status change ( phisically ) and the detection of this event. I'm using a 3Com 905 Tornado PC card; is there a different delay for each PC card in changing the status register? How long could this delay be in your experience? I'd like to have a delay minor than 1 ms between the time in which i phisically disconnect the cable and the time in which I have the detection (in example with a printf on video, ...) In your experience, is it reasonable? Normally do I have to wait for a greater delay? Thanks in advance for your kind answer and observations. Giovanni di Giacomo _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Wed Aug 20 08:30:58 2003 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 20 Aug 2003 14:30:58 +0200 (CEST) Subject: mulitcast copy or snowball copy In-Reply-To: Message-ID: On Tue, 19 Aug 2003, Guy Coates wrote: > The only usable multicast code we've found that actually works is udpcast. > > http://udpcast.linux.lu/ > > There are plenty of other multicast codes to choose from out on the web, > and most of them fall over horribly as soon as you cross more than one > switch or have more than 10-20 hosts. > > We get ~70-80% wirespeed on 100MBit and Gigabit ethernet, and we've used > it to sucessfully distribute our 60gig dataset over large numbers of nodes > simultaneously. That's interesting, since I tried udpcast once (just a few tests) on our Cabletron SmartSwitchRouter with Gigabit Ethernet without disk accesses and I got about 350 Mbps, while Dolly ran with approx. 500 Mbps on Machines with 1 GHz processors. I even used Dolly once (already many years ago, with 400 MHz machines) to clone two 24-node clusters at the same time, they were connected to two different switches and had a router in between. The throughput for the nodes was about 6.9 MByte/s over Fast Ethernet for every of the nodes. > The main performance killer we've found for udpcast is cheap switches. True. I tried it once with a cheap and simple ATI 24-port Fast Ethernet switch. Udpcast run with only about 1 MByte/s since the switch decided to multicast everything with only 10 Mbps (one machine that wasn't a member of the multicast group was connected with only 10 Mbps). Dolly on the other hand worked perfect with full wire speed on that switch. 
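For anyone who wants to try the rsync-plus-udpcast pipeline Guy describes, the moving parts look roughly like this. Treat it as a starting point only: the udp-sender/udp-receiver behaviour (stdin in, stdout out when no --file is given) is from my reading of the udpcast documentation, the dry-run parsing depends on your rsync version, and node001/nodeNNN are placeholders.

  # 1. dry-run rsync against one representative node to collect the names of
  #    files that need updating (drop directories and the summary lines; the
  #    exact output format varies between rsync versions)
  rsync -avn /data/db/ node001:/data/db/ \
      | grep -v -e '^building' -e '^wrote ' -e '^total ' -e '/$' -e '^$' > changed.files

  # 2. start a receiver on every node first (via ssh, rgang or similar);
  #    udp-receiver writes the incoming stream to stdout, tar unpacks it
  ssh nodeNNN "udp-receiver | tar -C /data/db -xpf -" &

  # 3. on the master: tar up only the changed files and feed the stream to
  #    the multicast sender (which waits for the receivers before starting)
  tar -C /data/db -cpf - -T changed.files | udp-sender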
- Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: +41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: +41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Wed Aug 20 09:39:16 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 20 Aug 2003 15:39:16 +0200 (CEST) Subject: Detection performance? In-Reply-To: Message-ID: On Wed, 20 Aug 2003, wrote: > Using Mii-diag, what could be the average delay > between the link-status change ( phisically ) > and the detection of this event. Depends on the card capabilities and the driver. Most drivers poll for change, some use an interrupt. > I'm using a 3Com 905 Tornado PC card; is there > a different delay for each PC card in changing the > status register? I don't understand the question... > I'd like to have a delay minor than 1 ms > between the time in which i phisically > disconnect the cable and the time in which > I have the detection (in example with a printf on video, ...) The 3c59x driver polls every 60 seconds for media status when using autonegotiation (default). People from the HA and bonding projects have modified this to allow very fine polling of the media registers, however this has a big disadvantage: the CPU spends a lot of time waiting for completion of in/out operations - the finer the poll, the more CPU lost. The time taken to talk to th MII does not depend on the CPU speed, but on th PCI speed, so the faster the CPU, the more instruction cycles are lost to I/O. > In your experience, is it reasonable? No, because the network card should transfer data, not be a watchdog. There is one other solution, but there is no code for it yet. At least the Tornado cards allow generating an interrupt whenever the media changes. This would alleviate the need to continually poll the media registers and would give an indication very soon after the event happened. This was on my to-do list for a long time, but it was never done and probably won't be done soon. > Normally do I have to wait for a greater delay? If by "normally" you mean "the 359x driver distributed with the kernel" or "the 3c59x driver from Scyld", then yes. > Thanks in advance for your kind answer and observations. This isn't really beowulf related. Please use vortex at scyld.com for discussing the 3c59x driver. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Wed Aug 20 10:50:30 2003 From: jhearns at micromuse.com (John Hearns) Date: Wed, 20 Aug 2003 15:50:30 +0100 Subject: Detection performance? In-Reply-To: References: Message-ID: <3F438AB6.6090507@micromuse.com> That's an interesting question. Can you tell us what your application is, and why it needs fast response? First thought I had would be to SNMP trap the port status on the switch, rather than the card. 
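Something along these lines, assuming the switch speaks SNMP (the community string, switch name and ifIndex are placeholders, and strictly speaking this polls ifOperStatus rather than catching a trap; a real trap setup would point the switch at snmptrapd instead):

  # poll the operational status of port 12 on the switch once a second
  # (.1.3.6.1.2.1.2.2.1.8 is IF-MIB ifOperStatus: 1 = up, 2 = down)
  while true; do
      snmpget -v1 -c public switch01 .1.3.6.1.2.1.2.2.1.8.12
      sleep 1
  done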
But I must admit I have no idea of the latency there, but would I would expect it to be much more than 1ms. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Wed Aug 20 12:33:49 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 20 Aug 2003 12:33:49 -0400 Subject: clubmask 0.5 released Message-ID: <1061397229.16487.45.camel@roughneck> Name : Clubmask Version : 0.5 Release : 1 Group : Cluster Resource Management and Scheduling Vendor : Liniac Project, University of Pennsylvania License : GPL-2 URL : http://clubmask.sourceforge.net What is Clubmask ---------------- Clubmask is a resource manager designed to allow Bproc based clusters enjoy the full scheduling power and configuration of the Maui HPC Scheduler. Clubmask uses a modified version of the Supermon resource monitoring software to gather resource information from the cluster nodes. This information is combined with job submission data and delivered to the Maui scheduler. Maui issues job control commands back to Clubmask, which then starts or stops the job scripts using the Bproc environment. Clubmask also provides builtin support for a supermon2ganglia translator that allows a standard Ganlgia web backend to contact supermon and get XML data that will disply through the Ganglia web interface. Clubmask is currently running on around 10 clusters, varying in size from 8 to 128 nodes, and has been tested up to 5000 jobs. Links ------------- Bproc: http://bproc.sourceforge.net Ganglia: http://ganglia.sourceforge.net Maui Scheduler: http://www.supercluster.org/maui Supermon: http://supermon.sourceforge.net Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From andrewxwang at yahoo.com.tw Fri Aug 22 00:39:04 2003 From: andrewxwang at yahoo.com.tw (=?big5?q?Andrew=20Wang?=) Date: Fri, 22 Aug 2003 12:39:04 +0800 (CST) Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> Message-ID: <20030822043904.18171.qmail@web16811.mail.tpe.yahoo.com> Using the 32-bit x86 glinux binary package, it works on my machine. SGE gets the load information and the system/hardware information correctly: > qhost HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS ------------------------------------------------------------------------------- global - - - - - - - opteron1 glinux 2 0.00 997.0M 47.8M 1.0G 4.3M Andrew. --- Mikhail Kuzminsky ???? > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - > about binary SGE distribution in 32-bit mode, > or about compilation from the source for x86-64 > mode, > under SuSE or RedHat distribution etc. > > Yours > Mikhail Kuzminsky > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----------------------------------------------------------------- ??? Yahoo!?? ??????? - ???????????? 
http://fate.yahoo.com.tw/ _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jhearns at micromuse.com Sat Aug 23 03:48:58 2003 From: jhearns at micromuse.com (John Hearns) Date: 23 Aug 2003 08:48:58 +0100 Subject: SGE on AMD Opteron ? In-Reply-To: <200308201609.UAA08558@nocserv.free.net> References: <200308201609.UAA08558@nocserv.free.net> Message-ID: <1061624938.1182.57.camel@harwood> On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information >From - Return-Path: <> Received: from localhost by clarice with LMTP for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from mta.micromuse.com (mta.micromuse.com [194.131.185.92]) by mailstore.micromuse.co.uk (Switch-2.2.6/Switch-2.2.4) with ESMTP id h7N7pXZ27346 for ; Sat, 23 Aug 2003 08:51:33 +0100 Received: from marstons.services.quay.plus.net (marstons.services.quay.plus.net [212.159.14.223]) by mta.micromuse.com (Switch-2.2.6/Switch-2.2.6) with SMTP id h7N7pWY27479 for ; Sat, 23 Aug 2003 08:51:32 +0100 Message-Id: <200308230751.h7N7pWY27479 at mta.micromuse.com> Received: (qmail 19110 invoked for bounce); 23 Aug 2003 07:51:26 -0000 Date: 23 Aug 2003 07:51:26 -0000 From: MAILER-DAEMON at marstons.services.quay.plus.net To: jhearns at micromuse.com Subject: failure notice X-Perlmx-Spam: Gauge=XXXIIIIIIIII, Probability=39%, Report="FAILURE_NOTICE_1, MAILER_DAEMON, NO_MX_FOR_FROM, NO_REAL_NAME, SPAM_PHRASE_00_01" X-Evolution-Source: imap://jhearns at mta.micromuse.com/ Mime-Version: 1.0 Hi. This is the qmail-send program at marstons.services.quay.plus.net. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. : Sorry, I couldn't find any host named bewoulf.org. (#5.1.2) --- Below this line is a copy of the message. Return-Path: Received: (qmail 19106 invoked by uid 10001); 23 Aug 2003 07:51:26 -0000 Received: from dockyard.plus.com (HELO .) (212.159.87.168) by marstons.services.quay.plus.net with SMTP; 23 Aug 2003 07:51:26 -0000 Subject: Re: SGE on AMD Opteron ? From: John Hearns To: bewoulf at bewoulf.org In-Reply-To: <200308201609.UAA08558 at nocserv.free.net> References: <200308201609.UAA08558 at nocserv.free.net> Content-Type: text/plain Organization: Micromuse Message-Id: <1061624843.1183.52.camel at harwood> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-5) Date: 23 Aug 2003 08:47:23 +0100 Content-Transfer-Encoding: 7bit On Wed, 2003-08-20 at 17:09, Mikhail Kuzminsky wrote: > Sorry, is here somebody who > works w/Sun GrideEngine on AMD Opteron platform ? > I'm interesting in any information - I'm working with this. More news when I get it. Also, and I know that all I have to do is Google and do some reading, but does andone on the list have experience with lm_sensors on Opteron? Specifically HDAMA motherboards. A quick Google just turned up a post by Mikhail in June on this very subject... _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From saville at comcast.net Sat Aug 23 17:38:49 2003 From: saville at comcast.net (Gregg Germain) Date: Sat, 23 Aug 2003 17:38:49 -0400 Subject: Help! 
Endless RARP requests Message-ID: <3F47DEE9.F00C5213@comcast.net> Hi, I have installed the Scyle basic edition I got from Linux Central (RH 6.2). I've done the installation and I selected the range of IP addresses they suggest for the slave nodes. ifconfig shows that eth1 is operating. I connect a slave node to the Master node by connecting the Slave's eth0 card to the Master's eth1 card. I created a slave boot floppy, and boot the slave. It boots ok but starts sending RARP requests that never get satisfied. It sits there forever making more requests (well eventually it reboots itself and tries again but then there's endless RARP requests). Can any one give me a hint? Do I have to go through a hub to connect that first slave to the master? I know I'll have to have a hub for the second slave, but I thought I could make a direct connection for the first one. Any help would be greatly appreciated. thanks Gregg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From adm35 at georgetown.edu Sun Aug 24 14:36:02 2003 From: adm35 at georgetown.edu (adm35 at georgetown.edu) Date: Sun, 24 Aug 2003 14:36:02 -0400 Subject: Help! Endless RARP requests Message-ID: <1967012801.1280119670@georgetown.edu> You'll either need a hub, switch or crossover cable. Arnie Miles Systems Administrator: Advanced Research Computing Adjunct Faculty: Computer Science 202.687.9379 168 Reiss Science Building http://www.georgetown.edu/users/adm35 http://www.guppi.arc.georgetown.edu ----- Original Message ----- From: Gregg Germain Date: Saturday, August 23, 2003 5:38 pm Subject: Help! Endless RARP requests > Hi, > > I have installed the Scyle basic edition I got from Linux Central (RH > 6.2). > > I've done the installation and I selected the range of IP addresses > they suggest for the slave nodes. ifconfig shows that eth1 is > operating. > I connect a slave node to the Master node by connecting the Slave's > eth0 card to the Master's eth1 card. > > I created a slave boot floppy, and boot the slave. It boots ok but > starts sending RARP requests that never get satisfied. It sits there > forever making more requests (well eventually it reboots itself and > tries again but then there's endless RARP requests). > > Can any one give me a hint? > > Do I have to go through a hub to connect that first slave to the > master? I know I'll have to have a hub for the second slave, but I > thought I could make a direct connection for the first one. > > Any help would be greatly appreciated. 
> > thanks > > Gregg > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joachim at ccrl-nece.de Mon Aug 25 03:29:29 2003 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Mon, 25 Aug 2003 09:29:29 +0200 Subject: PCI-X/133 NICs on PCI-X/100 In-Reply-To: <200308221815.WAA27091@nocserv.free.net> References: <200308221815.WAA27091@nocserv.free.net> Message-ID: <200308250929.29082.joachim@ccrl-nece.de> Mikhail Kuzminsky: > Really I need to estimate: will Mellanox MTPB23108 IB PCI-X/133 cards > work w/PCI-X/100 slots on Opteron-based mobos (most of > them have PCI-X/100, exclusions that I know are Tyan S2885 and Apppro > mobos) - i.e. how high is the probability that they are > incompatible ? Very low. But why don't you ask the vendor directly? Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 08:31:34 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 08:31:34 -0400 Subject: help with e1000 upgrade Message-ID: <3F4A01A6.4090608@nsc1.net> G'day, I am upgrading the larger of my clusters to 1G ethernet. All nodes are TYAN motherboards (including the head node), and have on-board 1G. I've been using the default e1000 driver on my head node for the past year. It's version 4.1.7. However, when I try to boot my slave nodes, they appear to "hang" after initializing the NIC. I tried upgrading to the newer 5.1.3 driver. The head node is up and working. I made a boot floppy and tried booting, but once again, it hung right after the line displaying the e1000 and its IRQ. In the slave node BIOS, I have turned off the eepro100 and turned on the e1000. Any help would be appreciated. Cheers, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wade.hampton at nsc1.net Mon Aug 25 11:33:26 2003 From: wade.hampton at nsc1.net (Wade Hampton) Date: Mon, 25 Aug 2003 11:33:26 -0400 Subject: help with e1000 upgrade In-Reply-To: <3F4A163E.B3301A38@accessgate.net> References: <3F4A01A6.4090608@nsc1.net> <3F4A163E.B3301A38@accessgate.net> Message-ID: <3F4A2C46.6040007@nsc1.net> Doug Shubert wrote: >Hello Wade, > >Wade Hampton wrote: > > > >>G'day, >> >>I am upgrading the larger of my clusters to 1G ethernet. All nodes are >> >> > >Are the on-board NIC's Intel ? > Intel >>I tried upgrading to the newer 5.1.3 driver. The head node >>is up and working. I made a boot floppy and tried booting, >>but once again, it hung right after the line displaying the >>e1000 and its IRQ. >> >> >> > >Are you using Cat5e or Cat6 cabling? > >We have found that Cat6 works more reliably >on auto sense 10/100/1000 NIC's and switches. > So far, CAT5E (3-6 foot cables). >>In the slave node BIOS, I have turned off the eepro100 >>and turned on the e1000. 
>> >> >We are using the E1000 driver in kernel 2.4.21 and it works flawlessly. > I've been using it in the Scyld 2.4.17 kernel from my head node for nearly a year without any issues. The master has the same motherboard and chips, only more disks, etc. The issue seems to be with booting my slave nodes from the Scyld boot disc. Thanks, -- Wade Hampton _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 01:33:33 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 09:33:33 +0400 Subject: linpack In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: Hello, Can someone please tell me a bit more about linpack and how to implement it so that i can measure its performance . And also some recommended sites Thnx roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Tue Aug 26 07:16:42 2003 From: rouds at servihoo.com (RoUdY) Date: Tue, 26 Aug 2003 15:16:42 +0400 Subject: mpich2-0.93 In-Reply-To: <200308251901.h7PJ1ew21514@NewBlue.Scyld.com> Message-ID: hello Please help me, i really need help Because i can run mpd on the localhost but not in a ring of PC's When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " I get the answer Permission to node1 denied Permission to node 2 denied.................. Hope to hear from u very soon -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From angel at wolf.com Tue Aug 26 10:33:05 2003 From: angel at wolf.com (Angel Rivera) Date: Tue, 26 Aug 2003 14:33:05 GMT Subject: Change Management Control Message-ID: <20030826143305.24318.qmail@houston.wolf.com> I am looking for information/sites and a formal best practice change control for clusters. Can someone point me in the right direction? 
thx -ar _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nordquist at geosci.uchicago.edu Tue Aug 26 16:50:34 2003 From: nordquist at geosci.uchicago.edu (Russell Nordquist) Date: Tue, 26 Aug 2003 15:50:34 -0500 (CDT) Subject: mpich2-0.93 In-Reply-To: Message-ID: It sounds like you haven't setup password-less communication between your nodes. Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to setup password-less rsh (usaully .rhosts) or ssh. You can tell which one mpirun (or change it) is using by the value of RSHCOMMAND (at least in the 1.2 version) in mpirun. russell On Tue, 26 Aug 2003 at 15:16, RoUdY wrote: > hello > Please help me, i really need help > Because i can run mpd on the localhost but not in a ring > of PC's > When I type "mpiexec -np 3 ~/mpich2-0.93/examples/cpi " > I get the answer > Permission to node1 denied > Permission to node 2 denied.................. > > Hope to hear from u very soon > -------------------------------------------------- > Get your free email address from Servihoo.com! > http://www.servihoo.com > The Portal of Mauritius > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > - - - - - - - - - - - - Russell Nordquist UNIX Systems Administrator Geophysical Sciences Computing http://geosci.uchicago.edu/computing NSIT, University of Chicago - - - - - - - - - - - _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Tue Aug 26 20:34:38 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 27 Aug 2003 10:34:38 +1000 Subject: mpich2-0.93 In-Reply-To: References: Message-ID: <200308271034.39916.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 27 Aug 2003 06:50 am, Russell Nordquist wrote: > Can you "rsh node1" or "ssh node1" and not be prompted? If not you need to > setup password-less rsh (usaully .rhosts) or ssh. You can tell which one > mpirun (or change it) is using by the value of RSHCOMMAND (at least in > the 1.2 version) in mpirun. Standard security blah - rsh is evil, ssh is your friend. :-) It is possible to not install rsh, rlogin and rcp and replace them with symbolic links to ssh, slogin and scp. This should work for most cases, but of course, test, test and test again. We were fortunate, although we had installed the r-series clients on our cluster the daemons weren't enabled in inetd, so we knew we couldn't break anything by removing them (as they'd never have worked in the first place). So far not found anything that has a problem because of this - although don't nuke users .rhosts files as some other programs, like PBS, call ruserok() to validate connections! 
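For the archives, the two pieces look roughly like this. It is only a sketch: node1..node3 are placeholders, try it on one node first, and tighten the ~/.ssh permissions (700 on the directory, 600 on authorized_keys) if your sshd insists on it.

  # passwordless ssh for mpirun/mpd startup: make a key once on the master
  # and append the public half to authorized_keys on every node
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  for n in node1 node2 node3; do
      cat ~/.ssh/id_rsa.pub | ssh $n 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
  done

  # the rsh -> ssh replacement described above (back up or remove the real
  # r-clients first, and leave users' .rhosts files alone)
  ln -sf /usr/bin/ssh    /usr/bin/rsh
  ln -sf /usr/bin/slogin /usr/bin/rlogin
  ln -sf /usr/bin/scp    /usr/bin/rcp

If you'd rather not touch the binaries at all, pointing mpirun's RSHCOMMAND at ssh, as Russell mentions, gets you most of the way there.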
cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/S/yeO2KABBYQAh8RAr/QAKCNHOz5hxIejvGOW34KZsRW74u0NwCeOONj C49BRL6ceXRIHHNhl1mqHss= =BM9q -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Wed Aug 27 17:32:38 2003 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed, 27 Aug 2003 17:32:38 -0400 (EDT) Subject: A question about Bewoulf software: Message-ID: Hi, These days, our lab are planning to built up a Beowulf cluster, which uses Intel Xeon Processors or Pentium 4, and Intel Pro Gigabit (10/100/100) ethernet card. We wonder if we choose commerical software, such as scyld, which version will support Xeon Processor or Pentium 4 respectively? And which version will support Intel Pro Gigabit Ethernet card? If we try buliding by ourself, which version of software we should choose? Thanks a lot for your kind suggestion. I am looking forward to hearing from you. Thanks again. Tom _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Aug 27 19:41:17 2003 From: becker at scyld.com (Donald Becker) Date: Wed, 27 Aug 2003 19:41:17 -0400 (EDT) Subject: A question about Bewoulf software: In-Reply-To: Message-ID: On Wed, 27 Aug 2003, Ao Jiang wrote: > These days, our lab are planning to built up > a Beowulf cluster, which uses Intel Xeon Processors > or Pentium 4, and Intel Pro Gigabit (10/100/100) > ethernet card. > We wonder if we choose commerical software, such as > scyld, which version will support Xeon Processor or > Pentium 4 respectively? Most Linux distributions will "support" the Pentium 4 and Xeon. The question is if the kernel is compiled to take advantage of the newer processor features. The Scyld distribution now has about a dozen different kernels to match the processor types and UP/SMP on the master and compute nodes. Typically only two to four of the kernels are installed, based which checkboxes are slected during installation. We always install a safe, featureless i386 uniprocessor BTW, you might think that the processor family is the most important optimization, but there is an even bigger difference between uniprocessor and SMP kernels. > If we try buliding by ourself, which version of software > we should choose? You pretty much have two choices: be library version compatible with a consumer/workstation distribution (Red Hat, SuSE, Debian), or use a meta-distribution such as GenToo or Debian and compile everything yourself. > And which version will support Intel Pro Gigabit Ethernet card? Every few weeks Intel comes out with a new card version with a new PCI ID. The e1000 driver is one of the five or so drivers that we are constantly updating to support just-introduced chips. 
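One quick sanity check before you commit to a particular node image: compare the card's PCI vendor:device ID against what the installed e1000 module claims to support. Roughly like this (02:01.0 is an example slot, and the pcimap path assumes a 2.4-era modutils setup):

  lspci | grep -i ethernet                              # find the slot, e.g. 02:01.0
  lspci -n -s 02:01.0                                   # numeric ID, e.g. 8086:100e
  grep e1000 /lib/modules/$(uname -r)/modules.pcimap    # IDs the module was built for

If the 8086:xxxx ID reported for the card doesn't appear in the module's pcimap entries, the driver on the image predates the card.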
-- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Wed Aug 27 20:39:21 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 03:39:21 +0300 Subject: gigabit switches for 32-64 nodes Message-ID: <200308280339.21458.exa@kablonet.com.tr> hi there, are there any high performance gigabit ethernet switches for a beowulf cluster consisting of 32 to 64 nodes? what do you recommend for the interconnect of such a system? regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From widyono at cis.upenn.edu Tue Aug 26 17:33:06 2003 From: widyono at cis.upenn.edu (Daniel Widyono) Date: Tue, 26 Aug 2003 17:33:06 -0400 Subject: perl-bproc bindings Message-ID: <20030826213306.GA2497@central.cis.upenn.edu> Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, something more recent than spring of 2001? Failing that, anyone have a chance to flesh out the missing information in Dan's work (e.g. C constant() function which doesn't seem to exist, error handling, etc.)? I have it "just working" for "just users", and barely at that. Error handling consists of returning -128 minus negated error code if there's an error. I've already Googled and checked these archives (perl bproc binding). Everything points back to Dan's work. Thanks, Dan W. -- -- Daniel Widyono http://www.cis.upenn.edu/~widyono -- Liniac Project, CIS Dept., SEAS, University of Pennsylvania -- Mail: CIS Dept, 302 Levine 3330 Walnut St Philadelphia, PA 19104 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 01:49:42 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 01:49:42 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <20030826213306.GA2497@central.cis.upenn.edu> Message-ID: On Tue, 26 Aug 2003, Daniel Widyono wrote: > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > something more recent than spring of 2001? 
There are updated bindings, and a small example, at ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mitchel at navships.com Thu Aug 28 05:08:13 2003 From: mitchel at navships.com (Mitchel Kagawa) Date: Wed, 27 Aug 2003 23:08:13 -1000 Subject: gigabit switches for 32-64 nodes References: <200308280339.21458.exa@kablonet.com.tr> Message-ID: <000601c36d43$ea127360$0a02a8c0@kitsu2> We use a Foundry Fastiron II Plus with 64 non-blocking copper gigabit ports. A little on the pricy side but it works very well. ~Mitchel ----- Original Message ----- From: "Eray Ozkural" To: Sent: Wednesday, August 27, 2003 2:39 PM Subject: gigabit switches for 32-64 nodes > hi there, > > are there any high performance gigabit ethernet switches for a beowulf cluster > consisting of 32 to 64 nodes? what do you recommend for the interconnect of > such a system? > > regards, > > -- > Eray Ozkural (exa) > Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org > www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza > GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 08:57:15 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu, 28 Aug 2003 14:57:15 +0200 (CEST) Subject: 32bit slots and riser cards Message-ID: Dear beowulfers, In planning for some new cluster nodes, I hit a small problem. I want: - a modern mainboard for dual-Xeon (preferred) or dual-Athlon - 1U or 2U rackmounted case - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for installing in the case The problem is that all mainboards that I looked at position the 32bit PCI slot(s) near the edge of the mainboard and I cannot see how the riser card can be installed into them so that the card still fits in the case; the Myrinet card does not fit (keyed differently) into the 64bit PCI slots or 64bit risers. Is there some solution to this problem or do I have to go back to midi-tower cases ? 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Thu Aug 28 10:24:19 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 28 Aug 2003 10:24:19 -0400 Subject: perl-bproc bindings In-Reply-To: References: Message-ID: <1062080659.7565.0.camel@roughneck> On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > something more recent than spring of 2001? > > There are updated bindings, and a small example, at > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz Any chance you guys have updated python bindings as well? Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:04:52 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:04:52 -0400 Subject: Intel acquiring Pallas Message-ID: <3F4E1A14.8030900@bellsouth.net> Good morning! I though I would post this for those who haven't seen it yet: http://www.theregister.co.uk/content/4/32522.html "Intel has signed on to acquire German software maker Pallas, hoping the company's performance tools can give it an edge in the compute cluster arena." Enjoy! Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 11:30:30 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 11:30:30 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E1A14.8030900@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Good morning! > > I though I would post this for those who haven't seen it yet: > > http://www.theregister.co.uk/content/4/32522.html > > "Intel has signed on to acquire German software maker Pallas, > hoping the company's performance tools can give it an edge in > the compute cluster arena." Interesting. I'm trying to understand where and how this will help them -- more often than not it is a Bad Thing when hardware mfrs start dabbling in something higher than firmware or compilers -- Apple (and Next in its day) stands at one end of that path. It's especially curious given that Intel is already overwhelmingly dominant in the compute cluster arena (with only AMD a meaningful cluster competitor, and with apple and the PPC perhas a distant third). Not to mention the fact that if they REALLY wanted to get an edge in the compute cluster arena, they'd acquire somebody like Dolphin or Myricom. Monitoring is lovely and even important for application tuning, but it is an application layer on TOP of both systems software and the network. Or perhaps they are buying them so they can instrument their compilers? rgb > > Enjoy! 
> > Jeff > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 11:50:41 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 11:50:41 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E24D1.9010301@bellsouth.net> Robert G. Brown wrote: >On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > >>Good morning! >> >> I though I would post this for those who haven't seen it yet: >> >>http://www.theregister.co.uk/content/4/32522.html >> >>"Intel has signed on to acquire German software maker Pallas, >>hoping the company's performance tools can give it an edge in >>the compute cluster arena." >> >> > >Interesting. I'm trying to understand where and how this will help them >-- more often than not it is a Bad Thing when hardware mfrs start >dabbling in something higher than firmware or compilers -- Apple (and >Next in its day) stands at one end of that path. > >It's especially curious given that Intel is already overwhelmingly >dominant in the compute cluster arena (with only AMD a meaningful >cluster competitor, and with apple and the PPC perhas a distant third). >Not to mention the fact that if they REALLY wanted to get an edge in the >compute cluster arena, they'd acquire somebody like Dolphin or Myricom. > >Monitoring is lovely and even important for application tuning, but it >is an application layer on TOP of both systems software and the network. >Or perhaps they are buying them so they can instrument their compilers? > > rgb > Bob, Very interesting observation. I wonder if Intel doesn't have something else up their sleeve? Could they be trying to get back into Supercomputer game (not likely, but didn't they get some DoD money recently?). Could they be helping with networking stuff (Intel has been discussing the next generation networking stuff lately). Maybe some sort of TCP Offload Engine? Maybe something with their new bus ( PCI Express?) They have also created CSA (Communication Streaming Architecture) in their new chipset to bypass the PCI bottleneck. Of course they could also be after the Pallas parallel debuggers to integrate into their compilers (like you mentioned) or perhaps to help with debugging threaded code in the hyperthreaded chips. Not that you mention it, this is a somewhat interesting development. I wonder what they're up to? >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > Jeff _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 12:05:52 2003 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu, 28 Aug 2003 12:05:52 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? My guess is something like this, given what pallas does, but if this is the case, they may be preparing to attempt a task that has brought strong programmers to their knees repeatedly in the past -- create a true parallel compiler. A compiler where the thread library transparently hides a network-based cluster, complete with migration and load balancing. So the same code, written on top of a threading library, could compile and run transparently on a single processor or a multiprocessor or a distributed cluster. Or something. Hell, they're one of the few entities that can afford to tackle such a blue-sky project, and just perhaps it is time for the project to be tackled. At least they can attack it from both ends at once -- writing the compiler at the same time they hack the hardware around. But they're going to have create a hardware-level virtual interface for a variety of IPC mechanism's for this to work, I think, in order to instrument it locally and globally with no particular penalty either way. Or, of course, buy SCI and start putting the chipset on their motherboards as a standard feature on a custom bus. Myricom wouldn't like that (or Dolphin if they went the other way), but it would make a hell of a clustering motherboard. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rocky at atipa.com Thu Aug 28 12:22:01 2003 From: rocky at atipa.com (Rocky McGaugh) Date: Thu, 28 Aug 2003 11:22:01 -0500 (CDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> Message-ID: On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > Bob, > > Very interesting observation. I wonder if Intel doesn't have something > else up their sleeve? Could they be trying to get back into Supercomputer > game (not likely, but didn't they get some DoD money recently?). Could > they be helping with networking stuff (Intel has been discussing the next > generation networking stuff lately). Maybe some sort of TCP Offload > Engine? Maybe something with their new bus ( PCI Express?) They have also > created CSA (Communication Streaming Architecture) in their new chipset > to bypass the PCI bottleneck. Of course they could also be after the Pallas > parallel debuggers to integrate into their compilers (like you mentioned) > or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. > I wonder what they're up to? > Intel's already dropped Infiniband, and they have also recently gotten very quiet about using PCI Express as a node interconnect. In fact, this use of PCI Express has recently been switched to one of their "non-Goals" for the technology. 
I'd guess that Intel does not care about this market. This is fine by me. I'd rather have the Myricom's and Dolphin's that live or die by their products to ensure the products are getting the attention they deserve. -- Rocky McGaugh Atipa Technologies rocky at atipatechnologies.com rmcgaugh at atipa.com 1-785-841-9513 x3110 http://67.8450073/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 12:49:25 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 12:49:25 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062089290.4363.22.camel@localhost.localdomain> Hmmm... Not to throw water on hopes for parallelizing compilers and Intel supported parallel debuggers, but my guess is that Intel's move is much less revolutionary (but perhaps still important). Pallas's main HPC product is Vampir/Vampirtrace. These are performance analysis tools. As such they would only be peripherally useful for compiler design (perhaps to measure the effects of certain changes). Even for this purpose, Vampir/Vampirtrace doesn't provide the amount of detail that Intel's own V-Tune product does. For debugging, Pallas resells Etnus Totalview. For compiler options Pallas has the Intel compilers as well as PGI. As far as I can tell, Pallas doesn't do any significant independent development for these systems. So, what does the Pallas performance analysis product do that is important? Vampir/Vampirtrace allows the collection and display of data from a large number of programs running in parallel. Doing this well is not trivial. Time differences between machines must be taken into account. The tools must be able to handle a potentially huge amount of trace data (running a profiler on a 1000 process system is a much different animal from instrumenting a single process job). And, finally once all this data is collected it has to be presented in some way which can actually be of use. VA/VT is among the best available tools for this purpose. So, why would Intel want to acquire Pallas? First, they have a good product which can be sold at a high price. Combined with some Intel marketing they "should" be able to make money on the product. Second, Vampirtrace has the capability of using processor performance counters. By pushing the capabilities of VA/VT to work on Intel processors it promotes "lock-in" to Intel processors. In this way a developer using the Intel compilers, V-Tune for single process analysis, and Vampir for parallel profiling, wouldn't be likely to move to an AMD or Power platform. Is this a good thing? For the most part probably so. Intel should be able to help improve the Vampir software. Making it work even better on Intel processors doesn't really hurt things if you're using another system and might make things really nice for those of us on Intel hardware. Hopefully it's development on other systems won't languish. But, on the basis of this acquisition, I wouldn't hold my breath for parallel compilers or a full fledged Intel return to the HPC market. Vann Presearch, Inc. On Thu, 2003-08-28 at 12:05, Robert G. Brown wrote: > On Thu, 28 Aug 2003, Jeffrey B. Layton wrote: > > > to bypass the PCI bottleneck. 
Of course they could also be after the Pallas > > parallel debuggers to integrate into their compilers (like you mentioned) > > or perhaps to help with debugging threaded code in the hyperthreaded chips. > > Not that you mention it, this is a somewhat interesting development. > > I wonder what they're up to? > > My guess is something like this, given what pallas does, but if this is > the case, they may be preparing to attempt a task that has brought > strong programmers to their knees repeatedly in the past -- create a > true parallel compiler. A compiler where the thread library > transparently hides a network-based cluster, complete with migration and > load balancing. So the same code, written on top of a threading > library, could compile and run transparently on a single processor or a > multiprocessor or a distributed cluster. Or something. > > Hell, they're one of the few entities that can afford to tackle such a > blue-sky project, and just perhaps it is time for the project to be > tackled. At least they can attack it from both ends at once -- writing > the compiler at the same time they hack the hardware around. But > they're going to have create a hardware-level virtual interface for a > variety of IPC mechanism's for this to work, I think, in order to > instrument it locally and globally with no particular penalty either > way. Or, of course, buy SCI and start putting the chipset on their > motherboards as a standard feature on a custom bus. Myricom wouldn't > like that (or Dolphin if they went the other way), but it would make a > hell of a clustering motherboard. > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Thu Aug 28 13:53:32 2003 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 28 Aug 2003 10:53:32 -0700 Subject: Intel acquiring Pallas In-Reply-To: <3F4E24D1.9010301@bellsouth.net> References: Message-ID: <5.2.0.9.2.20030828104843.03073620@mailhost4.jpl.nasa.gov> At 11:50 AM 8/28/2003 -0400, Jeffrey B. Layton wrote: >Robert G. Brown wrote: > >> >>Interesting. I'm trying to understand where and how this will help them >>-- more often than not it is a Bad Thing when hardware mfrs start >>dabbling in something higher than firmware or compilers -- Apple (and >>Next in its day) stands at one end of that path. >> >>It's especially curious given that Intel is already overwhelmingly >>dominant in the compute cluster arena (with only AMD a meaningful >>cluster competitor, and with apple and the PPC perhas a distant third). >>Not to mention the fact that if they REALLY wanted to get an edge in the >>compute cluster arena, they'd acquire somebody like Dolphin or Myricom. >> >> rgb > >Bob, > > Very interesting observation. I wonder if Intel doesn't have something >else up their sleeve? Could they be trying to get back into Supercomputer >game (not likely, but didn't they get some DoD money recently?). 
Could >they be helping with networking stuff (Intel has been discussing the next >generation networking stuff lately). Maybe some sort of TCP Offload >Engine? Maybe something with their new bus ( PCI Express?) They have also >created CSA (Communication Streaming Architecture) in their new chipset >to bypass the PCI bottleneck. Of course they could also be after the Pallas >parallel debuggers to integrate into their compilers (like you mentioned) >or perhaps to help with debugging threaded code in the hyperthreaded chips. > Not that you mention it, this is a somewhat interesting development. >I wonder what they're up to? > >Jeff Intel is making a big push into wireless and RF technology. A recent article ( I don't recall where exactly,but one of the trade rags..) mentioned that the mass market (consumer) don't seem to need much more processor crunch (at least until Windows XXXP comes out, then you'll need all that power just to apply the patches), but that they saw a big market opportunity in integrated wireless networking. Simultaneously, the generalized tanking of the telecom industry has meant that they can hire very skilled RF engineers for reasonable wages without having to compete against speculative piles of options, etc. (I suspect that there are some skilled RF engineers who are now older and wiser and less speculative, too!) We're talking about RF chip designers, as well as PWB layout, circuit designers, and antenna folks. It wouldn't surprise me that Intel is looking at other areas than traditional CPU and processor support kinds of roles. James Lux, P.E. Spacecraft Telecommunications Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From glen at cert.ucr.edu Thu Aug 28 14:19:28 2003 From: glen at cert.ucr.edu (Glen Kaukola) Date: Thu, 28 Aug 2003 11:19:28 -0700 Subject: thrashing Message-ID: <3F4E47B0.3000805@cert.ucr.edu> Hi there, So for our newest simulations, we're working with a different domain, where each of our grid cells are much smaller, and so we're expecting the runs to take about 4 times longer. But actually they're taking around 40 times longer. I'm thinking this may have something to do with not having enough memory. The problem with this theory is that I'm not really sure how to tell if my machines are thrashing. On a desktop machine I can tell no problem, as the disk starts going crazy and the system pretty much grinds to a halt. But on a machine up in my server room on which I don't have any gui and where it's too loud to hear any disk activity, I'm really not sure how to tell whether it's thrashing or not. I mean, I can look at top, and free, and sar and everything doesn't look much different than when the other simulations were running, except for maybe 'sar -W', which is a little bit higher. Anyway, if someone could help me out with a way to determine without a doubt if my machines are thrashing or not, then I'd greatly appriciate it. 
Thanks for your time, Glen Kaukola _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Aug 28 14:35:49 2003 From: becker at scyld.com (Donald Becker) Date: Thu, 28 Aug 2003 14:35:49 -0400 (EDT) Subject: perl-bproc bindings In-Reply-To: <1062080659.7565.0.camel@roughneck> Message-ID: On 28 Aug 2003, Nicholas Henke wrote: > On Thu, 2003-08-28 at 01:49, Donald Becker wrote: > > On Tue, 26 Aug 2003, Daniel Widyono wrote: > > > > > Anyone out there besides Dan Ridge from Scyld make Perl bindings for bproc, > > > something more recent than spring of 2001? > > > > There are updated bindings, and a small example, at > > ftp://www.scyld.com/pub/beowulf-components/bproc-perl/bproc-perl.tar.gz > > Any chance you guys have updated python bindings as well? 0.9-8 is the current version -- which are you using? The last bugfix was logged in October of 2003. The next planned refresh has added bindings for the Beostat statistics library, Beomap job mapping and BBQ job scheduling systems. -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From laytonjb at bellsouth.net Thu Aug 28 14:39:02 2003 From: laytonjb at bellsouth.net (Jeffrey B. Layton) Date: Thu, 28 Aug 2003 14:39:02 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <3F4E4C46.1010300@bellsouth.net> Use vmstat. Try something like vmstat 1 10 (1 second delay, 10 repeats). Look at the 'si' and 'so' columns under the 'swap' heading; they will give you the information you want. Good Luck! Jeff > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do > with not having enough memory. The problem with this theory is that > I'm not really sure how to tell if my machines are thrashing. On a > desktop machine I can tell no problem, as the disk starts going crazy > and the system pretty much grinds to a halt. But on a machine up in > my server room on which I don't have any gui and where it's too loud > to hear any disk activity, I'm really not sure how to tell whether > it's thrashing or not. I mean, I can look at top, and free, and sar > and everything doesn't look much different than when the other > simulations were running, except for maybe 'sar -W', which is a little > bit higher. Anyway, if someone could help me out with a way to > determine without a doubt if my machines are thrashing or not, then > I'd greatly appriciate it.
> > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Thu Aug 28 15:08:53 2003 From: landman at scalableinformatics.com (Joseph Landman) Date: 28 Aug 2003 15:08:53 -0400 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <1062097733.8882.120.camel@protein.scalableinformatics.com> Hi Glen: Several methods. 1) vmstat vmstat 1 and look at the so/si columns, not to mention the r/b/w. 2) swapon -s to see the swap usage 3) top has an ok summary of the vm info 4) cat /proc/meminfo can give a crude picture of the memory system. On Thu, 2003-08-28 at 14:19, Glen Kaukola wrote: > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something to do with > not having enough memory. The problem with this theory is that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in my server > room on which I don't have any gui and where it's too loud to hear any > disk activity, I'm really not sure how to tell whether it's thrashing or > not. I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine without a > doubt if my machines are thrashing or not, then I'd greatly appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 15:43:08 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 12:43:08 -0700 Subject: 32bit slots and riser cards In-Reply-To: References: Message-ID: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > or 64bit risers. Is there some solution to this problem or do I have to go > back to midi-tower cases ? Doesn't that mean that the Myrinet card is 5 volts, and you only have 3.3 volt PCI slots? It's such an old Myrinet card that I don't remember the details of when PCI got a 3.3 volt option. 
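A rough way to sanity-check the slot side of this before ordering risers, assuming dmidecode is installed and the BIOS actually populates its DMI slot table (the exact section layout and the grep pattern below are only a sketch and differ a bit between dmidecode versions), is:

  # run as root: list what each physical slot claims to provide and look
  # for characteristics like "5.0 V is provided" / "3.3 V is provided"
  dmidecode | grep -i -A 8 "slot"

  # lspci only confirms the card enumerates once it is seated; it says
  # nothing about keying or signalling voltage
  /sbin/lspci | grep -i myri

Where the BIOS reports them, the DMI slot characteristics are the closer match to the 5 V vs 3.3 V keying question than anything lspci prints.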
-- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:24:47 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:24:47 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E44D8.7090200@wildopensource.com> Message-ID: On Thu, 28 Aug 2003, Stephen Gaudet wrote: > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > cost of about $400.00, plus or minus a few dollars per system. > Therefore, due to this fixed cost, MOST people looking at a cluster > won't touch Itanium2. Steve, Are you suggesting RH has put together a package that is NOT GPL in any way that would significantly affect the 64 bit market? The kernel, the compiler, and damn near every package is GPL, much of it from Gnu itself. Am I crazy here? So I'm having a hard time seeing why one would HAVE to pay them $400/system for anything except perhaps proprietary non-GPL "advanced server" packages that almost certainly wouldn't be important to HPC cluster builders (and which they would have had to damn near develop in a sealed room to avoid incorporating GPL stuff in it anywhere). > Some white box resellers are looking at taking RH Advanced Server and > stripping it down and offering on their ia64 clusters. However, if > their not working with code lawyers, and paying very close attention to > copy right laws, they could end up with law suits down the road. If Red Hat isn't careful and not working very carefully with code lawyers, I think the reverse is a lot more likely, as Richard Stallman is known to take the Gnu Public License (free as in air at the source level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't "own" a hell of a lot of code in what they sell; the bulk of what they HAVE written is GPL derived and hence GPL by inheritance alone. The Open Source community would stomp anything out of line with hobnailed boots and club it until it stopped twitching... So although many a business may cheerfully pay $400/seat for advanced server because it is a cost and support model they are comfortable with, I don't see what there is to stop anyone from taking an advanced server copy (which necessarily either comes with src rpm's or makes them publically available somewhere), doing an rpmbuild on all the src rpm's (as if anyone would care that you went through an independent rebuild vs just used the distribution rpm's) and putting it on 1000 systems, or giving the sources to a friend, or even reselling a repackaging of the whole thing (as long as they don't call them Red Hat and as long as they omit any really proprietary non-GPL work). I even thought there were some people on the list who were using at least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm wrong...:-( rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:23:37 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:23:37 -0700 Subject: thrashing In-Reply-To: <3F4E47B0.3000805@cert.ucr.edu> References: <3F4E47B0.3000805@cert.ucr.edu> Message-ID: <20030828202337.GA1964@greglaptop.internal.keyresearch.com> On Thu, Aug 28, 2003 at 11:19:28AM -0700, Glen Kaukola wrote: > I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, A clear sign of thrashing is that the program should be getting a lot less than 100% of the cpu, because it's waiting for blocks from the disk. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at keyresearch.com Thu Aug 28 16:31:20 2003 From: lindahl at keyresearch.com (Greg Lindahl) Date: Thu, 28 Aug 2003 13:31:20 -0700 Subject: Change Management Control In-Reply-To: <20030826143305.24318.qmail@houston.wolf.com> References: <20030826143305.24318.qmail@houston.wolf.com> Message-ID: <20030828203120.GB1964@greglaptop.internal.keyresearch.com> > I am looking for information/sites and a formal best practice change > control for clusters. Can someone point me in the right direction? thx -ar Most clusters are a lot more informal, and don't have any kind of change control. I suspect your best bet would be to look at people involved in LISA: Large Installation Systems Administration. These guys are mostly commercial, and we (the HPC cluster community) don't talk to them much, even though we should. -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:42:19 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:42:19 -0400 (EDT) Subject: thrashing In-Reply-To: <1062097733.8882.120.camel@protein.scalableinformatics.com> Message-ID: On 28 Aug 2003, Joseph Landman wrote: > Hi Glen: > > Several methods. > > 1) vmstat > > vmstat 1 > > and look at the so/si columns, not to mention the r/b/w. > > 2) swapon -s > > to see the swap usage > > 3) top > > has an ok summary of the vm info > > 4) cat /proc/meminfo > > can give a crude picture of the memory system. and if you want to watch pretty much all of this information in parallel (on all the systems at once) xmlsysd provides output fields with the information available in both vmstat and free (cat /proc/meminfo), so you can actually watch for swapping or paging or leaks on lots of systems at once in wulfstat. It easily handles updates with a 5 second granularity and can often manage 1 second (depending on your network and number of nodes and so forth). It's on the brahma website or linked under my own. 
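For a quick check by hand, the suggestions earlier in this thread amount to something like the sketch below; 'node12' and 'mysim' are just placeholder names for whichever node and job you care about, and the commands assume the usual procps and sysstat tools are installed:

  # sustained non-zero si/so (pages swapped in/out per second) while the
  # job runs means the node is actively swapping, not just parking idle pages
  ssh node12 vmstat 5 5

  # sar -W reports the same swap-in/swap-out counters from the sysstat
  # history, which is what the "sar -W is a little bit higher" hint was about
  ssh node12 sar -W

  # a job stalled waiting on the disk also drops well below ~100% CPU
  ssh node12 ps -o pid,pcpu,rss,vsz,comm -C mysim

If si/so stay at zero and the job still gets close to a full CPU, the slowdown probably isn't thrashing.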
I don't really provide a direct monitor of disk activity (partly out of irritation at custom-parsing the multidelimited "disk_io" field in /proc/stat), but if you were really interested in it I could probably bite the bullet and add a "disk" display that would work for up to four disks in a few hours of work. I'd guess that ganglia could also manage this sort of monitoring as well, but I don't use it (as I wrote my package before they started theirs by a year or three) so I don't know for sure. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Aug 28 16:49:06 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 28 Aug 2003 16:49:06 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > On Thu, Aug 28, 2003 at 02:57:15PM +0200, Bogdan Costescu wrote: > > > the Myrinet card does not fit (keyed differently) into the 64bit PCI slots > > or 64bit risers. Is there some solution to this problem or do I have to go > > back to midi-tower cases ? > > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots? It's such an old Myrinet card that I don't > remember the details of when PCI got a 3.3 volt option. I think that this is right, Greg -- the keying is related to voltage. If your actual PCI slots are keyed correctly, they should be able to manage either voltage (IIRC), but you may have to replace the risers. We've had trouble getting risers that didn't key correctly or work correctly for one kind of card or the other (or one motherboard or another) in the past. It sounds like this might be your problem if you're referring to replacing the cases and not the motherboard itself. Look around and see if you find better/different risers -- there are a fair number of different kinds of risers out there, at least for 2U. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 17:06:40 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 14:06:40 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The cost of the os, either of a blessed one, or a roll your own one hasn't been a significant factor in our reluctance to use Itanium II. The lack of commodity mainboards. The steep price of the cpu's. and lack of a clear view into intels product lifecycle for itaniumII. have been issues. Itanium II 1.3ghz 3mb cpu's have only recently arrived at ~$1400ea. opteron 244s are less than half that and that's before we put the rest of the system around it. we have some off-the-shelf compaq itanium boxes to evaluate but at around $8000 ea that sort of a non-starter. joelja On Thu, 28 Aug 2003, Robert G. 
Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed > > cost of about $400.00, plus or minus a few dollars per system. > > Therefore, due to this fixed cost, MOST people looking at a cluster > > won't touch Itanium2. > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > > Some white box resellers are looking at taking RH Advanced Server and > > stripping it down and offering on their ia64 clusters. However, if > > their not working with code lawyers, and paying very close attention to > > copy right laws, they could end up with law suits down the road. > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Mail.Linux-Consulting.com Thu Aug 28 17:16:38 2003 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Thu, 28 Aug 2003 14:16:38 -0700 (PDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Bogdan Costescu wrote: > In planning for some new cluster nodes, I hit a small problem. I want: > - a modern mainboard for dual-Xeon (preferred) or dual-Athlon > - 1U or 2U rackmounted case > - to use an old Myrinet M2M-PCI32 (Lanai4) which requires a riser card for > installing in the case the 2U chassis should be trivial to solve for either 32bit or 64bit pci slots for 1U chassis... you need to pick "right motherboard" that works with the chassis ... and pci cards ( you cannot do a mix and match with any motherboard ) if you want performance out of your pci card, you will have to use 64bit pci slots or 32bit pci slot - but the riser card should be one piece instead of the whacky non-conforming wires between the "2 sections of the pci riser" 32 and 64 bit pci riser cards (not cheap but lot better than most others) http://www.adexelec.com c ya alvin _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From johnb at quadrics.com Thu Aug 28 17:22:13 2003 From: johnb at quadrics.com (John Brookes) Date: Thu, 28 Aug 2003 22:22:13 +0100 Subject: thrashing Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA7E5E2C1@stegosaurus.bristol.quadrics.com> The test I would probably suggest to someone whose machine I had no access to is 'swapoff -a'. It's not big and it's not clever, but largely removes the need for value judgements: if it bombs in an OOM style, you were most probably thrashing. Just a thought. Cheers, John Brookes Quadrics > -----Original Message----- > From: Glen Kaukola [mailto:glen at mail.cert.ucr.edu] > Sent: 28 August 2003 19:19 > To: beowulf at beowulf.org > Subject: thrashing > > > Hi there, > > So for our newest simulations, we're working with a different domain, > where each of our grid cells are much smaller, and so we're expecting > the runs to take about 4 times longer. But actually they're taking > around 40 times longer. I'm thinking this may have something > to do with > not having enough memory. The problem with this theory is > that I'm not > really sure how to tell if my machines are thrashing. On a desktop > machine I can tell no problem, as the disk starts going crazy and the > system pretty much grinds to a halt. But on a machine up in > my server > room on which I don't have any gui and where it's too loud to > hear any > disk activity, I'm really not sure how to tell whether it's > thrashing or > not. 
I mean, I can look at top, and free, and sar and everything > doesn't look much different than when the other simulations were > running, except for maybe 'sar -W', which is a little bit higher. > Anyway, if someone could help me out with a way to determine > without a > doubt if my machines are thrashing or not, then I'd greatly > appriciate it. > > Thanks for your time, > Glen Kaukola > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From exa at kablonet.com.tr Thu Aug 28 13:56:53 2003 From: exa at kablonet.com.tr (Eray Ozkural) Date: Thu, 28 Aug 2003 20:56:53 +0300 Subject: Filesystem In-Reply-To: References: Message-ID: <200308282056.54106.exa@kablonet.com.tr> On Saturday 02 August 2003 05:45, Alvin Oga wrote: > i think ext3 is better than reiserfs > > i think ext3 is not any better than ext2 in terms > of somebody hitting pwer/reset w/o proper shutdown > - i always allow it to run e2fsck when it does > an unclean shutdown ... > > - yes ext3 will timeout and continue and restore from > backups but ... am paranoid about the underlying ext2 > getting corrupted by random power off and resets > I basically think ext3 and ext2 are a joke and we use XFS on the nodes with no performance problem. Excellent reliability! Regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sp at scali.com Thu Aug 28 16:55:51 2003 From: sp at scali.com (Steffen Persvold) Date: Thu, 28 Aug 2003 22:55:51 +0200 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4E6C57.9030406@scali.com> Robert G. Brown wrote: > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. 
> > > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... > > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > Robert, AFAIK, there is no "proprietary non-GPL" work in RedHat's Enterprise Linux line. I think the price is so high because of the support level you're buing. All the source for RHEL, either 32bit or 64bit is available on their ftp sites for download. And as long as they do that I don't think they're violating GPL, but I might be wrong (as I'm not a lawyers, but I'm sure RH has plenty of them). And actually, according to their web site, the cheapest (most suitable cluster) release for ITP2; RHEL WS (workstation) is $792, AS (advanced server) is $1992 for standard edition and $2988 for premium edition. Regards, -- Steffen Persvold ,,, mailto: sp at scali.com Senior Software Engineer (o-o) http://www.scali.com -----------------------------oOO-(_)-OOo----------------------------- Scali AS, PObox 150, Oppsal, N-0619 Oslo, Norway, Tel: +4792484511 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Thu Aug 28 18:11:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 15:11:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <3F4E6C57.9030406@scali.com> Message-ID: Stephen... anyone who wants can grab the entire srpms dir for AS and build it. The only way they'll end up with a lawsuit is if they represent the result as official suppoprt redhat linux AS... If you like you can pick it up from the RH mirrors including mine. > > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > > >>Some white box resellers are looking at taking RH Advanced Server and > >>stripping it down and offering on their ia64 clusters. However, if > >>their not working with code lawyers, and paying very close attention to > >>copy right laws, they could end up with law suits down the road. 
> > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:25:42 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:25:42 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Thu, 28 Aug 2003, Alvin Oga wrote: > the 2U chassis should be trivial to solve for either 32bit or 64bit pci > slots Well, maybe trivial for you who do this for a living :-) > for 1U chassis... you need to pick "right motherboard" that works with the > chassis ... and pci cards > ( you cannot do a mix and match with any motherboard ) Sure, but I was looking for example at the Intel offerings which pair dual Xeon mainboards with 1U/2U cases that are certified to work together. > if you want performance out of your pci card, I know that this 32bit/33MHz card looks slow by today's standards, but I think that it can still provide lower latency than e1000 or tg3 -driven cards, so Id' like to continue to use them. > 32 and 64 bit pci riser cards (not cheap but lot better than most others) > http://www.adexelec.com Many thanks for this address. I did try to use google before writting to the list, but I came with all sorts of shops, but nothing with good descriptions, which is what I needed most. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Thu Aug 28 18:44:24 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 00:44:24 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: <20030828194308.GA1778@greglaptop.internal.keyresearch.com> Message-ID: On Thu, 28 Aug 2003, Greg Lindahl wrote: > Doesn't that mean that the Myrinet card is 5 volts, and you only have > 3.3 volt PCI slots ? Bingo. This is exactly what most people that wrote to me off-list probably missed, although I did mention that it doesn't fit because of a different keying - I should have probably mentioned this explicitly. All the 32bit slots that I've seen on these mainboards allow inserting such cards, which makes me believe that they support both 5V and 3.3V cards; but the 64bit slots are 3.3V only. I don't have much experience with rackmounted systems, which it's probably evident, so I didn't know what to expect from a riser. Thanks to Alvin Oga's mention of Adexelec site, I was able to find out that the risers exist in many different variations. For example, I was wondering if such a riser exist that would allow mounting of the card from the edge toward the middle of the mainboard, while most common way is the other way around - I still need to find out if the case allows fixing of the card the other way around, but this is an easier problem to solve. 
One other thing that turned me off was that in a system composed of only Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots on the mainboard allow inserting of the Myrinet card (but didn't try to see if it works), while the riser cards that came with the case do not, allowing only 3.3V ones - so the riser imposes additional limitations... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Thu Aug 28 19:46:29 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Thu, 28 Aug 2003 16:46:29 -0700 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: Joel- Have you actually built RH AS from scratch using their SRPMS? Or do you know anyone that has? I'm very interested in doing this but I heard there were some pretty significant obstacles along the lines of package dependencies. Glen On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > Stephen... anyone who wants can grab the entire srpms dir for AS and > build > it. The only way they'll end up with a lawsuit is if they represent > the > result as official suppoprt redhat linux AS... > > If you like you can pick it up from the RH mirrors including mine. > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: >>> >>>> Some white box resellers are looking at taking RH Advanced Server >>>> and >>>> stripping it down and offering on their ia64 clusters. However, if >>>> their not working with code lawyers, and paying very close >>>> attention to >>>> copy right laws, they could end up with law suits down the road. >>> > > -- > ----------------------------------------------------------------------- > --- > Joel Jaeggli Unix Consulting > joelja at darkwing.uoregon.edu > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > 56B2 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From walkev at presearch.com Thu Aug 28 19:57:53 2003 From: walkev at presearch.com (Vann H. Walke) Date: Thu, 28 Aug 2003 19:57:53 -0400 Subject: Intel acquiring Pallas (Redhat AS Rebuild) In-Reply-To: References: Message-ID: <1062115073.7007.2.camel@localhost.localdomain> Haven't tried it but... http://www2.uibk.ac.at/zid/software/unix/linux/rhel-rebuild.htm http://www.uibk.ac.at/zid/software/unix/linux/rhel-rebuild-l.html Vann On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. 
> > Glen > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Thu Aug 28 20:54:43 2003 From: csamuel at vpac.org (Chris Samuel) Date: Fri, 29 Aug 2003 10:54:43 +1000 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <200308291054.45059.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 29 Aug 2003 09:46 am, Glen Otero wrote: > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? - From the Rocks Cluster Distribution website: http://www.rocksclusters.org/Rocks/ [...] Rocks 2.3.2 IA64 is based on Red Hat Advanced Workstation 2.1 recompiled from Red Hat's publicly available source RPMs. [...] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/TqRTO2KABBYQAh8RAnd4AJkBCFmq3tyb97EgHvg5x9mrsqkGGQCghGqG 9cF9eAKLTHD6lQS4kZGtg0A= =WVIz -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Thu Aug 28 20:59:57 2003 From: timm at fnal.gov (Steven Timm) Date: Thu, 28 Aug 2003 19:59:57 -0500 Subject: Intel acquiring Pallas In-Reply-To: Message-ID: The ROCKS distribution at www.rocksclusters.org claims to have done so for the IA64 architecture.. I have not tested it myself. Your mileage may vary. Steve ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Core Support Services Dept. Assistant Group Leader, Scientific Computing Support Group Lead of Computing Farms Team On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From nfalano at hotmail.com Thu Aug 28 21:29:34 2003 From: nfalano at hotmail.com (Norman Alano) Date: Fri, 29 Aug 2003 09:29:34 +0800 Subject: mpich Message-ID: greetings ! i already installed mpich... but the problem is whenever i run an application for instant the examples in the mpich the graphics wont show.... how can i configure so that i can run the application with graphic? cheers norman _________________________________________________________________ The new MSN 8: advanced junk mail protection and 2 months FREE* http://join.msn.com/?page=features/junkmail _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 00:04:27 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 28 Aug 2003 21:04:27 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: Message-ID: I've built almost all of it with the exception of gtk and kde related stuff which was outside the scope of my interest, on a redhat 7.2 box... I wouldn't try it on a 9 host. joelja On Thu, 28 Aug 2003, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > > Glen > > On Thursday, August 28, 2003, at 03:11 PM, Joel Jaeggli wrote: > > > Stephen... anyone who wants can grab the entire srpms dir for AS and > > build > > it. The only way they'll end up with a lawsuit is if they represent > > the > > result as official suppoprt redhat linux AS... > > > > If you like you can pick it up from the RH mirrors including mine. > > > >>> On Thu, 28 Aug 2003, Stephen Gaudet wrote: > >>> > >>>> Some white box resellers are looking at taking RH Advanced Server > >>>> and > >>>> stripping it down and offering on their ia64 clusters. However, if > >>>> their not working with code lawyers, and paying very close > >>>> attention to > >>>> copy right laws, they could end up with law suits down the road. 
> >>> > > > > -- > > ----------------------------------------------------------------------- > > --- > > Joel Jaeggli Unix Consulting > > joelja at darkwing.uoregon.edu > > GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F > > 56B2 > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sgaudet at wildopensource.com Fri Aug 29 09:52:31 2003 From: sgaudet at wildopensource.com (Stephen Gaudet) Date: Fri, 29 Aug 2003 09:52:31 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <3F4F5A9F.5000809@wildopensource.com> Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in any > way that would significantly affect the 64 bit market? The kernel, the > compiler, and damn near every package is GPL, much of it from Gnu > itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop in > a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable with, > I don't see what there is to stop anyone from taking an advanced server > copy (which necessarily either comes with src rpm's or makes them > publically available somewhere), doing an rpmbuild on all the src rpm's > (as if anyone would care that you went through an independent rebuild vs > just used the distribution rpm's) and putting it on 1000 systems, or > giving the sources to a friend, or even reselling a repackaging of the > whole thing (as long as they don't call them Red Hat and as long as they > omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at fing.edu.uy Fri Aug 29 04:06:38 2003 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Fri, 29 Aug 2003 11:06:38 +0300 Subject: European Commission Patentability rules Message-ID: <3F4F098E.7080202@fing.edu.uy> Dear all: I have not seen comments on the list regarding to this subject. I know that this might be considered political and off-topic but I believe that most of our (beowulf) software technology is Open/Free and that the results of further regulations might affect our work. Sorry for the noise for those of you who already knew this. Regards Ariel On September 1st the European Commission is going to vote a revised version of the European Patentability rules. The proposed revision contains a set of serious challenges to Open Source development since regulation regarding software patents will be broadly extended and might forbid independent development of innovative (Open Source and not) software-based solutions. The European Open Source community is very concerned about the upcoming new regulation and has organized a demo protest for August 27, asking Open Source supporting sites to change their home pages to let everyone know what is going on at the European Parliament. For further information please see http://swpat.ffii.org and http://petition.eurolinux.org. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Aug 29 10:17:09 2003 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 29 Aug 2003 10:17:09 -0400 (EDT) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Bogdan Costescu wrote: > One other thing that turned me off was that in a system composed of only > Intel components (SE7501WV2 mainboard and SR2300 case) the 64bit PCI slots > on the mainboard allow inserting of the Myrinet card (but didn't try to > see if it works), while the riser cards that came with the case do not, > allowing only 3.3V ones - so the riser imposes additional limitations... 
One last possible solution you can consider if you're using 2U cases and don't mind ugly is that MANY of the cards you might want to add nowadays are half-height cards on full height backplates. Usually the backplate is held on by two little screws. The half height cards will snap into a regular PCI slot normally (vertically) and still permit the case to close with no riser at all. The two negatives are there there are no "half height riser backplates" that I know of, so the back of each chassis will be open to the air, which may or may not screw around with cooling airflow in negative ways, and the fact that you can't "screw the cards down". Both of these can be solved (or ignored) with a teeny bit of effort, although you'll probably prefer to just get a riser that meets your needs -- there are risers with a key that fits in the AGP slot, risers with 32 bit keys, risers with 64 bit keys.. shop around. Be aware that some of the risers you can buy don't work properly (why I can't say, given that they appear to be little more than bus extenders with keys to grab power and timing/address lines). At a guess this won't help you with an old myrinet card as it is probably full height, but if you get desperate and it's not, you could likely make this work. rgb > > -- > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Aug 29 11:10:43 2003 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 29 Aug 2003 17:10:43 +0200 (CEST) Subject: 32bit slots and riser cards In-Reply-To: Message-ID: On Fri, 29 Aug 2003, Robert G. Brown wrote: > is that MANY of the cards you might want to add nowadays are half-height > cards on full height backplates. Nice try :-) It's a full-height card. And buying a taller case for each node with these Myrinet cards to allow vertical mounting would make me start looking for an 100U rack :-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lehi.gracia at amd.com Fri Aug 29 10:48:55 2003 From: lehi.gracia at amd.com (lehi.gracia at amd.com) Date: Fri, 29 Aug 2003 09:48:55 -0500 Subject: Intel acquiring Pallas Message-ID: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> >6. 
Given that the update packages are covered by the GPL, *nothing* > prevents a receiver of said packages to make them available for > download on the Internet. Red Hat can do *nothing* to prevent > further distribution. IOW, nothing prevents you from buying one > license and make the updates available to the rest of the world. > > Red Hat can, however, potentially decide not to provide you with > future updates if you do this. This is a bit unclear in the SLA. Correct me if I'm wrong, I though part of the GPL was that you have to give the source code to anyone that asks for it, is it not? Per section 2b. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. http://www.gnu.org/copyleft/gpl.html?cid=6 They still keep patches on their web site do they not? Lehi Gracia -----Original Message----- From: Stephen Gaudet [mailto:sgaudet at wildopensource.com] Sent: Friday, August 29, 2003 8:53 AM To: Robert G. Brown Cc: Rocky McGaugh; Jeffrey B. Layton; beowulf at beowulf.org Subject: Re: Intel acquiring Pallas Robert, and everyone else, To be clear on this without breaking NDA's see below; > On Thu, 28 Aug 2003, Stephen Gaudet wrote: > > >>With Itanium2 this is not the case. Both Red Hat and SuSe have a >>fixed >>cost of about $400.00, plus or minus a few dollars per system. >>Therefore, due to this fixed cost, MOST people looking at a cluster >>won't touch Itanium2. > > > Steve, > > Are you suggesting RH has put together a package that is NOT GPL in > any way that would significantly affect the 64 bit market? The > kernel, the compiler, and damn near every package is GPL, much of it > from Gnu itself. Am I crazy here? > > So I'm having a hard time seeing why one would HAVE to pay them > $400/system for anything except perhaps proprietary non-GPL "advanced > server" packages that almost certainly wouldn't be important to HPC > cluster builders (and which they would have had to damn near develop > in a sealed room to avoid incorporating GPL stuff in it anywhere). > > >>Some white box resellers are looking at taking RH Advanced Server and >>stripping it down and offering on their ia64 clusters. However, if >>their not working with code lawyers, and paying very close attention to >>copy right laws, they could end up with law suits down the road. I can't really comment here on what I hear resellers looking to do. > If Red Hat isn't careful and not working very carefully with code > lawyers, I think the reverse is a lot more likely, as Richard Stallman > is known to take the Gnu Public License (free as in air at the source > level, with inheritance) pretty seriously. Red Hat (or SuSE) doesn't > "own" a hell of a lot of code in what they sell; the bulk of what they > HAVE written is GPL derived and hence GPL by inheritance alone. The > Open Source community would stomp anything out of line with hobnailed > boots and club it until it stopped twitching... 
> > So although many a business may cheerfully pay $400/seat for advanced > server because it is a cost and support model they are comfortable > with, I don't see what there is to stop anyone from taking an advanced > server copy (which necessarily either comes with src rpm's or makes > them publically available somewhere), doing an rpmbuild on all the src > rpm's (as if anyone would care that you went through an independent > rebuild vs just used the distribution rpm's) and putting it on 1000 > systems, or giving the sources to a friend, or even reselling a > repackaging of the whole thing (as long as they don't call them Red > Hat and as long as they omit any really proprietary non-GPL work). > > I even thought there were some people on the list who were using at > least some 64 bit stuff already, both for AMD's and Intels. Maybe I'm > wrong...:-( > In regards to the high-performance/technical computing space. People buy Red Hat Advanced Server and SuSE Linux Enterprise Server because that's what the ISVs support (Oracle, DB2, Sybase, WebLogic etc.). RHAS and SLES are primarity targeted at the commercial computing space. In the HPC space, there is a void in the sense that Red Hat doesn't have a "community" distribution for IA-64 anymore (7.2 was the last). Don't know whether SuSE make their bits readily available. There are, however, several free alternatives: - Debian, for instance, is available for all HP hardware (as it is the internal software development vehicle at HP). - MSC Linux is also available for download (www.msclinux.com). - Rocks (www.rocksclusters.org) is a stripped and shipped Red Hat Advanced Server 2.1 for IA-64. So it's perfectly reasonable to use any of the above - as long as you don't require technical support (something WOS could provide, though). The strip and ship game works for now. However, given the increasing customization and branding done by Red Hat in later releases (8 and 9, also in RHAS 3) it is probably not going to be feasible to keep doing this going forward. Red Hat's brand is very strong and consequently it's all over the place in their products now. So I guesstimate that debranding is going to be at least an order of magnitude harder for RHAS 3. And just to clear up confusion. Here's the scoop with RHAS, availabity, support agreements, etc.: 1. Red Hat has decided *not* to make binaries/ISO images of RHAS available for download. Given that the distribution is covered by the GPL, *nothing* prevents somebody else from making it available. It is out there on the net if you look hard enough. 2. Again, being covered by the GPL, nothing prevents you from distributing it in unmodified form. It's perfectly legal to burn CDs and give them to customers. 3. If you modify the product in any way you invalidate the branding on RHAS as a whole, and you can no longer call the result RHAS without infringing Red Hat's trademarks. 4. If you buy RHAS from Red Hat you have to sign a service level agreement. This agreement is not restricting distribution of the RHAS binaries or source. It is a service level agreement between you and Red Hat (which you unfortunately have to sign to get access to the product in the first place). 5. One of the clauses in the SLA states that you agree to pay a support fee for each system you use RHAS on (and you grant RH the right to audit your network). If you choose not to comply with this clause, Red Hat will declare the service agreement null and void and you will no longer have access to patches and security fixes. 6. 
Given that the update packages are covered by the GPL, *nothing* prevents a receiver of said packages to make them available for download on the Internet. Red Hat can do *nothing* to prevent further distribution. IOW, nothing prevents you from buying one license and make the updates available to the rest of the world. Red Hat can, however, potentially decide not to provide you with future updates if you do this. This is a bit unclear in the SLA. Ok. So, executive summary: Red Hat are using a service customer level agreement to limit spreading of binary versions of RHAS. Given that RHAS is covered by the GPL, they cannot prevent distribution. Their only rebuttal will be refusal of further updates as per the SLA. But in the case of technical computing it isn't really that important whether the product is called RHAS, Rocks or HP Linux for HPC. They are all functionally identical. mkp, Resident Paralegal -- Martin K. Petersen Wild Open Source, Inc. mkp at wildopensource.com http://www.wildopensource.com/ BTW: http://www.msclinux.com/ has been shut down. -- Steve Gaudet Wild Open Source (home office) ---------------------- Bedford, NH 03110 pH:603-488-1599 cell:603-498-1600 http://www.wildopensource.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From henken at seas.upenn.edu Fri Aug 29 11:17:04 2003 From: henken at seas.upenn.edu (Nicholas Henke) Date: 29 Aug 2003 11:17:04 -0400 Subject: Intel acquiring Pallas In-Reply-To: References: Message-ID: <1062170224.9421.4.camel@roughneck> On Thu, 2003-08-28 at 19:46, Glen Otero wrote: > Joel- > > Have you actually built RH AS from scratch using their SRPMS? Or do > you know anyone that has? I'm very interested in doing this but I heard > there were some pretty significant obstacles along the lines of package > dependencies. > The links to the rhel-rebuild howto and mailing list are enought to get this done -- I just did 2.1 ES ( why bother with spending more for AS ? ). We purchased one copy of ES, and I used that to do the rebuild. Of course, it is not completely automatic, but there are only a handfull of packages that do not build without a bit of tweaking. As far as pkg dependencies go, it is _much_ easier to build on a similar system. Now for the $10K question -- are there any reasons that I ( or someone else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It of course still has the RH branding all over it, but it could be distributed being called 'Nics Fun RH clone', or something similar. Nic -- Nicholas Henke Penguin Herder & Linux Cluster System Programmer Liniac Project - Univ. of Pennsylvania _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From robert at yu.org Fri Aug 29 12:09:46 2003 From: robert at yu.org (Robert K. Yu) Date: Fri, 29 Aug 2003 09:09:46 -0700 (PDT) Subject: beowulf to a good home Message-ID: <20030829160946.6897.qmail@web40904.mail.yahoo.com> Hi, I have the following: 16 machines 450 MHz dual Celeron each (i.e. 
32 CPU) 128M memory each 100BaseT switch 6G drive each I would like to donate these machines and see them put to good use. Pick up from the San Francisco south bay area, or you pay for shipment. Thanks. -Robert ===== Robert K. Yu mailto:robert at yu.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From becker at scyld.com Fri Aug 29 12:19:31 2003 From: becker at scyld.com (Donald Becker) Date: Fri, 29 Aug 2003 12:19:31 -0400 (EDT) Subject: Intel acquiring Pallas In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F3@txexmtae.amd.com> Message-ID: On Fri, 29 Aug 2003 lehi.gracia at amd.com wrote:

> > Red Hat can do *nothing* to prevent > > further distribution. IOW, nothing prevents you from buying one > > license and make the updates available to the rest of the world. > > > > Red Hat can, however, potentially decide not to provide you with > > future updates if you do this. This is a bit unclear in the SLA. > > Correct me if I'm wrong, I though part of the GPL was that you have to > give the source code to anyone that asks for it, is it not? Per section > 2b.

No, section 2b states that you must propagate the license, not make the source code available to any third party. Section 3 covers distribution and redistribution. You don't have to make the source code available to an arbitrary third party, just those with the offers in 3b or 3c. For distributions Red Hat ships with the source code, they have no further obligations.

> >6. Given that the update packages are covered by the GPL, *nothing* > > prevents a receiver of said packages to make them available for > > download on the Internet.

For most individual packages, correct. And the following discussion covers individual packages, not the distribution as a whole. If the package contains a trademarked logo embedded with GPL code they
 - Should grant the right to use a package unmodified, including the logo (The GPL doesn't explicitly cover the case of logos, but a reasonable reading is that if Red Hat itself packages up the logo you have the right of unmodified distribution.)
 - May require you to remove the logo with any modification.
The entire distribution is another issue. It may be protected by copyright on the collection. They may restrict distribution of packages consisting of Red Hat branding and logos, which means some level of content reassembly is necessary to distribute. Red Hat may also insist that you not misrepresent a copy as a Red Hat product. This is an area where it's difficult to generalize. They may require removing packages/elements consisting of just logos or Red Hat documentation. And third parties can use the trademark name where it's descriptive, but not misleading. Consider the difference between "Chevrolet Service Station" and "Service Station for Chevrolets" [[ Native English speakers immediately understand the difference, and think of this rule as just part of the language. But you will not find this legally-inculcated distinction as a part of the grammar.
]] -- Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system Annapolis MD 21403 410-990-9993 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 13:16:06 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 10:16:06 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for G5 Message-ID: <20030829171606.62071.qmail@web11408.mail.yahoo.com> Free download: http://www-3.ibm.com/software/awdtools/ccompilers/ http://www-3.ibm.com/software/awdtools/fortran/ Rayson __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From joelja at darkwing.uoregon.edu Fri Aug 29 14:52:19 2003 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Fri, 29 Aug 2003 11:52:19 -0700 (PDT) Subject: Intel acquiring Pallas In-Reply-To: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> Message-ID: On Fri, 29 Aug 2003, Glen Otero wrote: > > You can redistribute it as long as it doesn't have RH all over it and > you don't use the RH name while endorsing/promoting it. I suppose you > could say it's RH compliant and built from RH srpms. The loop hole that > RH is taking advantage of is the fact that they are compliant with the > GPL as long as they release the sources. They comply with the GPL by > releasing the sources in srpm format, and so technically do not have to > make the isos freely available. By making it slightly difficult to > build your own distro, and not offering support to those who do, RH is > coaxing people to take the path of least resistance (wrt effort) and > buy licenses. I wouldn't really consider it a loophole, it's compatible with the spirit of the gpl. it's not as convenient as some people might like... but the sources are all there and they build and work. > Glen > > > > Nic > > -- > > Nicholas Henke > > Penguin Herder & Linux Cluster System Programmer > > Liniac Project - Univ. of Pennsylvania > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > Glen Otero, Ph.D. 
> Linux Prophet > 619.917.1772 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja at darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gotero at linuxprophet.com Fri Aug 29 14:12:37 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Fri, 29 Aug 2003 11:12:37 -0700 Subject: Intel acquiring Pallas In-Reply-To: <1062170224.9421.4.camel@roughneck> Message-ID: <5E9EB8D6-DA4C-11D7-A174-000393911A90@linuxprophet.com> On Friday, August 29, 2003, at 08:17 AM, Nicholas Henke wrote: > On Thu, 2003-08-28 at 19:46, Glen Otero wrote: >> Joel- >> >> Have you actually built RH AS from scratch using their SRPMS? Or do >> you know anyone that has? I'm very interested in doing this but I >> heard >> there were some pretty significant obstacles along the lines of >> package >> dependencies. >> > > The links to the rhel-rebuild howto and mailing list are enought to get > this done -- I just did 2.1 ES ( why bother with spending more for AS ? > ). We purchased one copy of ES, and I used that to do the rebuild. Of > course, it is not completely automatic, but there are only a handfull > of > packages that do not build without a bit of tweaking. > > As far as pkg dependencies go, it is _much_ easier to build on a > similar > system. > > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. You can redistribute it as long as it doesn't have RH all over it and you don't use the RH name while endorsing/promoting it. I suppose you could say it's RH compliant and built from RH srpms. The loop hole that RH is taking advantage of is the fact that they are compliant with the GPL as long as they release the sources. They comply with the GPL by releasing the sources in srpm format, and so technically do not have to make the isos freely available. By making it slightly difficult to build your own distro, and not offering support to those who do, RH is coaxing people to take the path of least resistance (wrt effort) and buy licenses. Glen > > Nic > -- > Nicholas Henke > Penguin Herder & Linux Cluster System Programmer > Liniac Project - Univ. of Pennsylvania > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > Glen Otero, Ph.D. 
Linux Prophet 619.917.1772 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From raysonlogin at yahoo.com Fri Aug 29 19:01:03 2003 From: raysonlogin at yahoo.com (Rayson Ho) Date: Fri, 29 Aug 2003 16:01:03 -0700 (PDT) Subject: IBM releases C/C++/F90 compilers - optimized for Apple G5 In-Reply-To: <99F2150714F93F448942F9A9F112634C07BE62F5@txexmtae.amd.com> Message-ID: <20030829230103.91270.qmail@web11407.mail.yahoo.com> (Sorry, didn't made it clear in my last email...) The compilers are for MacOSX. Rayson > Which one do we use for Linux, will the AIX one work? > > > Free download: > > > > http://www-3.ibm.com/software/awdtools/ccompilers/ > > http://www-3.ibm.com/software/awdtools/fortran/ > > __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rouds at servihoo.com Sat Aug 30 12:48:36 2003 From: rouds at servihoo.com (RoUdY) Date: Sat, 30 Aug 2003 20:48:36 +0400 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: hi If i don't find these to file should i create it? i know that .rhosts is hidden but when I do ls -a i cannot find it even if i use the command locate therefore if i create it what permission should i give them thanks roudy -------------------------------------------------- Get your free email address from Servihoo.com! http://www.servihoo.com The Portal of Mauritius _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikee at mikee.ath.cx Sat Aug 30 13:53:56 2003 From: mikee at mikee.ath.cx (Mike Eggleston) Date: Sat, 30 Aug 2003 12:53:56 -0500 Subject: .rhosts or /etc/hosts.equiv In-Reply-To: ; from rouds@servihoo.com on Sat, Aug 30, 2003 at 08:48:36PM +0400 References: <200308291902.h7TJ2Ow14727@NewBlue.Scyld.com> Message-ID: <20030830125356.C3206@mikee.ath.cx> On Sat, 30 Aug 2003, RoUdY wrote: > hi > If i don't find these to file should i create it? > i know that .rhosts is hidden but when I do ls -a > i cannot find it even if i use the command locate > therefore if i create it what permission should i give > them > thanks > roudy the file ~/.rhosts should have permissions of 600 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Sun Aug 31 19:45:52 2003 From: csamuel at vpac.org (Chris Samuel) Date: Mon, 1 Sep 2003 09:45:52 +1000 Subject: Trademark caveats about building RHAS from SRPMS (was Re: Intel acquiring Pallas) In-Reply-To: <1062170224.9421.4.camel@roughneck> References: <1062170224.9421.4.camel@roughneck> Message-ID: <200309010945.53871.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 30 Aug 2003 01:17 am, Nicholas Henke wrote: > Now for the $10K question -- are there any reasons that I ( or someone > else ) should not distribute the recompiled version of 2.1{A,E,W}S ? 
It > of course still has the RH branding all over it, but it could be > distributed being called 'Nics Fun RH clone', or something similar. Redhat have a set of rules of what you can and cannot do. Basically whilst they comply with the GPL they do restrict what you can do with their trademarks (i.e. things like Redhat and the ShadowMan logo). Two of the major things are: http://www.redhat.com/about/corporate/trademark/guidelines/page6.html C. You may not state that your product "contains Red Hat Linux X.X." This would amount to impermissible use of Red Hat's trademarks. [...] D. You must modify the files identified as REDHAT-LOGOS and ANACONDA-IMAGES so as to remove all use of images containing the "Red Hat" trademark or Red Hat's Shadow Man logo. Note that mere deletion of these files may corrupt the software. So if you want to build and redistribute from their SRPMS you will need to do extra work to make them happy. Note that RMS thinks that this use of trademark in relation to the GPL is legitimate, in an interview quoted on the "Open For Business" website he says (in regards to Mandrake): http://www.ofb.biz/modules.php?name=News&file=article&sid=260 [quote] TRB: Another interesting current issue is the concept of what might be seen as "hybrid licensing." For example, MandrakeSoft's Multi-Network Firewall is based on entirely Free Software, however the Mandrake branding itself is placed under a more restrictive license (you can't redistribute it for a fee). This give the user or consultant two choices -- use the software under the more restrictive licensing or remove the Mandrake artwork. What are your thoughts on this type or approach? RMS: I think it is legitimate. Freedom to redistribute and change software is a human right that must be protected, but the commercial use of a logo is a very different matter. Provided that removing the logo from the software is easy to do in practice, the requirement to pay for use of the logo does not stain the free status of the software itself. [/quote] - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/UoiwO2KABBYQAh8RAji6AJ4smNhqZ/my4k8i787Uaqs+n4rfsACcC4yS BLtsLZDIzG8Hm0KEACBOZyo= =A0dE -----END PGP SIGNATURE----- _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
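For anyone actually attempting the rebuild-and-rebrand route, the logo step might look roughly like this. Everything below is a sketch under assumptions: the package names (redhat-logos, anaconda-images), the spec file names and the /usr/src/redhat build tree are the old Red Hat defaults and may not match RHAS exactly; the REDHAT-LOGOS and ANACONDA-IMAGES files on the media remain the authoritative list of what has to be replaced.

    # install the source packages that carry the trademarked artwork
    rpm -ivh SRPMS/redhat-logos-*.src.rpm SRPMS/anaconda-images-*.src.rpm

    # list which image files the binary package installs, if you have it handy
    rpm -qlp RPMS/redhat-logos-*.rpm

    # drop replacement (non-Red-Hat) images with the same filenames into
    # /usr/src/redhat/SOURCES, rename the packages in the spec files,
    # then rebuild:
    rpmbuild -ba /usr/src/redhat/SPECS/redhat-logos.spec
    rpmbuild -ba /usr/src/redhat/SPECS/anaconda-images.spec

And per point C above, don't describe the result as containing "Red Hat Linux X.X"; something like Glen's "built from RH srpms" earlier in the thread is the safer phrasing.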