From stig at mono.org  Thu Feb  1 05:12:50 2001
From: stig at mono.org (stig)
Date: Thu, 1 Feb 2001 10:12:50 +0000 (GMT)
Subject: Scyld and Red Hat 7
In-Reply-To: <3A78A2D6.DCAD1B6F@cs.tamu.edu>
Message-ID: <Pine.NEB.4.10.10102011008220.20354-100000@electron.mono.org>

As long as the system includes the main libs, a kernel and the popular
package managers (well RPM)  does it really matter what distribution it is
based on?

Would there be this discussion if they 'based' it on their own compilation
of binaries instead of those of RedHats.


David


On Wed, 31 Jan 2001, Gerry Creager wrote:

> Ken wrote:
> > 
> > Martin Siegert wrote:
> > 
> > > 2. with respect to hardware support: most of that comes with the kernel,
> > >    particularly everything that is loaded as modules (e.g., and NIC drivers).
> > >    Hence, upgrading to a 2.4 kernel probably gets you better hardware
> > >    support than upgrading to RH 7.0.
> > >
> > 
> > I can agree with you in that since I upgraded to RH7 I've decided to use
> > Mandrake instead. ;-)
> > The original questions was about baseing the next release of Scyld in
> > RH7.  It is hard to upgrade a distro that doesn't load in the first
> > place.  If upgrading the kernel is enough, then that's fine.  I can
> > think of plenty of reasons to keep the distro as simple and functional
> > as possible.
> 
> While my experiences with Mandrake have generally been horror stories,
> my experiences with RH7 have been disasters of truly epic proportion. 
> I've just about got RH7 tamed... as long as I don't mount my CDROM and
> use tcp/ip at the same time... and NO! I'm not kidding.
> 
> I've stuck to RH6.2 for production.  
> -- 
> Gerry Creager                        |      Never ascribe to Malice that
> AATLT			             |      which can adequately be
> Texas A&M University                 |      explained by Stupidity.
> 979.458.4020  (Phone)                |      -- Lazarus Long
> 979.847.8578  (Fax)
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From valentin at olagrande.net  Thu Feb  1 04:58:23 2001
From: valentin at olagrande.net (valentin at olagrande.net)
Date: Thu, 1 Feb 2001 03:58:23 -0600 (CST)
Subject: diskless nodes with scyld
Message-ID: <200102010958.DAA28459@og1.olagrande.net>

I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought
previous articles in the archive suggest that this is possible, I have found
no instruction anywhere on how to do this.

My nodes are single P3-800/133 with 128Mb of RAM, floppy drives, and 3 10/100
ethernet ports. They also can send dhcp requests.

>From an article in the December archives, I assumed that all I need to do
is change the /etc/beowulf/fstab file. Mine currently has the following
entries:

/dev/ram3	/		ext2	fs_size=65536	0 0
none		/proc		proc	defaults	0 0
none		/dev/pts	devpts	gid=5,mode=620	0 0
$MASTER:/home	/home		nfs	defaults	0 0

This fails, and I have trouble interpreting the error log attached at the
end of this message. Now, who can help me?

1. What are the instructions for booting a diskless node with Scyld?

2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM?
   My nodes send out DHCP requests. Can I simply setup a dchp server to
   hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports?

Valentin


--------------cut here-------------
[root at scyld beowulf]# cat /var/log/beowulf/node.0
node_up: Setting system clock.
mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
ext2fs_check_if_mountFilesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
128 inodes, 1024 blocks
51 blocks (4.98%) reserved for the super user
First data block=1
1 block group
8192 blocks per group, 8192 fragments per group
128 inodes per group

Writing inode tables: done                            
Writing superblocks and filesystem accounting information: : No such file or directory while determining whether /dev/ram1 is mounted.
done
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
/dev/hda: No such device
beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or directory
setup_fs: Checking /dev/ram3 (type=ext2)...
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted.
Couldn't find ext2 superblock, trying backup blocks...

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

e2fsck: Bad magic number in super-block while trying to open /dev/ram3
setup_fs: FSCK failure.
setup_fs: Creating ext2 on /dev/ram3...
mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted.
setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults)
setup_fs: Checking 192.168.2.1:/home (type=nfs)...
setup_fs: Mounting 192.168.2.1:/home on /rootfs//home... (type=nfs; options=defaults)
beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing
beoboot: /lib/modules/2.2.16-21.beo/modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or director\y
/usr/lib/beoboot/bin/node_modprobe: /lib/modules/2.2.16-21.beo/modules.dep: No such file or directory
node_modprobe: installing kernel module: nfs
/tmp/nfs.o: unresolved symbol rpc_register_sysctl_Rbf9a77c0
/tmp/nfs.o: unresolved symbol rpc_wake_up_task_Rffa78ed9
/tmp/nfs.o: unresolved symbol rpc_do_call_R0fae8de2
/tmp/nfs.o: unresolved symbol rpc_proc_unregister_R5bd26000
/tmp/nfs.o: unresolved symbol rpc_allocate_R0cd1c989
/tmp/nfs.o: unresolved symbol rpcauth_lookupcred_R0366fdf8
/tmp/nfs.o: unresolved symbol rpc_clnt_sigunmask_R17abaa09
/tmp/nfs.o: unresolved symbol xdr_encode_string_Rabc0fe0c
/tmp/nfs.o: unresolved symbol rpc_init_task_Rf4c99bc4
/tmp/nfs.o: unresolved symbol rpc_sleep_on_R41929c92
/tmp/nfs.o: unresolved symbol rpc_shutdown_client_Rb50bc549
/tmp/nfs.o: unresolved symbol rpc_create_client_R4589e663
/tmp/nfs.o: unresolved symbol rpciod_up_R375492a4
/tmp/nfs.o: unresolved symbol rpc_call_setup_R6f2441da
/tmp/nfs.o: unresolved symbol rpc_proc_init_Rf56e5632
/tmp/nfs.o: unresolved symbol rpc_killall_tasks_R66ae6aea
/tmp/nfs.o: unresolved symbol rpc_release_task_Re71e954e
/tmp/nfs.o: unresolved symbol nlmclnt_proc_Rc02cb40f
/tmp/nfs.o: unresolved symbol nfs_debug_Raf5bf6ef
/tmp/nfs.o: unresolved symbol rpc_execute_R2f4e83ce
/tmp/nfs.o: unresolved symbol rpc_clnt_sigmask_R3b8df6d4
/tmp/nfs.o: unresolved symbol xprt_create_proto_Rc88e4139
/tmp/nfs.o: unresolved symbol rpciod_down_Rbabf0f35
/tmp/nfs.o: unresolved symbol rpc_proc_register_R83e79004
/tmp/nfs.o: unresolved symbol xprt_destroy_Rea15ebb6
/tmp/nfs.o: unresolved symbol rpc_wake_up_next_R134f0e35
mount: fs type nfs not supported by kernel
Failed to mount 192.168.2.1:/home on /home.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From sjarczyk at wist.net.pl  Thu Feb  1 05:54:11 2001
From: sjarczyk at wist.net.pl (Sergiusz Jarczyk)
Date: Thu, 1 Feb 2001 11:54:11 +0100 (CET)
Subject: Q: Any parallel DBs for the cluster computers ?
In-Reply-To: <005001c08bef$71056f00$5f72f2cb@TEST>
Message-ID: <Pine.LNX.4.21.0102011143020.8676-100000@dns-01.wist.net.pl>

On Thu, 1 Feb 2001, Yoon Jae Ho wrote:

> I am seeking the Parallel Database for the linux clusters for 2 years. 
> but failed.
> 
> Is there any information about Parallel Database using PVFS or GFS or itself
> filesystem or any other parallel filesystem ?
> 
> Is there anyone here making the Parallel Database for the linux cluster 
> including Scyld Beowulf ?
> 
> I will be happy if I get any information about Parallal Database for the
> linux .
> 

You should check clustra:
http://www.clustra.com

Sergiusz


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From rgb at phy.duke.edu  Thu Feb  1 07:35:09 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 1 Feb 2001 07:35:09 -0500 (EST)
Subject: Scyld and Red Hat 7
In-Reply-To: <Pine.NEB.4.10.10102011008220.20354-100000@electron.mono.org>
Message-ID: <Pine.LNX.4.30.0102010710110.21143-100000@ganesh.phy.duke.edu>

On Thu, 1 Feb 2001, stig wrote:

> As long as the system includes the main libs, a kernel and the popular
> package managers (well RPM)  does it really matter what distribution it is
> based on?
>
> Would there be this discussion if they 'based' it on their own compilation
> of binaries instead of those of RedHats.

The reasons to periodically upgrade an operating system distribution
(theirs or anybody else's), and not just the kernel, are many and valid.
By the numbers:

  a) Improved compilers and support libraries.  This is probably the
number one reason to upgrade a whole distribution rather than just the
kernel.  Sure, you can just upgrade compilers alone, and kernels alone,
and libraries alone, but at some point (especially for major e.g. libc
revisions) you find that you have to rebuild everything anyway and the
whole point of distributions and kickstart and yellow dog's "yup" tool
is to make it easy to get from tested configuration to tested
configuration.  I've done systems management piecemeal and it is no fun
at all.

This is currently a highly nontrivial reason in my mind.  I'm in the
middle of fixing an extremely serious bug in the cpu-rate tool I've been
using to measure floating point performance on nodes and have uncovered
a rat's nest of wierdness somewhere in the gcc/linux interaction on 6.2
systems.  As in I can run the same benchmark code with the same
parameters and get two completely different timings, depending literally
one whether I set a parameter by a fallthrough default or "override" the
parameter to the exact same value on the command line.  Or change the
order of initialization statements.  Different by a factor of two -- not
a small difference.  This SEEMS to be fixed in RH 7.0 although I'm still
testing.

  b) Improved kernel.  For example, NFS is basically and maddeningly
broken in pre-2.18 kernels (but MAY be fixed in 2.18) -- I've actually
survived a server crash without having to reboot all my NFS clients
since upgrading my (non-scyld) cluster.  Yes, one can rebuild the kernel
by hand, but some of the scyld advantages (and other useful beowulf
stuff) interface directly with the kernel.  These days one sometimes has
to upgrade the base compiler to upgrade the kernel.  This is less
important to a scyld beowulf than to a more general purpose cluster
node, but scyld cannot remain stagnant at a given kernel revision
forever.

  c) Improved everything else.  This isn't too important to scyld but
again, even e.g. MPI marches along.  Bugs are fixed, optimizations are
tuned.  Scyld may not have to remain sync'd to RH's development cycle,
but it has to re-release its OWN distribution package periodically to
keep everything up to date and/or users will have to periodically
upgrade node or server packages piecemeal.

RH 7 has definitely got some problems, but 7.1beta comes out what,
today? and reportedly fixes a lot of those problems (as do the many
updates already released).  Since RH 7 has an incompatible RPM relative
to 6.2, the 6.2->7 upgrade requires a pretty serious commitment and lots
of folks are holding off until its problems diminish.

I therefore don't think that the issue is whether scyld should rebuild
on the 7.x distribution -- it is rather a question of when.  This is
thus a reasonable question to ask, although there is (as noted) less
pressure for them to do it immediately.  There is also the question of
how difficult it is to do the rebuild -- if the distribution is RPM
packaged, rebuilding really shouldn't take long at all; it is the
testing and stabilizing that takes the time.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From andresfc at ideam.gov.co  Thu Feb  1 10:07:56 2001
From: andresfc at ideam.gov.co (Andres Felipe CALDERON)
Date: Thu, 1 Feb 2001 10:07:56 -0500
Subject: Any parallel DBs for the cluster computers ?
References: <005001c08bef$71056f00$5f72f2cb@TEST>
Message-ID: <001701c08c60$c741c820$0200000a@casa.zipa.sdc>

Oracle Parallel Server?
  ----- Mensaje original ----- 
  De: Yoon Jae Ho 
  Para: beowulf at beowulf.org 
  Enviado: Mi?coles, 31 de Enero de 2001 08:36 p.m.
  Asunto: Q: Any parallel DBs for the cluster computers ?


  I am seeking the Parallel Database for the linux clusters for 2 years. but failed.

  Is there any information about Parallel Database using PVFS or GFS or itself filesystem or any other parallel filesystem ?

  Is there anyone here making the Parallel Database for the linux cluster including Scyld Beowulf ?

  I will be happy if I get any information about Parallal Database for the linux .

  Is there anyone to make parallel mysql to be used for the cluster ?

  Thank you in advance


  ---------------------------------------------------------------------------------------
  Yoon Jae Ho
  Economist
  POSCO Research Institute
   
  yoon at bh.kyungpook.ac.kr
  jhyoon at mail.posri.re.kr
  http://ie.korea.ac.kr/~supercom/  Korea Beowulf Supercomputer
   
  Imagination is more important than knowledge.  A. Einstein
  "?o?o?AAI?????I A?A?AC Aa??A?" ?o???a ?i?i, " A?A??A C??AAC ?a?u" Aa?a ?cAI?? ?U?c(A?AcA??? ?y??CI?a CA?I???? AuAU)
  "?o?o?AAI?o 'AI?IAo?| ???e?i???A ?E?A'AI?o?i ?? ?o AO?A??, ?o?o?A E??AA? ???u???U?? ?RC?Au E??AAI?U."   ?AAI ?eA??o   
  "?oC?Au ?E?A ?e?cA? ?i?RAu ?o?o?A E??A?? ???oAI ?E?U"  A?AcE? 2000.4.22
  "?o?o?AA? AyA??A?u ?i?o ???R?i ?a???? ACC??? ??AaCN?U" A? AcE? 2000.4.29
  "?o?o?AAC ?CCoA? ?IA??u ?I?UCN ?e?AAI CE?aCI?U" A? AcE? 2000.4.24
  " http://www.c3tv.com " 2001.1.10 
  ----------------------------------------------------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20010201/5c3166b0/attachment.html>

From yocum at linuxcare.com  Thu Feb  1 11:12:49 2001
From: yocum at linuxcare.com (Dan Yocum)
Date: Thu, 01 Feb 2001 10:12:49 -0600
Subject: Q: Any parallel DBs for the cluster computers ?
References: <005001c08bef$71056f00$5f72f2cb@TEST>
Message-ID: <3A798B01.54AEBE34@linuxcare.com>

This probably isn't completely related to the beowulf list (probably
more related to the linux-ha list), but has anyone run a DB (pick a DB,
any DB) on a cluster using DBD (distributed block device) and
C-Ensemble's distributed lock manager (http://www.northforknet.com)?

Cheers,
Dan


> Yoon Jae Ho wrote:
> 
> I am seeking the Parallel Database for the linux clusters for 2 years.
> but failed.
> 
> Is there any information about Parallel Database using PVFS or GFS or
> itself filesystem or any other parallel filesystem ?
> 
> Is there anyone here making the Parallel Database for the linux
> cluster including Scyld Beowulf ?
> 
> I will be happy if I get any information about Parallal Database for
> the linux .
> 
> Is there anyone to make parallel mysql to be used for the cluster ?


-- 
Dan Yocum, Sr. Linux Consultant
Linuxcare, Inc.  630.697.8066 tel
yocum at linuxcare.com, http://www.linuxcare.com
Linuxcare. Putting open source to work.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From alangrimes at starpower.net  Thu Feb  1 13:30:48 2001
From: alangrimes at starpower.net (Alan Grimes)
Date: Thu, 01 Feb 2001 13:30:48 -0500
Subject: Big Iorn
Message-ID: <3A79AB58.47921B55@starpower.net>

Hey, I have been hearing a lot of things about MVS, the inherant
superiority of the 390, and all sorts of stuff about how all these big
machines are so radicaly advanced that its not even funny... 

This has finally piqued my interest to the point where I now would like
to know more about how these machines work and what they can actually
do. 
Since this list is tangentaly related to that field I am sure there are
at least a few here who could give me some useful pointers. =)

-- 
Perhaps I will upgrade my OS from Win 3.11...
But It has to be more sophisticated than Win 3.11.
As well as less complicated than Win 3.11.
*AND* It must run on THE MACHINE!!!!
http://users.erols.com/alangrimes/  <my website.
Any usage of this e-mail account is subject to the terms and conditions
specified on my website.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From newt at scyld.com  Thu Feb  1 14:26:59 2001
From: newt at scyld.com (Daniel Ridge)
Date: Thu, 1 Feb 2001 14:26:59 -0500 (EST)
Subject: diskless nodes with scyld
In-Reply-To: <200102010958.DAA28459@og1.olagrande.net>
Message-ID: <Pine.LNX.4.21.0102011343070.904-100000@eleanor.wdhq.scyld.com>


On Thu, 1 Feb 2001 valentin at olagrande.net wrote:

> I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought
> previous articles in the archive suggest that this is possible, I have found
> no instruction anywhere on how to do this.

We have an updated CD out now (see our website) that runs disklessly
by default (based on popular demand).

> >From an article in the December archives, I assumed that all I need to do
> is change the /etc/beowulf/fstab file. Mine currently has the following
> entries:
> 
> /dev/ram3	/		ext2	fs_size=65536	0 0
> none		/proc		proc	defaults	0 0
> none		/dev/pts	devpts	gid=5,mode=620	0 0
> $MASTER:/home	/home		nfs	defaults	0 0

That looks right.

> This fails, and I have trouble interpreting the error log attached at the
> end of this message. Now, who can help me?

> 1. What are the instructions for booting a diskless node with Scyld?

Comment out the /home mount in your fstab. Your nodes are having NFS
problems that are keeping them from coming up. (the node NFS fs module is
failing to load for some reason).

> 2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM?
>    My nodes send out DHCP requests. Can I simply setup a dchp server to
>    hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports?

Basically no problem. I'll whip up some instructions for people who want
to PXE boot their boxes. With the latest Scyld release, you basically do:
beoboot -2 -i to make phase-2 images with the kernel and initrd split out
so that you can use them with any boot strategy you choose.

Regards,
	Dan Ridge
	Scyld Computing Corporation


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From siegert at sfu.ca  Thu Feb  1 17:01:03 2001
From: siegert at sfu.ca (Martin Siegert)
Date: Thu, 1 Feb 2001 14:01:03 -0800
Subject: Scyld and Red Hat 7
In-Reply-To: <Pine.LNX.4.30.0102010710110.21143-100000@ganesh.phy.duke.edu>; from rgb@phy.duke.edu on Thu, Feb 01, 2001 at 07:35:09AM -0500
References: <Pine.NEB.4.10.10102011008220.20354-100000@electron.mono.org> <Pine.LNX.4.30.0102010710110.21143-100000@ganesh.phy.duke.edu>
Message-ID: <20010201140103.A2727@stikine.ucs.sfu.ca>

On Thu, Feb 01, 2001 at 07:35:09AM -0500, Robert G. Brown wrote:
> On Thu, 1 Feb 2001, stig wrote:
> 
> > As long as the system includes the main libs, a kernel and the popular
> > package managers (well RPM)  does it really matter what distribution it is
> > based on?

With respect to applications it matters on which version of glibc the distro
is based on.

> > Would there be this discussion if they 'based' it on their own compilation
> > of binaries instead of those of RedHats.
> 
> The reasons to periodically upgrade an operating system distribution
> (theirs or anybody else's), and not just the kernel, are many and valid.
> By the numbers:
> 
>   a) Improved compilers and support libraries.  This is probably the
> number one reason to upgrade a whole distribution rather than just the
> kernel.  Sure, you can just upgrade compilers alone, and kernels alone,
> and libraries alone, but at some point (especially for major e.g. libc
> revisions) you find that you have to rebuild everything anyway and the
> whole point of distributions and kickstart and yellow dog's "yup" tool
> is to make it easy to get from tested configuration to tested
> configuration.  I've done systems management piecemeal and it is no fun
> at all.

This is also the #1 reason for me not to upgrade: if a new distribution
comes with a glibc that is not downward compatible with the commercial
compilers and scientific libraries that I purchased, I simply cannot
use it without spending lots of $$. There must be very good reasons
for that. Right now I doubt that, e.g., Portland compilers aren't even
available for glibc-2.2; no NAG library either.

> This is currently a highly nontrivial reason in my mind.  I'm in the
> middle of fixing an extremely serious bug in the cpu-rate tool I've been
> using to measure floating point performance on nodes and have uncovered
> a rat's nest of wierdness somewhere in the gcc/linux interaction on 6.2
> systems.  As in I can run the same benchmark code with the same
> parameters and get two completely different timings, depending literally
> one whether I set a parameter by a fallthrough default or "override" the
> parameter to the exact same value on the command line.  Or change the
> order of initialization statements.  Different by a factor of two -- not
> a small difference.  This SEEMS to be fixed in RH 7.0 although I'm still
> testing.

I admire you - benchmarking is an art by itself. Just look at the stream
benchmark: The comments in the code (stream_d.f) tell you that you can
either use static or f90-type allocatable arrays. They don't tell you
that the results will be dramatically different (you see the same difference
with with stream_d.c when you malloc the array). So which way should you
do it? Probably the slow way if you want a meaningful result for your
application - I at least malloc almost everything at
run time. However, stream results are never quoted that way.

>   b) Improved kernel.  For example, NFS is basically and maddeningly
> broken in pre-2.18 kernels (but MAY be fixed in 2.18) -- I've actually
> survived a server crash without having to reboot all my NFS clients
> since upgrading my (non-scyld) cluster.  Yes, one can rebuild the kernel
> by hand, but some of the scyld advantages (and other useful beowulf
> stuff) interface directly with the kernel.  These days one sometimes has
> to upgrade the base compiler to upgrade the kernel.  This is less
> important to a scyld beowulf than to a more general purpose cluster
> node, but scyld cannot remain stagnant at a given kernel revision
> forever.

That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3
And as long as I can do it without going to glibc-2.2 I'll probably
upgrade. Now it doesn't look as if RH will be releasing a 2.4 kernel
rpm for 6.2 (although I can't see a reason why they couldn't).
[side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?]
With respect to Scyld (and RH and whoever) this means: I would welcome
upgrades as long as the distribution remains downward compatible.
The showstopper is glibc here and not the kernel.
Sure there are limits to that, but the reasons for giving up downward
compatibility must be very good: so good that the $$ reasons given above
don't count anymore.

>   c) Improved everything else.  This isn't too important to scyld but
> again, even e.g. MPI marches along.  Bugs are fixed, optimizations are
> tuned.  Scyld may not have to remain sync'd to RH's development cycle,
> but it has to re-release its OWN distribution package periodically to
> keep everything up to date and/or users will have to periodically
> upgrade node or server packages piecemeal.
> 
> RH 7 has definitely got some problems, but 7.1beta comes out what,
> today? and reportedly fixes a lot of those problems (as do the many
> updates already released).  Since RH 7 has an incompatible RPM relative
> to 6.2, the 6.2->7 upgrade requires a pretty serious commitment and lots
> of folks are holding off until its problems diminish.
> 
> I therefore don't think that the issue is whether scyld should rebuild
> on the 7.x distribution -- it is rather a question of when.  This is
> thus a reasonable question to ask, although there is (as noted) less
> pressure for them to do it immediately.  There is also the question of
> how difficult it is to do the rebuild -- if the distribution is RPM
> packaged, rebuilding really shouldn't take long at all; it is the
> testing and stabilizing that takes the time.

... and when they decide to rebuild based on 7.x they hopefully consider
keeping a branch based on glibc-2.1.

Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From siegert at sfu.ca  Thu Feb  1 17:27:11 2001
From: siegert at sfu.ca (Martin Siegert)
Date: Thu, 1 Feb 2001 14:27:11 -0800
Subject: Alpha beowulf: True64 or Linux?
Message-ID: <20010201142711.B2727@stikine.ucs.sfu.ca>

We are in the planning stages of setting up a small Alpha cluster.
One of the questions that came up is: should we use True64 or Linux?
Now I don't need any flame wars here, but serious arguments.
You don't even have to convince me (I probably have to run the thing.
Since I am familiar with Linux and I'll continue to support our
Pentium based cluster, Linux just means less work - which is one good
argument, but I need more than that).

Thus:
- are there performance differences?
- software availability? I heard that Compaq's development suite (compilers,
  debuggers, etc.) is available on both platforms. What about scientific
  libraries, etc.
- my guess is that both OS are fully 64bit OS (files > 2GB, etc.).
  How about the compilers? Can I have 128bit precision for floating point
  operations?
- if we buy 4 processor smp boxes: How is the support under either OS?
  (OpenMP, etc.)
- How good is the smp performance (i.e., is it worth it in comparison to
  myrinet?)?
- what other pros and cons?

I'd appreciate all comments and remarks that'll help me to come to a
decision one way or the other.

Thanks.

Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From wsb at paralleldata.com  Thu Feb  1 17:54:51 2001
From: wsb at paralleldata.com (W Bauske)
Date: Thu, 01 Feb 2001 16:54:51 -0600
Subject: Scyld and Red Hat 7
References: <Pine.NEB.4.10.10102011008220.20354-100000@electron.mono.org> <Pine.LNX.4.30.0102010710110.21143-100000@ganesh.phy.duke.edu> <20010201140103.A2727@stikine.ucs.sfu.ca>
Message-ID: <3A79E93B.B3C1E0BB@paralleldata.com>

Martin Siegert wrote:
> 
> That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3
> And as long as I can do it without going to glibc-2.2 I'll probably
> upgrade. Now it doesn't look as if RH will be releasing a 2.4 kernel
> rpm for 6.2 (although I can't see a reason why they couldn't).
> [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?]

I run 2.4.x on two x86's with LFS working. Not too heavy of testing but
it seems to behave. You do need to compile your code with the correct 
defines to access them. Also, on a RH6.2 system, you need to recompile
certain other programs with the right defines too, like your shell.


Wes

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From JParker at coinstar.com  Thu Feb  1 17:36:35 2001
From: JParker at coinstar.com (JParker at coinstar.com)
Date: Thu, 1 Feb 2001 14:36:35 -0800
Subject: Scyld and Red Hat 7
Message-ID: <OFA988530C.25F82A1B-ON082569E6.005C3BA1@coinstar.com>

G'Day !

>> As long as the system includes the main libs, a kernel and the popular
>> package managers (well RPM)  does it really matter what distribution it 
is
>> based on?

>The reasons to periodically upgrade an operating system distribution
>(theirs or anybody else's), and not just the kernel, are many and valid.
>By the numbers:

Well the question is still valid.  Very few would disagree that you should 
update your system from time to time with the latest version of your 
distribution of choice. 

I happen to prefer Debian  ;-)

So the question remains ... is Schyld compatable with the other major 
distributions ?


cheers,
Jim Parker

Sailboat racing is not a matter of life and death ....  It is far more 
important than that !!!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20010201/3ff1e4a2/attachment.html>

From fmuldoo at alpha2.eng.lsu.edu  Thu Feb  1 17:42:33 2001
From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon)
Date: Thu, 01 Feb 2001 16:42:33 -0600
Subject: Alpha beowulf: True64 or Linux?
References: <20010201142711.B2727@stikine.ucs.sfu.ca>
Message-ID: <3A79E658.DA17BF2A@me.lsu.edu>

I have tested my CFD code using Dec's Fortran 90/95 compiler on 2 identical Alpha 21264's @500Mhz.  The ratio of
time to finish for Tru64/linux was .85.  This is right in line with what Dec was saying the performance penalty
for using Linux on their machines was.  Does anyone know why this is?  I heard something about Linux not having
page coloring, which I am not familiar with.

--
Frank Muldoon
Computational Fluid Dynamics Research Group
Louisiana State University
Baton Rouge, LA 70803
225-344-7676 (h)
225-388-5217 (w)


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From bapper at piratehaven.org  Thu Feb  1 17:58:05 2001
From: bapper at piratehaven.org (Brian Pomerantz)
Date: Thu, 1 Feb 2001 14:58:05 -0800
Subject: Alpha beowulf: True64 or Linux?
In-Reply-To: <3A79E658.DA17BF2A@me.lsu.edu>; from fmuldoo@alpha2.eng.lsu.edu on Thu, Feb 01, 2001 at 04:42:33PM -0600
References: <20010201142711.B2727@stikine.ucs.sfu.ca> <3A79E658.DA17BF2A@me.lsu.edu>
Message-ID: <20010201145805.A22564@skull.piratehaven.org>

On Thu, Feb 01, 2001 at 04:42:33PM -0600, Frank Muldoon wrote:
> I have tested my CFD code using Dec's Fortran 90/95 compiler on 2
> identical Alpha 21264's @500Mhz.  The ratio of time to finish for
> Tru64/linux was .85.  This is right in line with what Dec was saying
> the performance penalty for using Linux on their machines was.  Does
> anyone know why this is?  I heard something about Linux not having
> page coloring, which I am not familiar with.
> 

Page coloring has to do with how cache lines map to pages in memory.
Here is a brief blurb on page coloring from the BSD people:

	We'll end with the page coloring optimizations. Page coloring
	is a performance optimization designed to ensure that accesses
	to contiguous pages in virtual memory make the best use of the
	processor cache. In ancient times (i.e. 10+ years ago)
	processor caches tended to map virtual memory rather than
	physical memory. This led to a huge number of problems
	including having to clear the cache on every context switch in
	some cases, and problems with data aliasing in the cache.
	Modern processor caches map physical memory precisely to solve
	those problems. This means that two side-by-side pages in a
	processes address space may not correspond to two side-by-side
	pages in the cache. In fact, if you aren't careful
	side-by-side pages in virtual memory could wind up using the
	same page in the processor cache -- leading to cacheable data
	being thrown away prematurely and reducing CPU performance.
	This is true even with multi-way set-associative caches
	(though the effect is mitigated somewhat).                              

	FreeBSD's memory allocation code implements page coloring
	optimizations, which means that the memory allocation code
	will attempt to locate free pages that are contiguous from the
	point of view of the cache. For example, if page 16 of
	physical memory is assigned to page 0 of a process's virtual
	memory and the cache can hold 4 pages, the page coloring code
	will not assign page 20 of physical memory to page 1 of a
	process's virtual memory. It would, instead, assign page 21 of
	physical memory. The page coloring code attempts to avoid
	assigning page 20 because this maps over the same cache memory
	as page 16 and would result in non-optimal caching. This code
	adds a significant amount of complexity to the VM memory
	allocation subsystem as you can well imagine, but the result
	is well worth the effort. Page Coloring makes VM memory as
	deterministic as physical memory in regards to cache
	performance.

There has been a lot of arguing back and forth about whether there is
any benefit to page coloring when you take into consideration that it
is very time consuming and difficult to set up and get right.  The
thing that I here REALLY increases performance on many scientific apps
is the use of super pages.


BAPper

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From jcownie at etnus.com  Fri Feb  2 05:52:03 2001
From: jcownie at etnus.com (James Cownie)
Date: Fri, 02 Feb 2001 10:52:03 +0000
Subject: Alpha beowulf: True64 or Linux? 
In-Reply-To: Your message of "Thu, 01 Feb 2001 14:27:11 PST."
             <20010201142711.B2727@stikine.ucs.sfu.ca> 
Message-ID: <14Odox-0pB-00@etnus.com>

Martin Siegert asked :-

> Should we use True64 or Linux?

> - software availability? I heard that Compaq's development suite (compilers,
>   debuggers, etc.) is available on both platforms. What about scientific
>   libraries, etc.

Our Totalview debugger is available for either operating system (and
supports MPI on either).

Compaq's compilers are available on either, however I believe that the
Compaq compilers on Linux do _not_ support either HPF or OpenMP. (For
HPF they like their own message passing system, and for OpenMP they
like their own thread library).

Good luck.

-- Jim 

James Cownie	<jcownie at etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From bogdan.costescu at iwr.uni-heidelberg.de  Fri Feb  2 08:28:38 2001
From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Fri, 2 Feb 2001 14:28:38 +0100 (CET)
Subject: Scyld and Red Hat 7
In-Reply-To: <20010201140103.A2727@stikine.ucs.sfu.ca>
Message-ID: <Pine.LNX.4.30.0102021404230.23424-100000@kenzo.iwr.uni-heidelberg.de>

On Thu, 1 Feb 2001, Martin Siegert wrote:

> That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3

NFSv3 support is present in 2.2.18, however NVSv3 over TCP doesn't work
right at this point (this is valid for 2.4). All the patches that were
floating around and were integrated by major vendors in their kernels were
also integrated in 2.2.18. However, you have to compile it yourself, it
doesn't come as RH update... NFS FAQ and mailing list at
http://nfs.sourceforge.net

> [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?]

Yes, but you also need LFS support from your glibc. AFAIK, glibc-2.1 from
RH 6.2 is not compiled for LFS.

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From glindahl at hpti.com  Fri Feb  2 09:30:43 2001
From: glindahl at hpti.com (Greg Lindahl)
Date: Fri, 2 Feb 2001 09:30:43 -0500
Subject: Q: Any parallel DBs for the cluster computers ?
In-Reply-To: <005001c08bef$71056f00$5f72f2cb@TEST>; from yoon@bh.kyungpook.ac.kr on Thu, Feb 01, 2001 at 10:36:46AM +0900
References: <005001c08bef$71056f00$5f72f2cb@TEST>
Message-ID: <20010202093043.A1138@wumpus.hpti.com>

> Is there any information about Parallel Database using PVFS or GFS or
> itself filesystem or any other parallel filesystem ?

Parallel databases don't necessarily use parallel filesystems. That's
a detail which the database vendor generally hides from you.

Oracle, for example, has a parallel database which doesn't require a
shared filesystem; it only really requires shared data. Then they have
their own lock manager, which provides all they need. Unfortunately I
don't think this is available on Linux.

However, depending on your problem, you might be able to use N
separate databases and do the parallel part yourself. For example, you
would run all queries on all the databases and combine all of the
results. I've seen some non-Linux software which does this over any
SQL database, but I haven't seen such a system for Linux.

-- g

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From sshealy at asgnet.psc.sc.edu  Fri Feb  2 11:49:50 2001
From: sshealy at asgnet.psc.sc.edu (Scott Shealy)
Date: Fri, 2 Feb 2001 11:49:50 -0500 
Subject: Q: Any parallel DBs for the cluster computers ? 
Message-ID: <5773B442597BD2118B9800105A1901EE1B4D4B@asgnet2>

I think if you are looking  for polished open source stuff .... you are
probably out of luck.  But if you will accept a commercial solution look
into IBM's DB2 Extended Enterprise Edtion.  We have used this DB for large
warehousing and data mining projects extensively on our IBM SP(really
nothing more than a real fancy beowulf) and have been pleased with its
performance and awesome scalablilty.  Unlike Oracle OPS(really has
components of a shared architecture which doesnt scale as well), DB2 EEE
uses a shared nothing architecture that is difficult to
configure,administer, and adds an extra dimension for the DBA's and data
architects to deal with .... but really kicks! Recently IBM has released it
for linux and I think you can download a trial from them.  We are getting
ready to give it a whirl on our linux cluster.  You probably already know
this but you are  probably going to have to configure your beowulf nodes a
little differently than you typically do for other computational tasks.  You
will need to spend alot of money on the IO subsystem on each node(thats the
bottleneck in a DB) and if you are need to gurantee uptime you going to have
to think about fail over for each node.  Anyway have fun!

Scott Shealy

E811 Inc
sshealy at E811.com


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From wsb at paralleldata.com  Fri Feb  2 14:40:42 2001
From: wsb at paralleldata.com (W Bauske)
Date: Fri, 02 Feb 2001 13:40:42 -0600
Subject: Scyld and Red Hat 7
References: <Pine.LNX.4.30.0102021404230.23424-100000@kenzo.iwr.uni-heidelberg.de>
Message-ID: <3A7B0D3A.5D351FC3@paralleldata.com>

Bogdan Costescu wrote:
> 
> On Thu, 1 Feb 2001, Martin Siegert wrote:
> 
> > That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3
> 
> NFSv3 support is present in 2.2.18, however NVSv3 over TCP doesn't work
> right at this point (this is valid for 2.4). All the patches that were
> floating around and were integrated by major vendors in their kernels were
> also integrated in 2.2.18. However, you have to compile it yourself, it
> doesn't come as RH update... NFS FAQ and mailing list at
> http://nfs.sourceforge.net
> 
> > [side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?]
> 
> Yes, but you also need LFS support from your glibc. AFAIK, glibc-2.1 from
> RH 6.2 is not compiled for LFS.
> 

How about trying it before commenting?

I did and it appears to work...

[wsb at wsb62 wsb]$ cat /etc/*lease
Red Hat Linux release 6.2 (Zoot)
[wsb at wsb62 wsb]$ ls -l /z/wsb62f
total 10496032
-rw-r--r--    1 root     root     10737418240 Nov 30 22:18 junk1
drwxr-xr-x    2 root     root        16384 Nov 25 19:08 lost+found

[wsb at wsb62 wsb]$ dmesg | more
Linux version 2.4.0-test10 (root at wsb62.paralleldata.com) (gcc version egcs-2.91.
66 19990314/Linux (egcs-1.1.2 release)) #6 SMP Sat Nov 25 15:52:34 CST 2000


Wes

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From toon at moene.indiv.nluug.nl  Thu Feb  1 16:59:58 2001
From: toon at moene.indiv.nluug.nl (Toon Moene)
Date: Thu, 01 Feb 2001 22:59:58 +0100
Subject: [Fwd: Scyld and Red Hat 7]
Message-ID: <3A79DC5D.529F9A0D@moene.indiv.nluug.nl>

Sorry - meant for the list.

-- 
Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction)
-------------- next part --------------
An embedded message was scrubbed...
From: Toon Moene <toon at moene.indiv.nluug.nl>
Subject: Re: Scyld and Red Hat 7
Date: Thu, 01 Feb 2001 22:31:13 +0100
Size: 2188
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20010201/51a4dfaf/attachment.mht>

From Todd_Henderson at readwo.com  Thu Feb  1 14:28:46 2001
From: Todd_Henderson at readwo.com (Todd Henderson)
Date: Thu, 01 Feb 2001 13:28:46 -0600
Subject: diskless nodes with scyld
References: <Pine.LNX.4.21.0102011343070.904-100000@eleanor.wdhq.scyld.com>
Message-ID: <3A79B8EE.D0D7336A@readwo.com>

What is the oldest Intel that the Scyld will install and run on?  I have a couple of old 486's at home I was
thinking about playing around with?

Thanks,
Todd

Daniel Ridge wrote:

> On Thu, 1 Feb 2001 valentin at olagrande.net wrote:
>
> > I am trying to set up a diskless cluster using the Scyld CD-ROM. Althought
> > previous articles in the archive suggest that this is possible, I have found
> > no instruction anywhere on how to do this.
>
> We have an updated CD out now (see our website) that runs disklessly
> by default (based on popular demand).
>
> > >From an article in the December archives, I assumed that all I need to do
> > is change the /etc/beowulf/fstab file. Mine currently has the following
> > entries:
> >
> > /dev/ram3     /               ext2    fs_size=65536   0 0
> > none          /proc           proc    defaults        0 0
> > none          /dev/pts        devpts  gid=5,mode=620  0 0
> > $MASTER:/home /home           nfs     defaults        0 0
>
> That looks right.
>
> > This fails, and I have trouble interpreting the error log attached at the
> > end of this message. Now, who can help me?
>
> > 1. What are the instructions for booting a diskless node with Scyld?
>
> Comment out the /home mount in your fstab. Your nodes are having NFS
> problems that are keeping them from coming up. (the node NFS fs module is
> failing to load for some reason).
>
> > 2. Is it possible to boot a diskless node without a Scyld floppy or CD-ROM?
> >    My nodes send out DHCP requests. Can I simply setup a dchp server to
> >    hand out /var/beowulf/. Will dhcpd conflict with beoserv over ports?
>
> Basically no problem. I'll whip up some instructions for people who want
> to PXE boot their boxes. With the latest Scyld release, you basically do:
> beoboot -2 -i to make phase-2 images with the kernel and initrd split out
> so that you can use them with any boot strategy you choose.
>
> Regards,
>         Dan Ridge
>         Scyld Computing Corporation
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From ddj at mookie.cis.brown.edu  Sun Feb  4 00:15:58 2001
From: ddj at mookie.cis.brown.edu (Dave Johnson)
Date: Sun, 4 Feb 2001 00:15:58 -0500
Subject: Scyld + myrinet mpich-gm?
Message-ID: <200102040515.f145FwN18773@mookie.cis.brown.edu>

I've gotten myself involved in bringing a small cluster up and
into production.  I'm learning as I go, with the help of the
archives of this mailing list.  Unfortunately the searchable
archives at Supercomputer.org seem to be off line (I get internal
server error), and out of date (the last messages seem to be from
around May 2000).

The current setup is one master with 100base-T to the world, gigabit
fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with
10/100 and myrinet interfaces.  The Scyld release of last Monday is
up and running, and I can bpsh to my heart's content.

I'm stuck at the point of trying to deploy MPI.  Scyld supplies mpi-beowulf
which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun
which do.  I've built the mpich-gm from Myricom, but their mpirun command
does not grok bpsh, and expects either rsh or ssh daemons on each slave.

I've tried a number of approaches that start out looking like they might
work, but have gotten stuck after a few hours down each cowpath.

Here is a list of some of the snags (I've lost track of some others):

bpsh is not a full blown shell, doesn't deal well with redirection, changing
directory before running a command, and in particular it can't be swapped for
rsh or ssh when configuring mpich (ie -rsh=bpsh).

The master node is outside the myrinet, I haven't a clue how to get
it to cooperate with the slaves over ethernet yet have the slaves
use myrinet as much as possible.

I tried hacking on the first test in mpich-1.2..4/examples/test
(pt2pt/third) that you get when you do make testing or runtests -check.
Tried to get it to use /usr/bin/mpirun.  Had to get rid of -mvhome and
-mvback args first, then tried to use bpsh to start up the mpirun on
one node, hoping it could use GM to start up on the other slaves.
After creating the directory in /var where it could create shm_beostat,

Now I get truckloads of errors:
shmblk_open: Couldn't open shared memory file: /shm_beostat
shmblk_open failed.

I suppose these might be from the other nodes, expecting everyone is
sharing /var, but I'm leery of nfs mounting all of the master's /var
on each slave.

I tried applying the Scyld patches against the 1.2.0 mpich sources to
the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4
directory, which is not built when --with-device=ch_gm is specified.

Then I thought I'd look into the mpprun sources, but I couldn't get
them to build even before I started hacking on them... decided to look
elsewhere for a while.

Tried getting sshd2 up and running on a slave node.  So far it insists
on asking for my password and won't accept it at all.

Has anyone got a working cluster anything like the one we're building?
What did you have to do differently to make the various packages and
drivers play nice with each other?  Where did I go wrong?

Thanks,

	-- ddj

	Dave Johnson
	ddj at cascv.brown.edu
	Brown University TCASCV


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From tibbs at math.uh.edu  Sun Feb  4 02:40:39 2001
From: tibbs at math.uh.edu (Jason L Tibbitts III)
Date: 04 Feb 2001 01:40:39 -0600
Subject: Scyld + myrinet mpich-gm?
In-Reply-To: Dave Johnson's message of "Sun, 4 Feb 2001 00:15:58 -0500"
References: <200102040515.f145FwN18773@mookie.cis.brown.edu>
Message-ID: <ufad7cy28w8.fsf@epithumia.math.uh.edu>

>>>>> "DJ" == Dave Johnson <ddj at mookie.cis.brown.edu> writes:

DJ> Has anyone got a working cluster anything like the one we're building?

We have the same basic structure: Gigabit Ethernet from front end to
switch, 100MBps Ethernet from switch to nodes, and Myrinet between just the
nodes.  In our case, we have 32 nodes plus the front end and the previous
generation 16 port Myrinet switches, so getting the front end on the
Myrinet would be rather expensive.  With the new switch setup it wouldn't
be so bad.

I had a short exchange with Donald Becker about our configuration; I don't
want to speak for him, but the impression I got was that they hadn't really
anticipated this configuration.  Their setup lets you run entirely over
Myrinet, but it assumes that the front end is on the Myrinet as well.

With a support contract, it's possible that they could work this out, but I
can't push that funding for an existing cluster so I've backed off the
Scyld setup for now.  I'll specify it with our next cluster purchase.
-- 
 Jason L Tibbitts III - tibbs at uh.edu - 713/743-3486 - 660PGH - 94 PC800
    System Manager:  University of Houston Department of Mathematics 
Born alone beneath pale sardonic skies.  One love, one life, one sorrow.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From tibbs at math.uh.edu  Sun Feb  4 02:40:39 2001
From: tibbs at math.uh.edu (Jason L Tibbitts III)
Date: 04 Feb 2001 01:40:39 -0600
Subject: Scyld + myrinet mpich-gm?
In-Reply-To: Dave Johnson's message of "Sun, 4 Feb 2001 00:15:58 -0500"
References: <200102040515.f145FwN18773@mookie.cis.brown.edu>
Message-ID: <ufad7cy28w8.fsf@epithumia.math.uh.edu>

>>>>> "DJ" == Dave Johnson <ddj at mookie.cis.brown.edu> writes:

DJ> Has anyone got a working cluster anything like the one we're building?

We have the same basic structure: Gigabit Ethernet from front end to
switch, 100MBps Ethernet from switch to nodes, and Myrinet between just the
nodes.  In our case, we have 32 nodes plus the front end and the previous
generation 16 port Myrinet switches, so getting the front end on the
Myrinet would be rather expensive.  With the new switch setup it wouldn't
be so bad.

I had a short exchange with Donald Becker about our configuration; I don't
want to speak for him, but the impression I got was that they hadn't really
anticipated this configuration.  Their setup lets you run entirely over
Myrinet, but it assumes that the front end is on the Myrinet as well.

With a support contract, it's possible that they could work this out, but I
can't push that funding for an existing cluster so I've backed off the
Scyld setup for now.  I'll specify it with our next cluster purchase.
-- 
 Jason L Tibbitts III - tibbs at uh.edu - 713/743-3486 - 660PGH - 94 PC800
    System Manager:  University of Houston Department of Mathematics 
Born alone beneath pale sardonic skies.  One love, one life, one sorrow.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From per at computer.org  Sun Feb  4 14:09:09 2001
From: per at computer.org (Per Jessen)
Date: Sun, 04 Feb 2001 14:09:09 
Subject: Big Iorn
Message-ID: <200102041407.f14E7Ol18306@mercury.nildram.co.uk>

On Thu, 01 Feb 2001 13:30:48 -0500, Alan Grimes wrote:

>Hey, I have been hearing a lot of things about MVS, the inherant
>superiority of the 390, and all sorts of stuff about how all these big
>machines are so radicaly advanced that its not even funny... 
>
>This has finally piqued my interest to the point where I now would like
>to know more about how these machines work and what they can actually
>do. 
>Since this list is tangentaly related to that field I am sure there are
>at least a few here who could give me some useful pointers. =)

What would you like to know ? 
I doubt if the z-server architecture is particularly advanced, but it's
probably on a par with other modern processors. 
I've done system-level development (mostly assembler) for the 370 and 
390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF - 
not much else runs on 390 - except for Linux now.


regards,
Per Jessen

regards,
Per Jessen


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From rgb at phy.duke.edu  Sun Feb  4 09:47:29 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 4 Feb 2001 09:47:29 -0500 (EST)
Subject: Kickstart Installation problems
In-Reply-To: <DB91BC27B91BD411B598009027E59F70539863@exchangehq.quova.com>
Message-ID: <Pine.LNX.4.30.0102040938330.32520-100000@ganesh.phy.duke.edu>

On Wed, 31 Jan 2001, Mallik Vonteddu wrote:

> After booting from the floppy, it could able to get the IP address from
> the DHCP server,but it fails to mount the NFS partition.
> It comes out with an error message" Mount: RPC timeout " .
>
> Checked the following daemons Portmapper,nfsd,mountd and rpcinfo.
> Executing the command "exportfs" shows the exported partitions too.
> Evertyhing seems to work on the nfs server, but when it tries to mount
> the nfs partition, it hangs there for some time and comes out
> as " Mount : RPC timeout " .

Have you checked to make sure that the ip number you are granting still
has permissions to mount?

Have you tried booting a rescue floppy and mounting the NFS partition by
hand?

Is the NFS partition mountable by other clients in the net (if they are
given permission to mount)?

I'm sorry if these suggestions sound lame, but you've already checked a
lot, it sounds like, and it worked and now it doesn't.  Either something
changed or something broke (hardware or software).  First hypothesis is
that something changed, so look for something that changed -- an extra
character that somehow got typed in the kickstart line in its dhcpd
entry, an address from the wrong block -- typos can be killers because
everything "works" but -- doesn't.  Second hypothesis is software, so
make sure that the NFS client-server connection is valid for the
exported space for some other reliable client.  Check to be sure that
your kickstart floppy is valid, unbroken, current, and works for some
other client (if you can).  At this point, you've checked the entire
install path, and you're down to client hardware.  Which does break,
although I wouldn't expect it to produce an RPC error (only) if it did.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From rgb at phy.duke.edu  Sun Feb  4 10:23:04 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun, 4 Feb 2001 10:23:04 -0500 (EST)
Subject: Scyld and Red Hat 7
In-Reply-To: <3A787495.830E5454@hsc.vcu.edu>
Message-ID: <Pine.LNX.4.30.0102040948300.32520-100000@ganesh.phy.duke.edu>

On Wed, 31 Jan 2001, Mike Davis wrote:

> For a production server, I'm in complete agreement with Martin. The most
> important thing that a
> research computer can do is continually compute research. Flippant as
> that might sound, it is the
> truth. While I have upgraded my desktop and some webservers to RH7, I
> have no overwhelming
> desire to upgrade our cluster for the reasons mentioned.

It's anecdotal, to be sure, but after the RH 5.x->RH 6.x upgrade in our
department all my compiled research binaries ran some 20% faster.  We
made back the one day of downtime in one week of production, and of
course there were other tremendous benefits in even slightly broken 6.0
compared to 5.2.  There were library issues associated with upgrades as
well back then and all of these arguments were advanced and debated.

The tension between stability and improvement is as old as code itself.
Most people find a happy medium that is reasonably economic -- they get
things stable and productive and then leave them alone until their
friends start to make fun of them and then they upgrade, grumbling all
the while, get things stable, and then leave them alone (iterate
indefinitely).  As long as they have smart and helpful friends who live
close enough to the bleeding edge that it eventually is stabilized, this
is probably just fine.

It can easily be carried to a fault, though, as my anecdote makes clear.
We'd ALL pretty much make fun of somebody still installing and running
5.2 on brand new hardware (and only buying peripheral hardware from the
limited list of supported devices from that time), wouldn't we?  There
are real improvements associated with upgrades, and at some point it
becomes clearly worth it to pay the "cost" of the upgrade (time, hassle,
money, instability, recompiling, and so forth, which is actually pretty
damn minimal for RH based systems with kickstart) to gain the benefits.

Piecemeal upgrade isn't a good answer either, at least not in the long
run (although it is essential for prototyping an organizational
upgrade).  It becomes increasingly difficult to manage an "island" of
obsoleted systems in a sea of current ones for a variety of reasons:
the rpm incompatibility between 6.2 and 7.0, the hassle of tracking two
different update lists to ensure that your overall operation remains
secure (a step often skipped, but then lots of operations just aren't
particularly secure), the "missing application" problem when something
you get used to on the one distro isn't on the other, and in the case of
desktops, the lack of backwards compatibility in many of the X/gnome
improvements that really screw things up if one shifts between
distributions with a common NFS mounted home) it starts costing one MORE
time and MORE productivity to keep things heterogeneous than it would to
upgrade.  Homogeneity equals administrative scalability, and this
contributes to overall productivity too.

But you know all this -- I'm just trying to provide some perspective for
less experienced readers, so they don't get the impression that we're
linux-luddites of some sort who plan to be running 6.2 two years from
now...:-) So my point wasn't that everybody should stop everything and
upgrade to 7 NOW or that Scyld should do so right away -- it was that at
some point (the point where in the mind of the individual the costs and
benefits start to balance) one SHOULD upgrade, and that Scyld is likely
to do so when in their judgement that point is reached.

For what it's worth we haven't upgraded to 7.0 yet either, but at some
point in the not too distant future we will (possibly to 7.1 instead of
7.0).  We'll do it "all at once", by prototyping and thoroughly testing
a few archetypical systems, certifying a particular collection of rpm's
and updates that "works", and then using kickstart to simply convert
over all the department systems in a day (or at most two).  Not much
lost productivity, although we will also not be stupid and do this right
before some important event (like finals) when the systems HAVE to be up
in case there are problems.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From lowther at att.net  Sun Feb  4 10:37:41 2001
From: lowther at att.net (Ken)
Date: Sun, 04 Feb 2001 10:37:41 -0500
Subject: diskless nodes with scyld
References: <Pine.LNX.4.21.0102011343070.904-100000@eleanor.wdhq.scyld.com> <3A79B8EE.D0D7336A@readwo.com>
Message-ID: <3A7D7745.782E00CD@att.net>

Todd Henderson wrote:
> 
> What is the oldest Intel that the Scyld will install and run on?  I have a couple of old 486's at home I was
> thinking about playing around with?
> 

It should go on a i386.  Ok for playing around with, but not really
usefull given todays prices on hardware vs electicty. ;-)

Ken

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From lowther at att.net  Sun Feb  4 10:37:41 2001
From: lowther at att.net (Ken)
Date: Sun, 04 Feb 2001 10:37:41 -0500
Subject: diskless nodes with scyld
References: <Pine.LNX.4.21.0102011343070.904-100000@eleanor.wdhq.scyld.com> <3A79B8EE.D0D7336A@readwo.com>
Message-ID: <3A7D7745.782E00CD@att.net>

Todd Henderson wrote:
> 
> What is the oldest Intel that the Scyld will install and run on?  I have a couple of old 486's at home I was
> thinking about playing around with?
> 

It should go on a i386.  Ok for playing around with, but not really
usefull given todays prices on hardware vs electicty. ;-)

Ken

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From wsb at paralleldata.com  Sun Feb  4 23:25:10 2001
From: wsb at paralleldata.com (W Bauske)
Date: Sun, 04 Feb 2001 22:25:10 -0600
Subject: Big Iron
References: <200102041407.f14E7Ol18306@mercury.nildram.co.uk>
Message-ID: <3A7E2B26.B8091B7D@paralleldata.com>

Per Jessen wrote:
> 
> On Thu, 01 Feb 2001 13:30:48 -0500, Alan Grimes wrote:
> 
> >Hey, I have been hearing a lot of things about MVS, the inherant
> >superiority of the 390, and all sorts of stuff about how all these big
> >machines are so radicaly advanced that its not even funny...
> >
> >This has finally piqued my interest to the point where I now would like
> >to know more about how these machines work and what they can actually
> >do.
> >Since this list is tangentaly related to that field I am sure there are
> >at least a few here who could give me some useful pointers. =)
> 
> What would you like to know ?
> I doubt if the z-server architecture is particularly advanced, but it's
> probably on a par with other modern processors.
> I've done system-level development (mostly assembler) for the 370 and
> 390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF -
> not much else runs on 390 - except for Linux now.
> 


Go to IBM's site and look up "zseries" and "linux 390". Look thru that 
and search for whatever else you're curious about.


Wes

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From Jon.Tegner at wiglaf.se  Sun Feb  4 13:54:50 2001
From: Jon.Tegner at wiglaf.se (Jon Tegner)
Date: Sun, 04 Feb 2001 19:54:50 +0100
Subject: Managing rpms
Message-ID: <3A7DA57A.612FB7A7@wiglaf.se>

In a post awhile back the yup-package for maintaining rpms was
mentioned, and I was wondering if someone has experiences of that or
some other package which automatically takes care of updating rpms in a
system (on the page http://www.rpm.org/software.html there seems to be
several canditates).

Regards,

/jon


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From Carl_Notfors at vdgc.com.sg  Mon Feb  5 02:23:42 2001
From: Carl_Notfors at vdgc.com.sg (Carl_Notfors at vdgc.com.sg)
Date: Mon, 5 Feb 2001 15:23:42 +0800
Subject: Fault tolerance and MPI
Message-ID: <OFE28C5AC3.70F96389-ON482569EA.0027CA31@vdgc.com.sg>


Our computational model is quite simple.  We have a master node and a
number of slave nodes.  All communication is between the master and the
slaves, ie. no internode communication, so all communication is done with
MPI_Send and MPI_Recv (we are using LAM/MPI).

The problem with MPI is that there is no fault tolerance, if a slave node
"dies" the whole process goes down.  According to the LAM documentation it
should be possible to achieve some fault tolerance but we have as yet not
tried this.

Is there anyone who has got this working?  Is there fault tolerance in any
othe MPI implementations?  Would it be better to use PVM if you want fault
tolerance?


Carl


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From bahnsen at theo-physik.uni-kiel.de  Mon Feb  5 05:30:20 2001
From: bahnsen at theo-physik.uni-kiel.de (Robert Bahnsen)
Date: Mon, 5 Feb 2001 11:30:20 +0100 (MET)
Subject: Alpha beowulf: True64 or Linux?
In-Reply-To: <200102021700.MAA11198@blueraja.scyld.com> from "beowulf-admin@beowulf.org" at Feb 02, 2001 12:00:06 PM
Message-ID: <200102051030.LAA03119@berg.theo-physik.uni-kiel.de>


 Martin,

as far as the NAG fl90 Library is concerned the following versions are 
available for Compaq Alpha:

FNDAU04DB  Release 4 / Compaq Alpha UNIX / Compaq compiler
FNDAU03D9  Release 3 / Compaq Alpha UNIX / NAGWare compiler
FNDAL03D9  Release 3 / Compaq Alpha Linux / NAGWare compiler

The combination Compaq Alpha Linux + (free/cheap) Compaq compiler is 
missing, and NAG said they would not release one in the near future.
Take this pro Tru64 or con NAG, as you like.

HTH,  Robert

> - are there performance differences?
> - software availability? I heard that Compaq's development suite (compilers,
>   debuggers, etc.) is available on both platforms. What about scientific
>   libraries, etc.
> - my guess is that both OS are fully 64bit OS (files > 2GB, etc.).
>   How about the compilers? Can I have 128bit precision for floating point
>   operations?
> - if we buy 4 processor smp boxes: How is the support under either OS?
>   (OpenMP, etc.)
> - How good is the smp performance (i.e., is it worth it in comparison to
>   myrinet?)?
> - what other pros and cons?

--
Dipl.-Phys. Robert Bahnsen  Institut f. Theoretische Physik und Astrophysik
                            CAU Kiel, Leibnizstr. 15, D-24098 Kiel, Germany
                            Fon: +49 (0)431 8804112 Fax: +49 (0)431 8804094
E-Mail:  bahnsen at tp.cau.de  www.theo-physik.uni-kiel.de/~bahnsen/index.html

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From rgb at phy.duke.edu  Mon Feb  5 07:21:13 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 5 Feb 2001 07:21:13 -0500 (EST)
Subject: Managing rpms
In-Reply-To: <3A7DA57A.612FB7A7@wiglaf.se>
Message-ID: <Pine.LNX.4.30.0102050657420.4475-100000@ganesh.phy.duke.edu>

On Sun, 4 Feb 2001, Jon Tegner wrote:

> In a post awhile back the yup-package for maintaining rpms was
> mentioned, and I was wondering if someone has experiences of that or
> some other package which automatically takes care of updating rpms in a
> system (on the page http://www.rpm.org/software.html there seems to be
> several canditates).
>
> Regards,
>
> /jon

I've just started using yup personally, as it is being prototyped as a
method for automating the generally incredibly painful process of
keeping a system or set of systems both consistent and secure (in the
sense of being up to date with respect to security patches and the
like).

yup does a lot of things for an RPM-based distribution that we are used
to seeing only from e.g. Debian -- it is dependency aware and can update
an entire dependency tree with one call.  It also does sanity checks and
effectively forces one to eliminate inconsistencies from an RPM tree
before it will run -- on one of my oldest systems, I had multiple rpm
revisions of some packages installed which had survived the 6.2 upgrade.
yup patiently went through this and helped me figure out what was
bollixed up and remove or hand update things until it was satisfied that
the distribution itself on the system was at least not overtly broken
somewhere.

It can also be used to generate a plain list of all installed packages.

In application, once one has a clean system it becomes a simple
client-side call.  It can be run nightly in a cron script, for example,
on all clients.  The clients are directed to an FTP server which has yup
configuration information and distribution/update directories.
Everything is then done automagically -- it compares what you have to
what you should have, retrieves and caches copies of rpm's that need
updating and all their dependencies, installs them, removes the cache
copies, and goes away.

It can also be run from the command line targeted at specific packages.
For example, on the aforementioned host I still have a bug that is
preventing a full update (a bug which might well be in yup -- the
package isn't yet perfect).  However, it still works fine for individual
packages, and I'm working my way through "important" packages one at a
time.  Below is a trace of operation for updating e.g. lpr (which is
actually not that important on this host, but is out of date):

rgb at rgb|T:3#more /tmp/rpm-list
rgb at rgb|T:4#yup update lpr
Reading RPM database... (100%)
Performing dependencies sanity check...
Checking for package list updates...
Done transfering...    280B in   0.0s at  115kB per/sec
Package list is up to date...
Reading package list... (100%)

As requested, I will do the following:
[update: lpr]
Downloading lpr-0.50-7.6.x.i386.rpm
Done transfering...  89.6kB in   2.0s at 44.8kB per/sec
Reading packages... Done
lpr-0.50-7.6.x.i386.rpm [..........]
42.770user 0.780sys 86.8%, 0ib 0ob 0tx 0da 0to 0swp 0:50.16

This appears to be a bit easier than:

  a) Figuring out the package of lpr that I have.
  b) Finding out if it is superceded by an update
  c) Hand-ftp'ing the update rpm (and dependencies, if any) from a
     Red Hat mirror.
  d) installing the rpm(s) by hand.

One call, all automated.  If run a second time, it returns:

rgb at rgb|T:5#yup update lpr
Reading RPM database... (100%)
Performing dependencies sanity check...
Checking for package list updates...
Done transfering...    280B in   0.0s at  114kB per/sec
Package list is up to date...
Reading package list... (100%)

Error: Package lpr is already installed and is the latest version
40.620user 0.650sys 99.6%, 0ib 0ob 0tx 0da 0to 0swp 0:41.41

That's all I know at the moment from a user perspective (somebody else
is managing the FTP site and master yup configuration).  I believe that
this configuration process isn't too arduous, though.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From tony at MPI-Softtech.Com  Mon Feb  5 09:49:12 2001
From: tony at MPI-Softtech.Com (Tony Skjellum)
Date: Mon, 5 Feb 2001 08:49:12 -0600 (CST)
Subject: Fault tolerance and MPI
In-Reply-To: <OFE28C5AC3.70F96389-ON482569EA.0027CA31@vdgc.com.sg>
Message-ID: <Pine.GSO.4.21.0102050848180.20693-100000@mpi.mpi-softtech.com>

You can see our initial paper on this subject at

http://www.mpi-softtech.com/publications/mpift-paper-dsm2001.pdf

It contains references to other known works in this area.

-Tony

Anthony Skjellum, PhD, President (tony at mpi-softtech.com) 
MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759
+1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com
"Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters."

On Mon, 5 Feb 2001 Carl_Notfors at vdgc.com.sg wrote:

> 
> 
> Our computational model is quite simple.  We have a master node and a
> number of slave nodes.  All communication is between the master and the
> slaves, ie. no internode communication, so all communication is done with
> MPI_Send and MPI_Recv (we are using LAM/MPI).
> 
> The problem with MPI is that there is no fault tolerance, if a slave node
> "dies" the whole process goes down.  According to the LAM documentation it
> should be possible to achieve some fault tolerance but we have as yet not
> tried this.
> 
> Is there anyone who has got this working?  Is there fault tolerance in any
> othe MPI implementations?  Would it be better to use PVM if you want fault
> tolerance?
> 
> 
> Carl
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From keithu at parl.clemson.edu  Mon Feb  5 10:14:52 2001
From: keithu at parl.clemson.edu (Keith Underwood)
Date: Mon, 5 Feb 2001 10:14:52 -0500 (EST)
Subject: Scyld + myrinet mpich-gm?
In-Reply-To: <200102040515.f145FwN18773@mookie.cis.brown.edu>
Message-ID: <Pine.LNX.4.30.0102051004130.1173-100000@keithu-pc.parl.clemson.edu>

Hmmm...  we have something similar, but not quite the same.  We have a
master w/ 100base-T to the world, gigabit fiber to a 24-10/100 + 2-1000
switch and 16 slaves (not diskless) with 10/100 and gigabit interfaces.
We only have 16 ports on our gigabit switch and out master is a different
type of machine from the 16 slaves.  We have successfully convinced the
machines to communicate over the gigabit exclusively while communicating
with the master over the 10/100.  You do need to use the Scyld MPI though.
I seriously doubt that you will get another MPI running as is.

Anyway, what we did was:

after bringing the nodes up:
	bpsh -a route add -host 192.168.1.1 eth0
	bpsh -a route del default
	bpsh -a modprobe sk98lin

then on each node:
	bpsh <node> ifconfig eth1 up <nodes current IP>

Then to run an MPI job that DOES NOT run on the head:
	NO_INLINE_MPIRUN=true bpsh 0 mpiapp -p4pg /tmp/pgfile

where /tmp/pgfile is a p4 process group file.

This is a real sketchy config so don't expect too much support on it
just yet ;-)

On Sun, 4 Feb 2001, Dave Johnson wrote:

> I've gotten myself involved in bringing a small cluster up and
> into production.  I'm learning as I go, with the help of the
> archives of this mailing list.  Unfortunately the searchable
> archives at Supercomputer.org seem to be off line (I get internal
> server error), and out of date (the last messages seem to be from
> around May 2000).
>
> The current setup is one master with 100base-T to the world, gigabit
> fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with
> 10/100 and myrinet interfaces.  The Scyld release of last Monday is
> up and running, and I can bpsh to my heart's content.
>
> I'm stuck at the point of trying to deploy MPI.  Scyld supplies mpi-beowulf
> which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun
> which do.  I've built the mpich-gm from Myricom, but their mpirun command
> does not grok bpsh, and expects either rsh or ssh daemons on each slave.
>
> I've tried a number of approaches that start out looking like they might
> work, but have gotten stuck after a few hours down each cowpath.
>
> Here is a list of some of the snags (I've lost track of some others):
>
> bpsh is not a full blown shell, doesn't deal well with redirection, changing
> directory before running a command, and in particular it can't be swapped for
> rsh or ssh when configuring mpich (ie -rsh=bpsh).
>
> The master node is outside the myrinet, I haven't a clue how to get
> it to cooperate with the slaves over ethernet yet have the slaves
> use myrinet as much as possible.
>
> I tried hacking on the first test in mpich-1.2..4/examples/test
> (pt2pt/third) that you get when you do make testing or runtests -check.
> Tried to get it to use /usr/bin/mpirun.  Had to get rid of -mvhome and
> -mvback args first, then tried to use bpsh to start up the mpirun on
> one node, hoping it could use GM to start up on the other slaves.
> After creating the directory in /var where it could create shm_beostat,
>
> Now I get truckloads of errors:
> shmblk_open: Couldn't open shared memory file: /shm_beostat
> shmblk_open failed.
>
> I suppose these might be from the other nodes, expecting everyone is
> sharing /var, but I'm leery of nfs mounting all of the master's /var
> on each slave.
>
> I tried applying the Scyld patches against the 1.2.0 mpich sources to
> the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4
> directory, which is not built when --with-device=ch_gm is specified.
>
> Then I thought I'd look into the mpprun sources, but I couldn't get
> them to build even before I started hacking on them... decided to look
> elsewhere for a while.
>
> Tried getting sshd2 up and running on a slave node.  So far it insists
> on asking for my password and won't accept it at all.
>
> Has anyone got a working cluster anything like the one we're building?
> What did you have to do differently to make the various packages and
> drivers play nice with each other?  Where did I go wrong?
>
> Thanks,
>
> 	-- ddj
>
> 	Dave Johnson
> 	ddj at cascv.brown.edu
> 	Brown University TCASCV
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

---------------------------------------------------------------------------
Keith Underwood                   Parallel Architecture Research Lab (PARL)
keithu at parl.clemson.edu                                  Clemson University


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From newt at scyld.com  Mon Feb  5 14:47:24 2001
From: newt at scyld.com (Daniel Ridge)
Date: Mon, 5 Feb 2001 14:47:24 -0500 (EST)
Subject: diskless nodes with scyld
In-Reply-To: <3A79B8EE.D0D7336A@readwo.com>
Message-ID: <Pine.LNX.4.21.0102051445180.1216-100000@eleanor.wdhq.scyld.com>


Todd,

On Thu, 1 Feb 2001, Todd Henderson wrote:

> What is the oldest Intel that the Scyld will install and run on?  I have a couple of old 486's at home I was
> thinking about playing around with?

Our distribution will (out of the box) run on slave nodes which have a PCI
bus. If you have more time than money, our software can certianly be made
to run on older machines.

Regards,
	Dan Ridge
	Scyld Computing Corporation


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From shahin at labf.org  Tue Feb  6 03:00:52 2001
From: shahin at labf.org (Mofeed Shahin)
Date: Tue, 6 Feb 2001 08:00:52 +0000
Subject: MP PowerPC
Message-ID: <01020608005202.14355@localhost.localdomain>

Has anyone had a look these ? 

http://www.totalimpact.com/G3_MP.html

What do people think of them?

Mof.

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From lowther at att.net  Mon Feb  5 17:01:46 2001
From: lowther at att.net (Ken)
Date: Mon, 05 Feb 2001 17:01:46 -0500
Subject: MP PowerPC
References: <01020608005202.14355@localhost.localdomain>
Message-ID: <3A7F22CA.36F13FFF@att.net>

Mofeed Shahin wrote:
> 
> Has anyone had a look these ?
> 
> http://www.totalimpact.com/G3_MP.html
> 
> What do people think of them?
> 

I've heard they are expensive.  They say up to 8 boards can work
together, but don't give a height diminsion.  I kind of doubt you could
populate all your slots with them.  At least not on my board. :(

Ken

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From mrao2001 at yahoo.com  Mon Feb  5 15:37:03 2001
From: mrao2001 at yahoo.com (mrao2001 at yahoo.com)
Date: Mon, 5 Feb 2001 12:37:03 -0800
Subject: Kickstart Installation problems
References: <Pine.LNX.4.30.0102040938330.32520-100000@ganesh.phy.duke.edu>
Message-ID: <007401c08fb3$663c9c20$2464a8c0@quova.com>

Hi Robert

You are right. Something has changed i.e some one had

plugged his linux test machine on the same network,

which is also running DHCP leasing different IP addresses

on the network. After removing the test box, installation

went on smoothly.


Thanks a lot for your help

Regards

Mallik

----- Original Message -----
From: "Robert G. Brown" <rgb at phy.duke.edu>
To: "Mallik Vonteddu" <mallik at quova.com>
Cc: <beowulf at beowulf.org>
Sent: Sunday, February 04, 2001 6:47 AM
Subject: Re: Kickstart Installation problems


> On Wed, 31 Jan 2001, Mallik Vonteddu wrote:
>
> > After booting from the floppy, it could able to get the IP address from
> > the DHCP server,but it fails to mount the NFS partition.
> > It comes out with an error message" Mount: RPC timeout " .
> >
> > Checked the following daemons Portmapper,nfsd,mountd and rpcinfo.
> > Executing the command "exportfs" shows the exported partitions too.
> > Evertyhing seems to work on the nfs server, but when it tries to mount
> > the nfs partition, it hangs there for some time and comes out
> > as " Mount : RPC timeout " .
>
> Have you checked to make sure that the ip number you are granting still
> has permissions to mount?
>
> Have you tried booting a rescue floppy and mounting the NFS partition by
> hand?
>
> Is the NFS partition mountable by other clients in the net (if they are
> given permission to mount)?
>
> I'm sorry if these suggestions sound lame, but you've already checked a
> lot, it sounds like, and it worked and now it doesn't.  Either something
> changed or something broke (hardware or software).  First hypothesis is
> that something changed, so look for something that changed -- an extra
> character that somehow got typed in the kickstart line in its dhcpd
> entry, an address from the wrong block -- typos can be killers because
> everything "works" but -- doesn't.  Second hypothesis is software, so
> make sure that the NFS client-server connection is valid for the
> exported space for some other reliable client.  Check to be sure that
> your kickstart floppy is valid, unbroken, current, and works for some
> other client (if you can).  At this point, you've checked the entire
> install path, and you're down to client hardware.  Which does break,
> although I wouldn't expect it to produce an RPC error (only) if it did.
>
>    rgb
>
> --
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From RSchilling at affiliatedhealth.org  Mon Feb  5 21:14:14 2001
From: RSchilling at affiliatedhealth.org (Schilling, Richard)
Date: Mon, 5 Feb 2001 18:14:14 -0800 
Subject: MP PowerPC
Message-ID: <51FCCCF0C130D211BE550008C724149EBE1039@mail1.affiliatedhealth.org>

I'd also want to take a look at the driver code as well.  The page indicated
they are capable of running ELF code, but I'd look for more information
about how the GNU environment is used on a specific installation.

It'd be nice if it works well though, 'cause you might actually have
something that compares to the Intel daughter cards that were made for the
Power PC.

--Richard Schilling


> -----Original Message-----
> From: Ken [mailto:lowther at att.net]
> Sent: Monday, February 05, 2001 2:02 PM
> To: shahin at labf.org
> Cc: beowulf at beowulf.org
> Subject: Re: MP PowerPC
> 
> 
> Mofeed Shahin wrote:
> > 
> > Has anyone had a look these ?
> > 
> > http://www.totalimpact.com/G3_MP.html
> > 
> > What do people think of them?
> > 
> 
> I've heard they are expensive.  They say up to 8 boards can work
> together, but don't give a height diminsion.  I kind of doubt 
> you could
> populate all your slots with them.  At least not on my board. :(
> 
> Ken
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) 
> visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20010205/7474af02/attachment.html>

From zolia at lydys.sc-uni.ktu.lt  Tue Feb  6 05:53:19 2001
From: zolia at lydys.sc-uni.ktu.lt (zolia)
Date: Tue, 6 Feb 2001 12:53:19 +0200 (EET)
Subject: BSc diploma & beowulf
Message-ID: <Pine.LNX.4.21.0102061252470.23151-100000@lydys.sc-uni.ktu.lt>


hello,

i was reading this list nearly a year. I've made a small cluster based on
debian and mpi; ran mpqc w/ mpipro and did some successful computations,
but for my BSc diploma i have to create some full functional application,
and i would like it to run on my cluster. I thought about few
things: implement face morphing algorithm and parallelize it. Other would
be to write some monitoring/management programs, maybe with snmp, but in
this case i don't know for sure what it would be (what tasks to manage, 
monitor etc..) :/ 

If you have any suggestions or new ideas what program would be usefull,
please let me know.

thanx,

====================================================================
Antanas Masevicius             Kaunas University of Technology
Studentu 48a-101               Computer Center
LT-3028 Kaunas                 LITNET NOC UNIX Systems Administrator
Lithuania                      E-mail: zolia at sc.ktu.lt


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From kragen at pobox.com  Tue Feb  6 14:11:46 2001
From: kragen at pobox.com (kragen at pobox.com)
Date: Tue, 6 Feb 2001 14:11:46 -0500 (EST)
Subject: Scyld and Red Hat 7
Message-ID: <200102061911.OAA16710@kirk.dnaco.net>

"Stephen Gaudet" <sgaudet at atipa.com> writes:
> > The one reason that could make me upgrade is the installation of a 2.4
> > kernel. Since RH 7.0 does not have it, there is no reason to upgrade yet.
> 
> Here's another reason you might be interested in if looking to use large
> data sets.

Presumably you're talking about being interested in 2.4, not RH 7.

> Latest Linux kernel holds appeal for IT
> 
> The keepers of the Linux operating system have made improvements to the core
> technology that should make it easier to find lost data.
> The biggest addition to the release of Linux kernel 2.4.1 is the ReiserFS,
> which is a journaling file system. Journaling file systems are key to
> operating systems and applications used over extended corporate networks
> because they allow administrators to more quickly recover data in the event
> of system failure.

IMHO, this is not a particularly good explanation of the situation;
metadata-journaling filesystems like ReiserFS sacrifice a little
performance in the average case (when everything is working fine) to
make fscking after a crash very quick.

If you're losing any significant amount of your cluster time to
fscking, you probably have bigger problems that you should address
first.


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From kragen at pobox.com  Tue Feb  6 14:11:47 2001
From: kragen at pobox.com (kragen at pobox.com)
Date: Tue, 6 Feb 2001 14:11:47 -0500 (EST)
Subject: Big Iorn
Message-ID: <200102061911.OAA16720@kirk.dnaco.net>

"Per Jessen" <per at computer.org> writes:
> What would you like to know ? 
> I doubt if the z-server architecture is particularly advanced, but it's
> probably on a par with other modern processors. 
> I've done system-level development (mostly assembler) for the 370 and 
> 390 architectures for 10-12 years - ask away. I've done VM, MVS and TPF - 
> not much else runs on 390 - except for Linux now.

I'm curious:
- in pure (integer, symbolic, or floating-point) computational speed,
  without much memory access, how do the 390 processors compare to
  other modern CPUs?  (I know that's not what they're sold for, but
  I'm interested to hear the answer.)
- in memory bandwidth (stream benchmarks, for example), how do they compare?
- in I/O bandwidth, how do they compare? 


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From kragen at pobox.com  Tue Feb  6 14:11:46 2001
From: kragen at pobox.com (kragen at pobox.com)
Date: Tue, 6 Feb 2001 14:11:46 -0500 (EST)
Subject: Scyld and Red Hat 7
Message-ID: <200102061911.OAA16710@kirk.dnaco.net>

"Stephen Gaudet" <sgaudet at atipa.com> writes:
> > The one reason that could make me upgrade is the installation of a 2.4
> > kernel. Since RH 7.0 does not have it, there is no reason to upgrade yet.
> 
> Here's another reason you might be interested in if looking to use large
> data sets.

Presumably you're talking about being interested in 2.4, not RH 7.

> Latest Linux kernel holds appeal for IT
> 
> The keepers of the Linux operating system have made improvements to the core
> technology that should make it easier to find lost data.
> The biggest addition to the release of Linux kernel 2.4.1 is the ReiserFS,
> which is a journaling file system. Journaling file systems are key to
> operating systems and applications used over extended corporate networks
> because they allow administrators to more quickly recover data in the event
> of system failure.

IMHO, this is not a particularly good explanation of the situation;
metadata-journaling filesystems like ReiserFS sacrifice a little
performance in the average case (when everything is working fine) to
make fscking after a crash very quick.

If you're losing any significant amount of your cluster time to
fscking, you probably have bigger problems that you should address
first.


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From Eugene.Leitl at lrz.uni-muenchen.de  Tue Feb  6 15:12:24 2001
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de)
Date: Tue, 06 Feb 2001 21:12:24 +0100
Subject: Scyld and Red Hat 7
References: <200102061911.OAA16710@kirk.dnaco.net>
Message-ID: <3A805AA8.D3E08970@lrz.uni-muenchen.de>

kragen at pobox.com wrote:
 
> IMHO, this is not a particularly good explanation of the situation;
> metadata-journaling filesystems like ReiserFS sacrifice a little
> performance in the average case (when everything is working fine) to
> make fscking after a crash very quick.

I have the impression ReiserFS also offers much better performance
and noticeably better raw bit utilization in case of many small files.
Also, the roadmap is at where the goodies are. It is not just a fs...

For a good time call:
	http://www.namesys.com/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From lowther at att.net  Wed Feb  7 08:56:54 2001
From: lowther at att.net (Ken)
Date: Wed, 07 Feb 2001 08:56:54 -0500
Subject: Scyld and Red Hat 7
References: <200102061911.OAA16710@kirk.dnaco.net> <3A805AA8.D3E08970@lrz.uni-muenchen.de>
Message-ID: <3A815426.138B251E@att.net>

Eugene.Leitl at lrz.uni-muenchen.de wrote:
> 
> kragen at pobox.com wrote:
> 
> > IMHO, this is not a particularly good explanation of the situation;
> > metadata-journaling filesystems like ReiserFS sacrifice a little
> > performance in the average case (when everything is working fine) to
> > make fscking after a crash very quick.
> 
> I have the impression ReiserFS also offers much better performance
> and noticeably better raw bit utilization in case of many small files.
> Also, the roadmap is at where the goodies are. It is not just a fs...
> 
> For a good time call:
>         http://www.namesys.com/
> 

I have had crashes where fscheck required manual intervention and ended
with the statement: "File system altered!".

Maybe not too bad on an individual node, but I'd rather see the ReiserFS
give me that little "using old" blurb rush by on the head node after a
crash.  RSF is effectively putting all that data you crunched into a new
file and keeping the old on hand until the new is successfully written
as opposed to opening the old and overwriting it.  If you crash during
the write, you lose that file.  Of course, you could always have the
software writing dupicates in case of a crash.

Ken

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From kragen at pobox.com  Wed Feb  7 12:41:23 2001
From: kragen at pobox.com (kragen at pobox.com)
Date: Wed, 7 Feb 2001 12:41:23 -0500 (EST)
Subject: Scyld and Red Hat 7
Message-ID: <200102071741.MAA03604@kirk.dnaco.net>

Ken <lowther at att.net> writes:
> I have had crashes where fscheck required manual intervention and ended
> with the statement: "File system altered!".

On ext2fs, reiserfs, or what?

> Maybe not too bad on an individual node, but I'd rather see the ReiserFS
> give me that little "using old" blurb rush by on the head node after a
> crash.  

"using old"?

> RSF is effectively putting all that data you crunched into a new

"RSF"?


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From hahn at coffee.psychology.mcmaster.ca  Wed Feb  7 14:26:01 2001
From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn)
Date: Wed, 7 Feb 2001 14:26:01 -0500 (EST)
Subject: ServerWorks HEsl reviewed
Message-ID: <Pine.LNX.4.10.10102071414460.20232-100000@coffee.psychology.mcmaster.ca>

	http://www.anandtech.com/showdoc.html?i=1414&p=18

I'm embarassed to admit that I noticed this review.  why?
it's on anandtech.  I *can* however, honestly claim I only did it 
because I was putting off cleaning the litter boxes (4 of them!)

anyway, the short story is: extremely mundane Sandra scores.
double-wide PC133 doesn't deliver anything there, though most
of the other benchmarks were modestly better than other boards.


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


From RSchilling at affiliatedhealth.org  Wed Feb  7 17:58:29 2001
From: RSchilling at affiliatedhealth.org (Schilling, Richard)
Date: Wed, 7 Feb 2001 14:58:29 -0800 
Subject: An IT Research and Development center
Message-ID: <51FCCCF0C130D211BE550008C724149EBE104C@mail1.affiliatedhealth.org>

I have recently been given cause to research the feasibility of opening up
an advanced technology research center, and I'm wondering if any of you or
your organizations would be interested in using one if it were available.

The goal of the center would be to host a place where organizations,
researchers, and students could go to get their hands on systems that might
not be otherwise available.  This would include:

     virtual reality
     simulators, static and full-motion
     beowulf clusters
     geographic information systems

The list is not finalized, but one aim would be to open the center to as
many disciplines as possible.  The center would also aim to host
conferences, and provide educational programs, such as an after school
technology program for youths.

So far, it looks like if the participants are willing to share the costs, it
could be an affordable resource.

Thanks for considering  . . .

Richard Schilling
Webmaster / Web Integration Programmer
Affiliated Health Services
Mount Vernon, WA USA
phone 01 360 856 7129
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20010207/5fc097af/attachment.html>