Scyld + myrinet mpich-gm?

Keith Underwood keithu at
Mon Feb 5 10:14:52 EST 2001

Hmmm...  we have something similar, but not quite the same.  We have a
master w/ 100base-T to the world, gigabit fiber to a 24-10/100 + 2-1000
switch and 16 slaves (not diskless) with 10/100 and gigabit interfaces.
We only have 16 ports on our gigabit switch, and our master is a different
type of machine from the 16 slaves.  We have successfully convinced the
machines to communicate over the gigabit exclusively while communicating
with the master over the 10/100.  You do need to use the Scyld MPI though.
I seriously doubt that you will get another MPI running as is.

Anyway, what we did was:

after bringing the nodes up:
	bpsh -a route add -host eth0
	bpsh -a route del default
	bpsh -a modprobe sk98lin
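
Concretely, assuming the master's cluster-side address is 10.0.0.1 (a made-up
address, substitute your own) and the slaves reach it over eth0, that first
route command would look something like:

	bpsh -a route add -host 10.0.0.1 dev eth0   # keep traffic to the master on the 10/100 side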

then on each node:
	bpsh <node> ifconfig eth1 up <nodes current IP>
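
For example, a quick-and-dirty loop over all 16 slaves, assuming they are
nodes 0-15 and their cluster IPs run 10.0.0.100 through 10.0.0.115 (both made
up, adjust for your setup):

	for n in `seq 0 15`; do
		bpsh $n ifconfig eth1 up 10.0.0.$((100 + $n))
	done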

Then to run an MPI job that DOES NOT run on the head:
	NO_INLINE_MPIRUN=true bpsh 0 mpiapp -p4pg /tmp/pgfile

where /tmp/pgfile is a p4 process group file.
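
A p4 process group file is just a list of "host  nprocs  path-to-executable"
lines; the first line names the host where the job itself starts (node 0 in
the command above) with 0 extra processes.  Something along these lines, with
made-up node names and path:

	node0  0  /home/me/mpiapp
	node1  1  /home/me/mpiapp
	node2  1  /home/me/mpiapp
	node3  1  /home/me/mpiapp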

This is a real sketchy config so don't expect too much support on it
just yet ;-)

On Sun, 4 Feb 2001, Dave Johnson wrote:

> I've gotten myself involved in bringing a small cluster up and
> into production.  I'm learning as I go, with the help of the
> archives of this mailing list.  Unfortunately the searchable
> archives at seem to be offline (I get an internal
> server error) and out of date (the last messages seem to be from
> around May 2000).
> The current setup is one master with 100base-T to the world, gigabit
> fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with
> 10/100 and myrinet interfaces.  The Scyld release of last Monday is
> up and running, and I can bpsh to my heart's content.
> I'm stuck at the point of trying to deploy MPI.  Scyld supplies mpi-beowulf
> which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun
> which do.  I've built the mpich-gm from Myricom, but their mpirun command
> does not grok bpsh, and expects either rsh or ssh daemons on each slave.
> I've tried a number of approaches that start out looking like they might
> work, but have gotten stuck after a few hours down each cowpath.
> Here is a list of some of the snags (I've lost track of some others):
> bpsh is not a full-blown shell: it doesn't deal well with redirection or with
> changing directory before running a command, and in particular it can't be
> swapped in for rsh or ssh when configuring mpich (i.e. -rsh=bpsh).
> The master node is outside the myrinet, I haven't a clue how to get
> it to cooperate with the slaves over ethernet yet have the slaves
> use myrinet as much as possible.
> I tried hacking on the first test in mpich-1.2..4/examples/test
> (pt2pt/third) that you get when you do make testing or runtests -check.
> Tried to get it to use /usr/bin/mpirun.  Had to get rid of -mvhome and
> -mvback args first, then tried to use bpsh to start up the mpirun on
> one node, hoping it could use GM to start up on the other slaves.
> After creating the directory in /var where it could create shm_beostat,
> I now get truckloads of errors:
> shmblk_open: Couldn't open shared memory file: /shm_beostat
> shmblk_open failed.
> I suppose these might be from the other nodes, expecting everyone is
> sharing /var, but I'm leery of nfs mounting all of the master's /var
> on each slave.
> I tried applying the Scyld patches against the 1.2.0 mpich sources to
> the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4
> directory, which is not built when --with-device=ch_gm is specified.
> Then I thought I'd look into the mpprun sources, but I couldn't get
> them to build even before I started hacking on them... decided to look
> elsewhere for a while.
> Tried getting sshd2 up and running on a slave node.  So far it insists
> on asking for my password and won't accept it at all.
> Has anyone got a working cluster anything like the one we're building?
> What did you have to do differently to make the various packages and
> drivers play nice with each other?  Where did I go wrong?
> Thanks,
> 	-- ddj
> 	Dave Johnson
> 	ddj at
> 	Brown University TCASCV

Keith Underwood                   Parallel Architecture Research Lab (PARL)
keithu at                                  Clemson University
