Need a little help getting started

Donald Becker becker at scyld.com
Mon Nov 24 17:13:53 EST 2003


On Sun, 23 Nov 2003, William J Mandra wrote:

> Hello all. I am new to this list and apologize in advance if any of the
> questions that I have are silly but here it goes.  I am in the design phase
> of a cluster and I am having some trouble figuring out which software
> packages to use.  The cluster will originally consist of 12 nodes linked via
> 100BaseT switched ethernet and a cluster controller. The following are some
> of my requirements:
>   1.  All nodes netboot off of the cluster controller
>   2.  automatic process migration and load balancing (openMOSIX)

Do you require transparent process migration at run-time (e.g. Mosix),
which imposes significant overhead, or will directed process migration
work?

>   3.  distributed shared memory

Ahhh, you have control of your application, which implies that you
likely won't benefit from transparent process migration.

There are several Distributed Shared Memory (DSM) systems, with
different design tradeoffs.  Since it's very easy to thrash a DSM
system, you should select one that matches your application's needs and
then carefully tune your application.

You should treat the DSM system exactly the same as MPI or the
message-passing subsystem of PVM: a library that fits with the rest of
the system, not the piece around which everything else revolves.
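
For concreteness, here is a minimal sketch of that library style in C,
using MPI point-to-point calls (the two-rank exchange and the value sent
are purely illustrative, not anything from your application):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    double value = 0.0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            value = 3.14;   /* data lives where it was computed... */
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ...and moves only when the code says so. */
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %g\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}

The point is that every byte moves only when the code says so, which is
exactly the control you give up when a DSM page starts bouncing between
nodes.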

> The cluster controller will be connected to both the main network and the
> private cluster network and I would like to be able to start applications on
> the cluster remotely via the cluster controller.

That's a normal configuration.  Almost every cluster design designates
one machine (or a small number of machines) as the master and the rest
as compute nodes.  The Scyld system goes further by making the compute
nodes capable of running only processes initiated and controlled by the
master.
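
As a rough illustration of that master-initiates-everything model, here
is a hedged sketch using plain POSIX calls plus rsh; the node name
"node1" and the job path are placeholders, and a Scyld/bproc system uses
its own remote-start mechanism rather than rsh:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: start the job on the compute node; output comes back
         * over the rsh connection. */
        execlp("rsh", "rsh", "node1", "/usr/local/bin/myjob", (char *)NULL);
        perror("execlp");       /* reached only if rsh could not start */
        _exit(127);
    } else if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);   /* the master keeps control of the job */
        if (WIFEXITED(status))
            printf("job exited with status %d\n", WEXITSTATUS(status));
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}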

> I have been doing an exhaustive amount of research on all of the different
> software available to accomplish this, but I have fallen short in figuring
> out which ones will work together.

You'll find two approaches:
  - Monolithic designs, which have no independently replaceable subsystems
  - Component designs, which are built from independent subsystems

The challenge is implementing a component design with an overall
architecture that still results in a simple system.  Most approaches
built from independent components end up unable to evolve.  The result
is overly feature-full, complex subsystems, as individuals try to
address new problems using only the subsystem they understand and
control.

> I am planning on using Red Hat 9 on all of the nodes in the cluster.

You should understand what you are asking for: perhaps you mean "I need
library and application compatibility with Red Hat 9", because you
aren't going to get process migration and DSM without modifying the
kernel and/or libraries.

> I just need a little more information to give me that push in the right
> direction.  I do have some time, as I am not planning to start building the
> cluster until March or April.

You should consider Gigabit Ethernet a likely baseline network by then.
If your application requires DSM, there is a fair chance that you would
benefit from Remote DMA (RDMA) or Remote Write in SCI, Myrinet, Quadrics
or Infiniband.  Selecting one of those will impose a library interface,
and you may find that you have few additional decisions to make.
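
As a taste of that interface style, here is a hedged sketch using MPI-2
one-sided operations, which are designed to map onto remote-write
hardware; the single-double window and the value written are
illustrative only:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local = 0.0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank exposes one double that peers are allowed to write into. */
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size >= 2) {
        double value = 2.718;
        /* Deposit the value directly into rank 1's exposed buffer. */
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);      /* the fence completes the remote write */

    if (rank == 1)
        printf("rank 1 now holds %g\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}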

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993
