[Beowulf] Difference between a Grid and a Cluster???

Joe Landman landman at scalableinformatics.com
Thu Sep 22 08:05:23 EDT 2005


In a nutshell, a grid defines a virtualized cloud of processing/data 
motion across one or more domains of control and 
authentication/authorization, while a cluster provides a virtualized 
cloud of processing/data motion across a single domain of control and 
authentication/authorization.  Clusters are often more tightly coupled 
via low latency network or high performance fabrics than grids.  Then 
there is the relative hype and the marketing/branding ...

Robert G. Brown wrote:
> Mark Hahn writes:
> 

[...]

> To be really fair, one should note that tools have existed to manage
> moderate cluster heterogeneity for single applications since the very
> earliest days of PVM.  The very first presentation I ever saw on PVM in
> 1992 showed slides of computations parallelized over a cluster that
> included a Cray, a pile of DEC workstations, and a pile of Sun
> workstations.  PVM's aimk and arch-specific binary path layout was

aimk is IMO evil.  Not PVM's in particular, but aimk in general.  The 
one I like to point to is Grid Engine's.  It is very hard to adapt to 
new environments.

When you run on multiple heterogeneous platforms and you are dealing with 
floating point codes, you need to be very careful with a number of 
things, including rounding modes, precision, sensitivity of the 
algorithm to roundoff error accumulation at different rates, and the 
fact that PCs have 80 bit floating point units, while RISC/Cray machines 
use 32/64 bits and 64/128 for doubles.  It could be done, but if you 
wanted reliable/reasonable answers, you had to be aware of these issues 
and make sure your code was designed appropriately.
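
A small illustration of the roundoff accumulation point, as a Python 
sketch assuming IEEE 754 double precision.  The function name and the 
input values are made up for the example.  The same four numbers summed 
in two different orders give two different answers, and neither matches 
the correctly rounded sum.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative values only: near 1e16 the spacing between doubles is 2,
# so adding 1.0 to 1e16 is lost to rounding.
values = [1e16, 1.0, -1e16, 1.0]

forward = naive_sum(values)            # 1.0: the first 1.0 is absorbed
reordered = naive_sum(sorted(values))  # 0.0: both 1.0 terms are absorbed
exact = math.fsum(values)              # 2.0: correctly rounded sum

print(forward, reordered, exact)
```

A platform that accumulates in a different precision, or a parallel 
reduction that changes the order of the additions, can move an answer 
in the same way.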

[...]

> Some of the gridware packages do exactly this -- you don't distribute
> binaries, you distribute tarball (or other) packages and a set of rules
> to build and THEN run your application.  I don't think that any of these
> use rpms, although they should -- a well designed src rpm is a nearly

RPM is not a panacea.  It has lots of problems in general.  The idea is 
good, just the implementation ranges from moderately ok to absolutely 
horrendous, depending upon what you want to do with it.  If you view RPM 
as a fancy container for a package, albeit one that is slightly brain 
damaged, you are least likely to be bitten by some of its more 
interesting features.  What features?  Go look at the RedHat kernels 
circa 2.4 for all the work-arounds they needed to do to deal with its 
shortcomings.

I keep hearing how terrible tarballs and zip files are for package 
distribution.   But you know, unlike RPMs, they work the same, 
everywhere.  Sure they don't have versioning and file/package registry 
for easy installation/removal.  That is an easily fixable problem IMO. 
Sure they don't have scripting of the install.  Again, this is easily 
fixable (jar files and the par files are examples that come to mind for 
software distribution from java and perl respectively).
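
As a rough sketch of how small the missing versioning/registry piece 
can be, here is a Python example.  Everything in it is hypothetical: 
the install/remove function names, the JSON registry file, and its 
layout are assumptions for illustration, not an existing tool.  It 
unpacks the regular files from a tarball under a prefix, records them 
with a version, and uses that record to remove them later.

```python
import json
import tarfile
from pathlib import Path


def load_registry(registry_path):
    """Read the JSON registry, or start an empty one (hypothetical format)."""
    path = Path(registry_path)
    return json.loads(path.read_text()) if path.exists() else {}


def install(tarball, prefix, registry_path, name, version):
    """Unpack regular files under prefix and record them in the registry."""
    prefix = Path(prefix).resolve()
    installed = []
    with tarfile.open(tarball) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = (prefix / member.name).resolve()
            # Refuse archive members that would land outside the prefix.
            if prefix not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            installed.append(str(target))
    registry = load_registry(registry_path)
    registry[name] = {"version": version, "files": installed}
    Path(registry_path).write_text(json.dumps(registry, indent=2))
    return installed


def remove(registry_path, name):
    """Delete the recorded files of a package and drop its registry entry."""
    registry = load_registry(registry_path)
    entry = registry.pop(name)
    for filename in entry["files"]:
        Path(filename).unlink(missing_ok=True)
    Path(registry_path).write_text(json.dumps(registry, indent=2))
    return entry["version"]
```

Install scripting could hang off the same registry entry, for example 
as a script name stored next to the file list.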

> Most grids are likely not THAT hardware heterogeneous so that only a
> handful (e.g. i386, x86_64) of binaries need to be maintained.  Because
> of binary compatibility, these grid applications give up at most certain
> optimizations when run on imperfectly matched platforms, e.g. i386 on an
> Opteron.  That leaves plenty of room for very beneficial scaling as far
> as the cycle consumer is concerned, even if it is less than hardware
> optimal.  It also permits the grid organization to trade off the human
> costs of managing multiple binary images against the efficiency costs of
> running a generic version even where it isn't optimal.

Generally, if an application wrapper is in a container, say a .jar file 
(not advocating Java, but just bear with me), which runs on the 
execution target, and copies over the relevant binary into a temporary 
binary directory, then you don't need anything installed on the grid 
system execution host apart from a queuing system connection.
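
The wrapper idea can be sketched in a few lines of Python.  The bundle 
layout (bundle/<arch>/<app>), the fallback table, and the function 
names below are assumptions made up for the illustration.  The wrapper 
picks the binary that matches the execution host, copies it into a 
temporary directory, runs it there, and cleans up afterwards.

```python
import platform
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical fallback table: an x86_64 host can also run an i386 build.
FALLBACKS = {"x86_64": ["i386"]}


def pick_binary(bundle_dir, app, machine=None):
    """Return the best matching binary under bundle_dir/<arch>/<app>."""
    machine = machine or platform.machine()
    for arch in [machine] + FALLBACKS.get(machine, []):
        candidate = Path(bundle_dir) / arch / app
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no {app} binary for {machine}")


def stage(binary, scratch_dir):
    """Copy the binary into scratch_dir and mark it executable."""
    staged = Path(scratch_dir) / Path(binary).name
    shutil.copy2(binary, staged)
    staged.chmod(0o755)
    return staged


def run_job(bundle_dir, app, args=()):
    """Stage the binary in a temporary directory, run it, then clean up."""
    with tempfile.TemporaryDirectory() as scratch:
        staged = stage(pick_binary(bundle_dir, app), scratch)
        return subprocess.run([str(staged), *args], check=True)
```

The native build is preferred when the bundle has one, and the generic 
build is used only as a fallback.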

> Basically, it isn't that hard to manage binaries for x86_64 and i386 --
> I have to do this in our own cluster, let alone a grid.  Nor is it that
> bad (performance-wise) if you have to run i386 on x86_64. 

We have seen up to a factor of 2 difference in performance on chemistry 
codes.  If your run takes 2 weeks (a number of our customers' runs take 
longer than 2 weeks), it matters.  If your run takes 2 minutes, it 
probably doesn't matter unless you need to do 10000 runs.

It is not hard to manage binaries in general with a little thought and 
design.  It is not good to purposefully run a system at a lower speed as 
a high performance computational resource unless the cost/pain of 
getting the better binaries is too large or getting them is simply 
impossible (e.g. some of the vendor code out there is still compiled 
against RedHat 7.1 on i386, which makes supporting it ... exciting ... 
and not in a good way).

> For most of
> the (embarrassingly parallel) jobs that use a grid in the first place,
> the point is the massive numbers of CPUs with near perfect scaling, not
> how much you eke out of each CPU.

Grids are used not just for embarrassingly parallel jobs.  They are also 
used to implement large distributed pipeline computing systems (in bio 
for example).  These systems have throughput rates governed in large 
part by the performance per node.  Running on a cluster would be ideal 
in many cases, as you will have that nice high bandwidth network fabric 
to help move data about (gigabit is good, IB and others are better for 
this).

Rapidly emerging from the pipeline/grid world for bio computing is 
something we have been saying all along, that the major pain is (apart 
from authentication, scheduling, etc) data motion.  There, CPU 
speed/type doesn't matter all that much.  The problem is trying to force 
fit a steer through a straw.  There are other problems associated with 
this as well, but the important aspect of these systems is measured in 
throughput (which is not number of jobs of embarrassingly parallel work 
per unit time, but how many threads and how much data you can push 
through per unit time).  To use the steer and straw analogy,  you can 
build a huge pipeline by aggregating many straws.  Just don't ask the 
steer how he likes having parts of him being pushed through many straws. 
The pipeline for these folks is the computer (no, not the network). 
Databases factor into this mix, as do other things.  The computations 
are rarely floating point intensive.

Individual computation performance does matter, as pipelines do have 
transmission rates at least partially impacted by CPU performance.  In 
some cases, long pipelines with significant computing tasks are CPU 
bound, and can take days/weeks.  These are prime cases for acceleration 
by leveraging better CPU technology.
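
One idealized way to see this, as a Python sketch: in steady state a 
linear pipeline delivers results at the rate of its slowest stage.  
The model and the stage times below are assumptions for illustration 
only.  It ignores queuing, data motion, and scheduling costs.

```python
def pipeline_throughput(stage_seconds):
    """Items per second in steady state, set by the slowest stage."""
    return 1.0 / max(stage_seconds)


def pipeline_latency(stage_seconds):
    """Seconds for one item to pass through every stage."""
    return sum(stage_seconds)


# Hypothetical per-item stage times in seconds; the middle stage is
# CPU bound and is the bottleneck.
before = [2.0, 4.0, 1.0]
# Same pipeline with the bottleneck stage on a CPU twice as fast.
after = [2.0, 2.0, 1.0]

print(pipeline_throughput(before), pipeline_latency(before))  # 0.25 7.0
print(pipeline_throughput(after), pipeline_latency(after))    # 0.5 5.0
```

In this toy model a faster CPU pays off only when it is applied to the 
bottleneck stage.  Speeding up the 1 second stage would leave the 
throughput unchanged.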

>> in that way of thinking, grids make a lot of sense as a 
>> shrink-wrap-app farm.
> 
> Sure.  Or farms for any application where building a binary for the 2-3
> distinct architectures takes five minutes per and you plan to run them
> for months on hundreds of CPUs.  Retuning and optimizing per
> architecture being strictly optional -- do it if the return for doing so
> outweighs the cost.  Or if you have slave -- I mean "graduate student"
> -- labor with nothing better to do:-)

Heh... I remember doing empirical fits to energy levels and band 
structures and other bits of computing as an integral part of the 
computing path for my first serious computing assignment in grad school. 
I seem to remember thinking it could be automated, and starting to work 
on the Fortran code to do it.  Perl was quite new then, not quite up to 
version 3.

Pipelines are set up and torn down with abandon.  They are virtualized, 
so you never know which bit of processing you are going to do next, or 
where your data will come from, or where it is going to until you get 
your marching orders.  It is quite different from Monte Carlo.  It is 
not embarrassingly parallel per node, but per pipe which may use one 
through hundreds (thousands) of nodes.

Most parallelization on clusters is the wide type:  you distribute your 
run over as many nodes as practical for good performance. 
Parallelization on grids can either be trivial, a la Monte Carlo, or 
pipeline based.  Pipeline based parallelism gets work done by creating 
the longest pipeline path practical, keeping the work done per unit 
time as high as possible (and keeping the communication costs down). 
Call this deep type parallelism.  On some tasks, pipelines are very 
good for getting lots of work done.  For other tasks they are not so 
good.  There is an analogy with current CPU pipelines if you wish to 
make it.
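
A toy version of the deep type, as a Python sketch using threads and 
queues on one machine.  The stage functions and names are made up for 
the example, and a real grid pipeline would place the stages on 
different nodes.  Each stage runs concurrently and hands its output to 
the next one, so several items are in flight at once.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass results downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Chain one worker thread per stage, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Three hypothetical stages: add one, double, format as text.
print(run_pipeline([lambda x: x + 1, lambda x: x * 2, str], range(5)))
# ['2', '4', '6', '8', '10']
```

The wide type would instead run the whole chain of functions on each 
item, with the items spread over many workers.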

Joe

> 
>    rgb
> 
>>
>> regards, mark hahn.
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615