We recently received an email from Joe Springer
asking a common question about clusters:
Question: I want to run an application whose memory and disk
requirements are larger
than any one node. Could using a cluster allow me to
run an application such that "memory" and disk needs are
fulfilled by being distributed...?
The short answer is: It depends for memory, yes for storage, but there
is more to it than that ...
The longer answer is a bit more involved. Let's look at
memory first. Many applications that run on clusters
would never fit on a single node. If a program that uses a large
data set is run across a
cluster, then the data is usually sliced and diced across the nodes, and
communication is done via MPI (Message Passing Interface).
Note that an MPI message is essentially a memory copy operation, because
each cluster node is like an island: it has its own memory and HDD
(sometimes no HDD, in which case a network file system is used). When a
node needs information from another node, that information
is sent across the network and placed into memory.
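
As a rough illustration of the slicing described above, the sketch below splits a data set into contiguous blocks, one per node, and checks whether the largest block fits in a node's memory. It is plain Python rather than MPI code, and the function names (block_partition, fits_per_node) and the numbers in the example are assumptions made up for illustration; a real application would choose its own decomposition.

```python
def block_partition(n_items, n_nodes):
    """Return a (start, count) pair per node for a contiguous block split.

    The first (n_items % n_nodes) nodes each get one extra item, so the
    block sizes differ by at most one.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(n_items, n_nodes)
    slices = []
    start = 0
    for rank in range(n_nodes):
        count = base + (1 if rank < extra else 0)
        slices.append((start, count))
        start += count
    return slices


def fits_per_node(n_items, n_nodes, bytes_per_item, node_mem_bytes):
    """True if the largest block fits in one node's memory (hypothetical check)."""
    largest = max(count for _, count in block_partition(n_items, n_nodes))
    return largest * bytes_per_item <= node_mem_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 10 items spread over 4 nodes.
    for rank, (start, count) in enumerate(block_partition(10, 4)):
        print(f"node {rank}: {count} items starting at index {start}")
```

Each node then works only on its own block, and anything it needs from another node's block has to arrive as a message.
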
If you want to use a cluster to expand the memory available to a single node,
then things get a bit more involved. Remember that moving data
across a network is often an order of magnitude slower than
accessing it on a motherboard. For this reason, attempts to
create shared memory clusters have met with varying levels
of success, and there is no general solution. After all, you are still
passing memory (messages) between nodes at the lowest level. There is one company,
ScaleMP, that provides a
software solution for a cluster-wide shared memory model, i.e.,
your program might be able to use more memory than a single node has.
I do not have experience with this software, however (and it requires InfiniBand). There are also
"memory appliances" like the Violin Scalable Memory that can support up to 10 TBytes of DRAM.
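
The cost of reaching memory on another node, mentioned above, can be sketched with a simple latency-plus-bandwidth model. This is only a back-of-the-envelope estimate: the function names are hypothetical, and the latency and bandwidth figures in the example are placeholders, not measurements of any real interconnect or memory system.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds to move n_bytes: fixed latency plus size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + n_bytes / bandwidth_bytes_per_s


def slowdown(n_bytes, local, remote):
    """Ratio of remote to local transfer time.

    Both local and remote are (latency_s, bandwidth_bytes_per_s) pairs.
    """
    return transfer_time(n_bytes, *remote) / transfer_time(n_bytes, *local)


if __name__ == "__main__":
    # Placeholder figures, NOT measurements of real hardware.
    local = (100e-9, 10e9)   # assumed: 100 ns latency, 10 GB/s
    remote = (2e-6, 1e9)     # assumed: 2 us latency, 1 GB/s
    for size in (64, 1_000_000):
        ratio = slowdown(size, local, remote)
        print(f"{size} bytes: remote is about {ratio:.1f}x slower")
```

Plugging in figures measured on your own hardware gives a first idea of how much a program that treats remote memory as local would slow down.
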
In terms of storage, there are many solutions. A simple solution
is some form of attached storage like the JackRabbit from
Scalable Informatics. If you are considering parallel
I/O (many nodes reading and writing from the same file/filesystem),
that gets very application specific. Check out our
File Systems pages for
more information.
To answer your final question:
Which Linux project or distro would be best for such a
situation?
The specifics of your application requirements will
determine how you would use a cluster. I'm not sure I have
enough information to give a complete answer.
You may find it useful to look at our
Learning About Clusters
section and our
Links Page.
Finally, maybe some of our readers will supply their comments as well. Thanks for asking!