Ask The Monkey's: Big Memory and Storage
Written by Douglas Eadline   
Friday, 25 July 2008
We recently received an email from Joe Springer asking a common question about clusters:

Question: I want to run an application whose memory and disk requirements are larger than any one node. Could using a cluster allow me to run an application such that "memory" and disk needs are fulfilled by being distributed...?

The short answer is: it depends for memory, yes for storage, but there is more to it than that.

The longer answer is a bit more involved. Let's look at memory first. Many applications that run on clusters would never fit on a single node. When a program that uses a large data set is run across a cluster, the data is usually sliced and diced across the nodes, and the nodes communicate via MPI (Message Passing Interface). Note that MPI is essentially a memory copy operation: each cluster node is like an island with its own memory and hard drive (sometimes there is no local hard drive and a network file system is used instead). When a node needs information from another node, that information is sent across the network and placed into the receiving node's memory.
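A minimal sketch of the "slice and dice" idea, in plain Python: a global array is split into contiguous blocks, one per rank (node), so each rank only needs memory for its own block plus a few boundary ("ghost") values copied from its neighbours. The function names are hypothetical and the "nodes" here are just lists in one process; in a real MPI code each block would live in a separate process and the boundary copies would be message sends and receives (for example `MPI_Sendrecv`).

```python
def block_range(n_items: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Half-open [start, end) range of the global array owned by `rank`.

    The first n_items % n_ranks ranks get one extra item, so block sizes
    differ by at most one element.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def bytes_per_rank(n_items: int, n_ranks: int, rank: int, item_size: int = 8) -> int:
    """Memory one rank needs for its block (default: 8-byte doubles)."""
    start, end = block_range(n_items, n_ranks, rank)
    return (end - start) * item_size


def exchange_boundaries(blocks: list[list[int]]) -> list[tuple]:
    """Simulate a nearest-neighbour exchange: each rank receives a copy of its
    neighbours' edge values (None where there is no neighbour)."""
    ghosts = []
    for r in range(len(blocks)):
        left = blocks[r - 1][-1] if r > 0 else None
        right = blocks[r + 1][0] if r < len(blocks) - 1 else None
        ghosts.append((left, right))
    return ghosts


global_data = list(range(10))
n_ranks = 3
blocks = [global_data[s:e] for s, e in (block_range(10, n_ranks, r) for r in range(n_ranks))]
print(blocks)                                # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(exchange_boundaries(blocks))           # [(None, 4), (3, 7), (6, None)]
print(bytes_per_rank(4_000_000_000, 16, 0))  # 2000000000
```

For example, 4 billion doubles (32 GB) spread over 16 nodes needs only 2 GB per node, which is how a cluster lets an application use more total memory than any single node has, provided the program is written to work on distributed pieces.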

If you want to use a cluster to expand the memory available on a node, things get a bit more involved. Remember that moving data across a network is often an order of magnitude or more slower than accessing it on a motherboard. For this reason, attempts to create shared memory clusters have met with varying levels of success, but there is no general solution. After all, at the lowest level you are still passing memory (messages) between nodes. One company, ScaleMP, provides a software solution for a cluster-wide shared memory model, i.e. your program might be able to use more memory than a single node provides, but I do not have experience with this software (and it requires InfiniBand). There are also "memory appliances" such as the Violin Scalable Memory that can support up to 10 TBytes of DRAM.
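To see why remote memory is so costly, a simple latency-plus-bandwidth model helps: each transfer pays a fixed latency plus its size divided by the bandwidth. The sketch below compares a "local" and a "remote" access under that model. The LOCAL and REMOTE figures are placeholder assumptions chosen only for illustration, not measurements of any particular hardware; substitute numbers measured on your own system.

```python
def transfer_time_s(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Latency + size/bandwidth model for moving n_bytes in one transfer."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Placeholder figures for illustration only, NOT measurements; replace them with
# values from a memory benchmark (LOCAL) and an MPI ping-pong test (REMOTE).
LOCAL = {"latency_s": 100e-9, "bandwidth_bytes_per_s": 10e9}
REMOTE = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 1e9}


def remote_penalty(n_bytes: float) -> float:
    """How many times slower a remote access is than a local one under the model."""
    return transfer_time_s(n_bytes, **REMOTE) / transfer_time_s(n_bytes, **LOCAL)


for size in (8, 4096, 1_000_000):
    print(f"{size:>9} bytes: remote is {remote_penalty(size):5.1f}x slower")
```

With these placeholder numbers, a single 8-byte access comes out roughly 20x slower remotely and a 1 MB block roughly 10x slower. The point is the shape, not the values: small, scattered accesses, which is what an ordinary shared memory program generates, pay the latency term every time, so making remote memory look local is hard to do efficiently.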

In terms of storage, there are many solutions. A simple solution is some form of attached storage, such as the JackRabbit from Scalable Informatics. If you are considering parallel I/O (many nodes reading from and writing to the same file or file system), things get very application specific. Check out our File Systems pages for more information.
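One common parallel I/O pattern is for every node to write its own non-overlapping region of a single shared file. The sketch below imitates that on one machine, with threads standing in for nodes; `RECORD`, `write_my_block`, and `parallel_write` are hypothetical names, and the fixed block size is an assumption. On a real cluster this would typically be done with MPI-IO (for example `MPI_File_write_at`) on a parallel file system, and how well it performs depends heavily on the file system and the access pattern.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD = 4096  # hypothetical fixed block size owned by each writer, in bytes


def write_my_block(path: str, rank: int, payload: bytes) -> None:
    """Write this rank's block at its own, non-overlapping offset."""
    if len(payload) != RECORD:
        raise ValueError("payload must be exactly RECORD bytes")
    with open(path, "r+b") as f:  # each writer opens its own file handle
        f.seek(rank * RECORD)
        f.write(payload)


def parallel_write(path: str, n_ranks: int) -> None:
    """Stand-in for n_ranks nodes writing disjoint blocks of one shared file."""
    with open(path, "wb") as f:
        f.truncate(n_ranks * RECORD)  # pre-size so every offset exists
    with ThreadPoolExecutor(max_workers=n_ranks) as pool:
        jobs = [pool.submit(write_my_block, path, r, bytes([r % 256]) * RECORD)
                for r in range(n_ranks)]
        for job in jobs:
            job.result()  # re-raise any error from a writer


with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, "shared.dat")
    parallel_write(shared, 4)
    with open(shared, "rb") as f:
        data = f.read()
    print(len(data), data[0], data[RECORD], data[3 * RECORD])  # 16384 0 1 3
```

Because the regions never overlap, the writers need no locking between them; patterns where many nodes touch the same bytes, or write many small pieces, are where parallel file systems and application tuning start to matter.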

To answer your final question:

Which Linux project or distro would be best for such a situation?

The specifics of your application requirements will determine how you would use a cluster. I'm not sure I have enough information to give a complete answer. You may find it useful to look at our Learning About Clusters section and our Links page.

Finally, maybe some of our readers will supply their comments as well. Thanks for asking!

Last Updated ( Friday, 25 July 2008 )
  ©2005-2008 Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.