[Beowulf] Re: Purdue Supercomputer
hahn at mcmaster.ca
Sat May 10 20:28:03 EDT 2008
> clusters. What if one of the systems in the cluster is down, or there are
> network failures? Can we make our cluster (2-5 systems only) work properly?
normally, the cluster's management software will monitor and deal with
node failure. at least that means noticing a failure and ensuring that the
node isn't used (until fixed) and dealing with any jobs that involved the
node. it's also fairly common for server nodes (not just slave/compute
nodes) to have some failover/high-availability features. (HA can also be
done for compute jobs, but IMHO it's not worth considering in normal cases,
i.e., when node failures are infrequent.)
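
as a rough illustration of the "notice the failure, stop using the node, deal
with its jobs" loop, here is a minimal sketch in Python. it is not how any
particular cluster manager works; the heartbeat table, the 60-second timeout
and the function names are all assumptions made up for the example.

```python
def find_failed_nodes(last_seen, now, timeout=60.0):
    """Return the names of nodes whose last heartbeat is older than `timeout` seconds.

    last_seen maps node name -> timestamp of the most recent heartbeat
    (hypothetical bookkeeping; a real scheduler keeps its own state).
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout)


def handle_failures(last_seen, running_jobs, now, timeout=60.0):
    """Decide which nodes to take offline and which jobs were affected.

    running_jobs maps job id -> list of node names the job is using.
    Returns (offline_nodes, jobs_to_requeue).
    """
    offline = find_failed_nodes(last_seen, now, timeout)
    dead = set(offline)
    # any job that touched a dead node has to be requeued or cancelled
    requeue = sorted(job for job, nodes in running_jobs.items() if dead & set(nodes))
    return offline, requeue


if __name__ == "__main__":
    heartbeats = {"n1": 100.0, "n2": 30.0, "n3": 95.0}
    jobs = {"a": ["n1", "n2"], "b": ["n3"]}
    print(handle_failures(heartbeats, jobs, now=110.0))
```

on a 2-5 node cluster the same logic applies; there is just less of it to run.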
> Also, what about geographically distant cluster systems? Say, one in the USA
sure, there's nothing about clusters that really assumes locality,
though obviously geographic distribution has effects on achievable
performance for wide-area MPI or distant file access. wide-area
clustering seems more of a political stunt to me (yes, including grids.)
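
to put a number on the performance effect, here is a back-of-envelope sketch
of the best-case round-trip time over a long fiber path. the 12,000 km
distance is only an assumed, illustrative figure for a USA-India path, and the
2/3 factor for the speed of light in fiber is an approximation; real routes
are longer and add switching delay.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2.0 / 3.0   # assumed: light in fiber travels at roughly 2/3 c


def min_rtt_ms(distance_km, fiber_factor=FIBER_FACTOR):
    """Lower bound on round-trip time in milliseconds for a given path length."""
    one_way_s = distance_km / (C_KM_PER_S * fiber_factor)
    return 2.0 * one_way_s * 1000.0


if __name__ == "__main__":
    # assumed distance, for illustration only
    print(f"{min_rtt_ms(12_000):.1f} ms")
```

under those assumptions the floor is about 120 ms per round trip, while a
local cluster interconnect is typically measured in microseconds.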
> and the other in India. How do we manage our cluster in mishaps or
> difficult conditions?
I find that with IPMI and console redirection, it's very rarely necessary to
care about where your nodes are, at least from a sysadmin perspective.
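
for anyone who hasn't used it, this is the kind of thing I mean. a small
sketch that builds ipmitool command lines for checking power, power-cycling,
and attaching to the serial-over-LAN console of a remote node. the BMC
hostname and user name are hypothetical, and it assumes ipmitool is installed
and the BMC speaks IPMI v2.0 over the LAN.

```python
import subprocess


def ipmi_command(host, user, *args):
    """Build an ipmitool command line for a node's BMC.

    -I lanplus selects the IPMI v2.0 LAN interface; -E makes ipmitool read
    the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def power_status(host, user):
    return ipmi_command(host, user, "chassis", "power", "status")


def power_cycle(host, user):
    return ipmi_command(host, user, "chassis", "power", "cycle")


def serial_console(host, user):
    # serial-over-LAN: the redirected console, usable even when the OS is hung
    return ipmi_command(host, user, "sol", "activate")


def run(cmd):
    """Run a built command and return its output (needs a reachable BMC)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # "node17-bmc" and "admin" are made-up names for illustration
    print(" ".join(power_status("node17-bmc", "admin")))
```

none of this depends on whether the node is in the next room or on another
continent, as long as the BMC is reachable.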
you need to ask what the benefit is, though, in a wide-area cluster
(versus separate, local ones.) I wouldn't assume that management would
be easier, and obviously only gratuitously parallel apps (sometimes called
embarrassingly parallel) could use it.
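
a toy model shows why. assume run time is compute time plus one network
latency per synchronous exchange; the job sizes and latencies below are
invented for illustration, not measurements.

```python
def run_time_s(compute_s, n_exchanges, latency_s):
    """Toy model: total time = compute time + one latency per synchronous exchange."""
    return compute_s + n_exchanges * latency_s


if __name__ == "__main__":
    hour = 3600.0
    local, wide = 10e-6, 0.1   # assumed latencies: 10 us local, 100 ms wide-area

    # tightly coupled job: a million exchanges during an hour of compute
    print(run_time_s(hour, 1_000_000, local), run_time_s(hour, 1_000_000, wide))

    # embarrassingly parallel job: send the input, get the result back
    print(run_time_s(hour, 2, local), run_time_s(hour, 2, wide))
```

in this model the tightly coupled job goes from about an hour to more than a
day over the wide-area link, while the independent job barely notices.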
> Lastly, how about having Beowulf cluster systems in space? Putting one PC
> on each planet or celestial body that we want to track, and the server
> in India.
just because it could be done doesn't mean it makes sense...
> Is Linux the best choice in such cases?
your choice of OS depends primarily on your preference and experience.