[Beowulf] HPC fault tolerance using virtualization

John Hearns hearnsj at googlemail.com
Mon Jun 15 13:59:37 EDT 2009


I was searching for ganglia + ipmi (I'm looking at doing something
similar for temperature measurement) when I came across this paper:

http://www.csm.ornl.gov/~engelman/publications/nagarajan07proactive.ppt.pdf

Proactive Fault Tolerance for HPC using Xen virtualization

It's something I've wanted to see working: a Xen live migration of a
'dodgy' compute node while the job just keeps on trucking.
Looks as if these guys have it working. Has anyone else seen something similar?
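For anyone wanting to try the idea, here is a rough sketch of the trigger side: parse IPMI temperature readings and, if any sensor is over a limit, build a Xen live-migration command for the guest on that node. This is not the paper's implementation. The 75 C threshold, the domain and host names, and the sample sensor text are all hypothetical. The sample only approximates the pipe-separated layout of `ipmitool sdr type Temperature`, which varies between BMCs. The command uses the Xen 3.x `xm migrate --live <domain> <host>` form.

```python
import re

# Hypothetical threshold in degrees C; a sensible value depends on the hardware.
TEMP_LIMIT_C = 75.0

# Matches readings such as "45 degrees C" in the last column of the sdr output.
READING_RE = re.compile(r"([-\d.]+)\s+degrees C")

# Illustrative sample shaped like `ipmitool sdr type Temperature` output (made up).
SAMPLE = """CPU1 Temp        | 30h | ok  |  3.1 | 45 degrees C
CPU2 Temp        | 31h | ok  |  3.2 | 81 degrees C
Ambient Temp     | 32h | ns  |  7.1 | No Reading"""


def parse_temperatures(sdr_text):
    """Return {sensor_name: reading_in_C} for lines that carry a numeric reading."""
    temps = {}
    for line in sdr_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a sensor line
        match = READING_RE.search(fields[4])
        if match:  # skips "No Reading" and similar
            temps[fields[0]] = float(match.group(1))
    return temps


def migration_command(sdr_text, domain, target_host, limit=TEMP_LIMIT_C):
    """Return an xm live-migration argv if any sensor exceeds `limit`, else None.

    The command is only built, not executed, so the policy can be tested safely.
    """
    temps = parse_temperatures(sdr_text)
    if not any(t > limit for t in temps.values()):
        return None
    return ["xm", "migrate", "--live", domain, target_host]


if __name__ == "__main__":
    # "compute-vm01" and "spare-node" are hypothetical names.
    print(migration_command(SAMPLE, "compute-vm01", "spare-node"))
```

To actually move the guest you would pass the returned list to `subprocess.run(cmd, check=True)`; later Xen releases replaced `xm` with `xl`. Returning the command instead of running it keeps the health check separate from the action, so the threshold logic can be tried out on a live node without migrating anything.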

John Hearns
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing