[Beowulf] HPC fault tolerance using virtualization

John Hearns hearnsj at googlemail.com
Mon Jun 15 13:59:37 EDT 2009

I was doing a search on ganglia + ipmi (I'm looking at doing such a
thing for temperature measurement) when I came across this paper:


Proactive Fault Tolerance for HPC using Xen virtualization

It's something I've wanted to see working - doing a Xen live migration
of a 'dodgy' compute node, and the job just keeps on trucking.
Looks as if these guys have it working. Anyone else seen similar?

John Hearns
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
