[Beowulf] HPC fault tolerance using virtualization
hearnsj at googlemail.com
Mon Jun 15 13:59:37 EDT 2009
I was doing a search on ganglia + ipmi (I'm looking at doing such a
thing for temperature measurement) when I came across this paper:
Proactive Fault Tolerance for HPC using Xen virtualization
It's something I've wanted to see working - doing a Xen live migration
of a 'dodgy' compute node, and the job just keeps on trucking.
Looks as if these guys have it working. Anyone else seen similar?
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf