[Beowulf] HPC fault tolerance using virtualization

John Hearns hearnsj at googlemail.com
Tue Jun 16 05:05:07 EDT 2009

2009/6/16 Kilian CAVALOTTI <kilian.cavalotti.work at gmail.com>

> My take on this is that it's probably more efficient to develop
> checkpointing features and recovery in software (like MPI) rather than
> adding a virtualization layer, which is likely to decrease performance.

The performance hits measured by Panda et al. on InfiniBand-connected
hardware are on the order of 5 percent (I may be wrong here). I believe that
if we can get features like live migration of failing machines, plus
specialized stripped-down virtual machines specific to job types, then we
will see virtualization becoming mainstream in HPC clustering.
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing