[Beowulf] reboot without passing through BIOS?

Kilian CAVALOTTI kilian at stanford.edu
Thu Jul 31 16:00:45 EDT 2008

On Wednesday 30 July 2008 09:13:56 am David Mathog wrote:
> If one were to build nodes without ECC memory it would probably be a
> good idea to reboot them from time to time to clean out whatever bad
> bits might have accumulated.  It then occurred to me that doing so
> would require a trip through the BIOS on every reboot, at least on
> every x86 based computer I'm familiar with.  That is not a terrible
> thing, but it made me wonder if it is really necessary. 

I may be totally missing the point, but doesn't the memory need to be 
physically (as in electrically) reset in order to clean out those bad 
bits? And doesn't this require a hard reboot, with the machine power
cycled, so that the memory cells are reinitialized?

I mean, if the BIOS stage is skipped, as when kexec'ing a new kernel,
no electrical initialization occurs, and the bad bits will probably
stay where they are. Unless the kernel does this kind of scrubbing
during its initialization phase (I don't know whether it does), I don't
see any reason why the memory would be cleared of errors.

Another thing I wonder about is whether a reboot would do any good for
non-ECC memory anyway. As far as I understand it, a memory error is
either a repeatable, hard one, like a bad chip, in which case a reboot
won't change anything since the hardware is faulty; or a transient,
soft one, where a bad value is read once but subsequent reads are OK.
So unless there's some sort of accumulation in the soft case, I don't
really understand what a reboot could do about it.
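
To make the "accumulation" case concrete, here is a toy Python model,
not a measurement of real hardware. It assumes that a soft error flips
a stored bit which then stays flipped until that word is next written,
that a hard fault is a bit stuck at 1, and that a reboot ends up
rewriting every word. The name run_node and its parameters are made up
for this illustration.

```python
import random


def run_node(n_words, n_flips, stuck_bits, seed=0):
    """Toy model: return (bad words before rewrite, bad words after)."""
    rng = random.Random(seed)
    expected = [0] * n_words   # what software wrote: all zeros
    cells = list(expected)     # what the memory cells actually hold

    def apply_stuck(mem):
        # Hard faults: hypothetical stuck-at-1 bits, as (word, bit) pairs.
        for word, bit in stuck_bits:
            mem[word] |= 1 << bit

    apply_stuck(cells)

    # Soft errors: each flip changes the stored value and (assumption)
    # persists until the word is written again.
    for _ in range(n_flips):
        word = rng.randrange(n_words)
        bit = rng.randrange(8)
        cells[word] ^= 1 << bit

    before = sum(1 for c, e in zip(cells, expected) if c != e)

    # "Reboot": every word is rewritten (assumption); stuck bits reassert.
    cells = list(expected)
    apply_stuck(cells)
    after = sum(1 for c, e in zip(cells, expected) if c != e)
    return before, after
```

In this model, run_node(1024, 1, []) returns (1, 0), so a lone soft
flip is gone once the word is rewritten, while
run_node(1024, 0, [(3, 0)]) returns (1, 1), so a stuck bit survives.
Whether a real reboot, with or without the BIOS, actually rewrites the
affected words is exactly the part the model takes for granted.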

If you have any light to shed on this, I'd be interested.
