[Beowulf] reboot without passing through BIOS?
kilian at stanford.edu
Thu Jul 31 16:00:45 EDT 2008
On Wednesday 30 July 2008 09:13:56 am David Mathog wrote:
> If one were to build nodes without ECC memory it would probably be a
> good idea to reboot them from time to time to clean out whatever bad
> bits might have accumulated. It then occurred to me that doing so
> would require a trip through the BIOS on every reboot, at least on
> every x86 based computer I'm familiar with. That is not a terrible
> thing, but it made me wonder if it is really necessary.
I may be totally missing the point, but doesn't the memory need to be
physically (as in electrically) reset in order to clean out those bad
bits? And doesn't this require a hard reboot, for the machine to be
power cycled, so that memory cells are reinitialized?
I mean, if the BIOS stage is skipped, as in kexec'ing a new kernel,
electrical initialization doesn't occur, and the bad bits will probably
stay there. Unless the kernel does this kind of scrubbing in its
initialization phase (I don't know whether it does), I don't see any
reason why the memory would be cleaned of errors.
Another point I wonder about is whether a reboot would do any good for
non-ECC memory anyway. As far as I understand it, a memory error is
either a repeatable, hard one, like a bad chip, in which case a reboot
won't change anything since the hardware is faulty; or a transient,
soft one, where a bad value is read once but subsequent reads are OK.
So unless there's some sort of accumulation in the soft case, I don't
really understand what a reboot could do about it.
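
To make the hard/soft distinction concrete, here is a minimal sketch of
a toy memory model. Everything in it is hypothetical (the ToyMemory
class, the classify function, the 0x55 pattern); it is not a real
memory tester. It also assumes that a flipped bit stays flipped until
the cell is rewritten, which is the kind of "accumulation" wondered
about above and not something I know to hold for the hardware in
question. Cells that fail after every write are counted as hard, cells
that fail only once as soft.

```python
class ToyMemory:
    """Hypothetical byte-wide memory used only to illustrate the argument."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        # address -> value the cell always returns (models a bad chip)
        self.stuck = dict(stuck or {})

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        # A stuck cell ignores whatever was written to it.
        return self.stuck.get(addr, self.cells[addr])

    def upset(self, addr, bit):
        # Assumption: a transient upset flips a stored bit, and the flip
        # persists until the cell is written again.
        self.cells[addr] ^= 1 << bit


def scan(mem, pattern):
    """Return the set of addresses whose content differs from pattern."""
    return {a for a in range(len(mem.cells)) if mem.read(a) != pattern}


def classify(mem, pattern=0x55, upsets=()):
    """Write, inject upsets, scan, rewrite, scan again.

    Addresses failing both scans are reported as hard errors,
    addresses failing only the first scan as soft errors.
    """
    size = len(mem.cells)
    for a in range(size):
        mem.write(a, pattern)
    for addr, bit in upsets:
        mem.upset(addr, bit)
    first = scan(mem, pattern)

    # The rewrite plays the role of reloading everything from scratch.
    for a in range(size):
        mem.write(a, pattern)
    second = scan(mem, pattern)

    return sorted(first & second), sorted(first - second)


if __name__ == "__main__":
    mem = ToyMemory(16, stuck={3: 0x00})
    hard, soft = classify(mem, upsets=[(9, 2), (12, 7)])
    print("hard:", hard)
    print("soft:", soft)
```

In this model it is the rewrite that clears a soft flip, not any
electrical reset, and nothing clears the stuck cell. Whether real
non-ECC memory behaves this way is exactly the open question.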
If you have some light to shed on this, I'd be interested.