robl at mcs.anl.gov
Tue Jan 29 10:39:33 EST 2002
On Mon, Jan 21, 2002 at 06:13:44PM -0800, Martin Siegert wrote:
> This is somewhat off topic - sorry for that.
it's a great topic for clusters. in an ideal world, the kernel never
oopses, but when you have N kernels and possibly dodgy hardware, it
i get frustrated with this list because topics like Martin's get
ignored, while topics like cooling with LN2, game console clusters
and anything athlon get multi-day discussions.
[snip problem report ]
> The first thing I would like to do is to log the oops message. Right now
> it goes to the console only - it does not appear in the log files
> although syslog sends everything of severity *.info to /var/log/messages.
i guess you've read Documentation/oops-tracing.txt , but if not, it's
a good start.
depending on where the panic happens, the part of the kernel that
would normally write that oops out to disk doesn't run.
So you've got a few options:
. typing off the screen: sucks. a lot. and is highly error prone.
and the kernel console blanking mechanism might kick in ( and since
the kernel has panicked, it won't listen for input signals and unblank
itself ) but if you've got no other option...
( one time a guy took a picture of the oops with a digital camera and
sent that to me. that was fun. I don't have any character recognition
software, but if someone knows of a linux OCR tool that won't mind a
screenful of hex, i'd like to hear about it )
. serial console: not bad. if it's just one machine, you can pass
parameters to your kernel and capture all kernel messages over the
serial port. Documentation/serial-console.txt has all the info you
. netconsole: http://people.redhat.com/mingo/netconsole-patches/
like a serial console, but using your network device instead of a
serial device. It's a kernel patch and a convenience script for the
sender and a userspace tool for the receiver to display the messages.
Patching a kernel and setting up yet another tool might be a bit much,
but man is it cool to see it work :>
. patch your kernel to support "dump log to swapfile" or "dump log to
disk". I haven't set something like this up, but always meant to
try it out...
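
To make the "capture kernel messages over the network" idea concrete, here is a rough sketch of a receiver in Python. It is not the userspace tool that ships with the netconsole patches, and it assumes the sender emits plain-text UDP datagrams, which may not match the patch's actual wire format. The port number 6666, the log file name and the function names are all assumptions made for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; host and port are illustrative assumptions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_console_text(sock, max_packets):
    """Read max_packets datagrams from sock and return them as one string."""
    chunks = []
    for _ in range(max_packets):
        data, _addr = sock.recvfrom(4096)
        # Kernel messages are ASCII; replace anything that is not.
        chunks.append(data.decode("ascii", errors="replace"))
    return "".join(chunks)


def log_forever(sock, path="console.log"):
    """Append every received datagram to path, flushing after each one
    so the text is on disk even if the receiver is killed afterwards."""
    with open(path, "a") as log:
        while True:
            log.write(receive_console_text(sock, 1))
            log.flush()
```

Running `log_forever(open_listener())` on a machine that stays up would leave whatever the crashing node managed to send in console.log.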
Basically the name of the game is to get that oops into a form you can
feed to ksymoops, then hope the backtrace it prints out gives you a
clue. ( like "oh, the last thing it called was do_scsi_service... maybe
i have a dodgy scsi controller" ).
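
Once the console output is captured in a file, the oops still has to be cut out of the surrounding noise before ksymoops sees it. A minimal sketch of that step follows. It assumes the oops starts at a line containing "Unable to handle kernel" or "Oops:" and ends at the "Code:" line, which is what these reports usually look like but is not guaranteed for every kernel or architecture. The function name is made up.

```python
def extract_oops(lines):
    """Return the lines from the first oops marker through the 'Code:' line.

    The start and end markers are assumptions about the oops layout.
    Returns an empty list if no start marker is found.
    """
    start_markers = ("Unable to handle kernel", "Oops:")
    captured = []
    in_oops = False
    for line in lines:
        if not in_oops and any(marker in line for marker in start_markers):
            in_oops = True
        if in_oops:
            captured.append(line)
            # The hex dump of the faulting code is normally the last line.
            if "Code:" in line:
                break
    return captured
```

Writing the result to a file, say oops.txt, and then running something like `ksymoops < oops.txt` should give the decoded backtrace, assuming ksymoops can find a System.map that matches the kernel that crashed.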
Anybody else know of good ways ( even funny bad ways might be
entertaining) to capture an oops?
A215 0178 EA2D B059 8CDF
B29D F333 664A 4280 315B