becker at scyld.com
Mon Oct 21 11:42:53 EDT 2002
On Mon, 21 Oct 2002 alvin at Maggie.Linux-Consulting.com wrote:
> On Mon, 21 Oct 2002, Manel Soria wrote:
> > We are looking for a diagnostic tool that (ideally) would
> > allow us to determine what component/s of a node fail. It should
> > test the processor, RAM, disk and network cards under heavy load
> > but in repeatable conditions.
> testing those items individually is a lot of work ...
> test process/procedure is more important than the actual test ??
> - many different cpu/disk/memory/nic tests
The only Linux hardware tests you list are a CPU test (cpuburn) and many
entries for memtest86. You missed several Linux "SMART"-based disk
diagnostic tools and the NIC diagnostics at
> > - Monitor the CPU temperature.
> use i2c-2.6.5 and lm_sensors to read the health monitors on the
> also get a regular digital thermometer from your local hw store
> for sanity checking
Good advice, since lm_sensors can only guess what type of thermal sensor
is on the motherboard. When the guessed calibration is wrong, it is
usually wrong by a wide margin and easy to spot, but you cannot count on that.
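The thermometer cross-check can be scripted. The sketch below is a minimal illustration under stated assumptions, not part of lm_sensors. It assumes a later Linux kernel that exposes sensor readings through the sysfs hwmon interface (`/sys/class/hwmon/hwmon*/temp*_input`, integer millidegrees Celsius), which is a newer interface than the i2c-2.6.5 drivers mentioned above. The helper names and the 5 degree tolerance are hypothetical choices for illustration; the reference value is whatever a hand-held thermometer reads near the sensor.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: a kernel exposing hwmon sensors in sysfs. Each temp*_input
# file holds an integer temperature in millidegrees Celsius.
HWMON_ROOT = Path("/sys/class/hwmon")

# Hypothetical tolerance: how far (in degrees C) a sensor may differ from
# the reference thermometer before we suspect a bad calibration guess.
DEFAULT_TOLERANCE_C = 5.0


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs temp*_input string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_hwmon_temps(root: Path = HWMON_ROOT) -> dict[str, float]:
    """Return {"hwmonN/tempM_input": degrees_C} for every readable sensor."""
    temps: dict[str, float] = {}
    for path in sorted(root.glob("hwmon*/temp*_input")):
        try:
            temps[f"{path.parent.name}/{path.name}"] = millideg_to_celsius(
                path.read_text()
            )
        except (OSError, ValueError):
            # Unreadable or non-numeric sensor files are skipped.
            continue
    return temps


def sanity_check(
    sensor_c: float, thermometer_c: float, tolerance_c: float = DEFAULT_TOLERANCE_C
) -> bool:
    """True if the sensor agrees with the reference thermometer within tolerance."""
    return abs(sensor_c - thermometer_c) <= tolerance_c


if __name__ == "__main__":
    # List what the kernel reports; compare by hand against the thermometer.
    for name, celsius in read_hwmon_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

For example, after taking a thermometer reading of 41.5 C next to the sensor, `sanity_check(read_hwmon_temps()["hwmon0/temp1_input"], 41.5)` returns False when the reported value is off by more than the tolerance. A close agreement does not prove the calibration is right, since a thermometer near the sensor and the sensor itself are not measuring exactly the same point.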
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993
Beowulf mailing list, Beowulf at beowulf.org