[Beowulf] What services do you run on your cluster nodes?

Patrick Geoffray patrick at myri.com
Tue Sep 23 22:03:05 EDT 2008

Perry E. Metzger wrote:
>> You realize that most big HPC systems are using interconnects that
>> don't generate many or any interrupts, right?
> Of course. Usually one even uses interrupt pacing/mitigation even in
> gig ethernet on a modern machine -- otherwise you're not going to get
> reasonable performance. (For 10Gig, you have to do even uglier
> tricks.)

What Greg is trying to say is that high-speed interconnects used in HPC 
do not raise interrupts at all. Data is delivered directly into 
user-space, and the app (or the communication library) busy-polls on it, 
with no kernel/OS involvement. There is one app process per core (usually 
bound to improve locality in a NUMA architecture). When a daemon wakes up, 
it will preempt a core, and the app process on that core just has to wait. 
If the app is tightly coupled, that will delay everybody.
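
To make the "delay everybody" point concrete, here is a toy model of one tightly coupled step, where every process has to wait for the slowest one before moving on. It is only a sketch: the function name, the 8 ranks, and the 1.0 ms / 0.5 ms figures are made up for illustration and do not come from any real machine.

```python
def step_time(compute_times, preempted=None, delay=0.0):
    """Duration of one synchronized step: all ranks wait for the slowest."""
    times = list(compute_times)
    if preempted is not None:
        # A daemon stole the core from this rank for `delay` ms (hypothetical).
        times[preempted] += delay
    return max(times)

ranks = 8                      # assumed: one app process per core
base = [1.0] * ranks           # assumed: 1.0 ms of compute per rank per step

undisturbed = step_time(base)
disturbed = step_time(base, preempted=3, delay=0.5)

print(undisturbed, disturbed)  # 1.0 1.5
```

Only one rank out of eight lost its core, but the whole step takes 0.5 ms longer, because the other seven sit busy-polling for a message that is late.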

You can say that a daemon waking up every couple of hours is no big 
deal. However, if these wakeups are uniformly distributed in time across 
a couple thousand nodes, then somewhere in the machine one will happen a 
couple thousand times more often. You can solve this by gang scheduling 
the daemons across all the nodes, or you can turn them off.
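
As a back-of-the-envelope check of that scaling, the snippet below takes "every couple of hours" as 2 hours and "a couple thousand nodes" as 2000. Both numbers are assumptions picked for the arithmetic, and the function name is hypothetical.

```python
def mean_seconds_between_wakeups(period_s, nodes):
    """Average gap between daemon wakeups anywhere in the machine,
    assuming each node wakes once per period at an independent,
    uniformly distributed time."""
    return period_s / nodes

period_s = 2 * 3600   # assumed: one wakeup per node every 2 hours
nodes = 2000          # assumed: "a couple thousand" nodes

cluster_interval = mean_seconds_between_wakeups(period_s, nodes)
print(cluster_interval)  # 3.6
```

So an event that any single node sees once every two hours shows up somewhere in the machine every 3.6 seconds on average. With gang scheduling, all the daemons would wake at the same moment instead, and a tightly coupled app would take the hit once per period rather than 2000 times.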

However, it is only important for large machines with tightly coupled 
codes. For the majority of the cases, it's just being anal.
