[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

Håkon Bugge hbugge at platform.com
Thu Aug 14 16:32:22 EDT 2008


Gus' numbers make sense to me. I assume his
workload consists of jobs of many sizes: serial,
modestly parallel, and parallel jobs using all
resources. Without pre-emptive scheduling, the
batch queue system has to starve the system in
order to run the larger jobs. Obviously, before a
job which consumes all resources can start, all
resources have to be idle. That means no new jobs
can be scheduled onto resources as they free up,
even though those resources are idle.
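
To put a rough number on that draining cost, here is a minimal sketch
in Python. It assumes no backfill at all, i.e. nothing new is started
on a node once it frees up, and the finish times are made-up values
for a hypothetical 4-node machine, not measurements from any cluster.

```python
def drain_idle_node_hours(finish_times_h):
    """Node-hours left idle while draining the machine for a full-size job.

    finish_times_h: for each node, the hour at which its current job ends.
    Assumption: no backfill, so a node stays empty from the moment its job
    ends until the last node is free and the full-machine job can start.
    """
    start = max(finish_times_h)  # earliest moment all nodes are idle
    return sum(start - t for t in finish_times_h)


# Hypothetical example: running jobs end at hours 1, 3, 6 and 10.
idle = drain_idle_node_hours([1, 3, 6, 10])
print(f"idle node-hours while draining: {idle}")
```

A scheduler with backfill would recover part of this by slotting short
jobs into the gaps, so the figure is an upper bound under the stated
assumption.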

Another interesting metric is of course how many
of the jobs run to successful completion, i.e.,
are not killed due to resource limits, crashes,
or other reasons. That's what I call net vs.
gross utilization: gross counts every job that
occupied the machine, net only those that finished.
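
As a sketch of how the two numbers could be computed, the Python below
applies the walltime-times-nodes formula Gus describes in the quoted
message, once over all jobs (gross) and once over only the jobs that
completed (net). The Job record, its field names and the sample jobs
are hypothetical; only the 32-node cluster size comes from Gus' mail.

```python
from dataclasses import dataclass


@dataclass
class Job:
    walltime_h: float   # wall-clock hours the job held its nodes
    nodes: int          # number of nodes allocated to the job
    completed_ok: bool  # False if killed, crashed, or otherwise failed


def utilization(jobs, cluster_nodes, period_h):
    """Return (gross, net) utilization as fractions of available node-hours."""
    capacity = cluster_nodes * period_h
    gross = sum(j.walltime_h * j.nodes for j in jobs) / capacity
    net = sum(j.walltime_h * j.nodes for j in jobs if j.completed_ok) / capacity
    return gross, net


# Hypothetical accounting records for one day on a 32-node cluster.
jobs = [
    Job(walltime_h=10, nodes=16, completed_ok=True),
    Job(walltime_h=5, nodes=32, completed_ok=False),  # hit a resource limit
    Job(walltime_h=20, nodes=8, completed_ok=True),
]
gross, net = utilization(jobs, cluster_nodes=32, period_h=24)
print(f"gross = {gross:.1%}, net = {net:.1%}")
```

In practice the job records would come from the batch system's
accounting log; how a failed job is recognized there depends on the
scheduler and is left out of this sketch.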


Thanks, Håkon
(opinions of myself, now working for Platform Computing)

At 19:45 14.08.2008, Gus Correa wrote:
>Hello Mark and list
>
>The measurement was based on walltime.
>It just refers to the user occupancy of the cluster, versus what was left idle
>(for all reasons, e.g. lack of resources to 
>serve large queued jobs, lack of enough jobs to fill all nodes, etc).
>The number is simply the utilized resources 
>divided by the available resources.
>This gives a coarse measure of machine utilization.
>Take the walltime of all jobs multiplied by the 
>number of nodes (or CPUs) each job used,
>sum them,
>and divide by the duration of this period (say, 
>one year) times the number of nodes (or CPUs) in the cluster.
>
>Maybe 70% utilization is low compared to 
>airplane seats, subway occupancy, hotel rooms, restaurant tables,
>Internet, telephone networks, and perhaps to other clusters.
>I don't know, I am not an operations research person.
>The only other number I could find for a (well 
>used) large cluster in our science field was below 70%,
>and now Chris mentioned 77%.
>
>Are there published numbers of resource utilization for other machines,
>say, public clusters in the US, Canada, Europe, world?
>
>Yes, our cluster is dedicated to a small group 
>of earth scientists and students (20-40 users)
>and it is small (32 nodes, 64 cpus). Cluster 
>size and user population size most likely make a difference,
>but in any case, I would be interested in seeing any other numbers
>for any kind of cluster.
>
>Regards,
>Gus Correa
>
>--
>---------------------------------------------------------------------
>Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
>Lamont-Doherty Earth Observatory - Columbia University
>P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
>---------------------------------------------------------------------
>
>
>Mark Hahn wrote:
>
>>>>It appears we've averaged almost 77% utilisation
>>>>since the beginning of 2004 (when our current usage
>>>>system records begin).
>>>Thank you very much for the data point!
>>>
>>>I've insisted here that above 70% utilization is very good,
>>>given the random nature of demand and jobs on queues in the academia, etc.
>>
>>
>>that sounds very strange to me.  do you really 
>>mean that 30% of your cpu time is idle?  I wonder whether there could be a big
>>difference in methodology.  for instance, if you're using an MPI library
>>(probably based on tcp) that doesn't spin-wait but blocks as for disk IO
>>say 20% of the time, then you might consider this to be 80% utilization.
>>an MPI that spin-waits might show 100% with the same perf/throughput.
>>
>>70% utilization is terrible if you really mean "fraction of allocatable cpu
>>time occupied by jobs".  that is at the job 
>>scheduler level, not at the kernel scheduler level.
>>
>>>However, some folks would want more than 90% efficiency to get happy.
>>
>>
>>I would be embarrassed to have less than 90%.  perhaps 70% would make sense
>>for a cluster dedicated to a small or 
>>narrowly-defined group.  I find that a 
>>sufficient userbase means you _always_ have 
>>something to run, of any size/resource available.
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or 
>unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Håkon Bugge
Chief Technologist
mob. +47 92 48 45 14
off. +47 21 37 93 19
fax. +47 22 23 36 66
Hakon.Bugge at platform.com
Skype: hakon_bugge

Platform Computing, Inc.




